The human diet includes plant cell wall (PCW) polysaccharides that serve as fermentable nutrients for the complex microbial community found in the lower gastrointestinal tract. Along with cellulose and hemicellulose, pectin is a major polysaccharide constituting the PCW. Thick layers of pectin also cover the surface of the PCW, forming middle lamellae between the shared cell wall interfaces [1]. Pectin is the most complex polysaccharide found in PCW, consisting of structurally heterogeneous components, such as homogalacturonan (HG), rhamnogalacturonan-I (RG-I), and rhamnogalacturonan-II (RG-II) [2]. HG is a homogenous polymer of α-1,4-linked-D-galacturonic acid (D-GalpA) which constitutes the majority of uronic acid contents of pectin, and 65–70% of the total pectin mass [2, 3]. Approximately half of D-GalpA residues present in HG are either methyl-esterified at C-6 or acetyl-esterified at O-2 and/or O-3 [3]. Non-esterified D-GalpA residues carry a negative charge, enabling the formation of a gel-like texture by chelating Ca2+ ions [4]. The RG-I and RG-II regions of pectin are compositionally heterogeneous, containing diverse neutral sugars. Depending on the plant species, up to 20–80% of L-rhamnose (L-Rhap) residues in RG-I are branched by arabinan (polymers of α-L-1,5-arabinofuranose (L-Araf) units branched at O-2 and O-3 with α-L-Araf residues), galactan (unbranched polymers of β-D-1,4-galactopyranose (D-Galp) residues), and arabinogalactan (a linear β-1,4-galactan substituted with α-L-1,5-Araf oligosaccharides) [2]. Some arabinan and galactan are substituted with ferulic acid side chains, which can dimerize to strengthen the pectin network [5]. The backbone of RG-I consists of alternating diglycosyl units of α-D-GalpA and α-L-Rhap [→2)-α-L-Rhap-(1→4)-α-D-GalpA-(1→] [6]; whereas, the backbone of RG-II is made of a linear α-1,4-L-GalpA residues, and does not contain L-Rhap units as a part of the basal structure [7]. The RG-II is less abundant than RG-I, but shows a higher degree of structural complexity as it contains at least 13 glycosyl residues covalently linked together by more than 21 different types of glycosidic linkages [8, 9]. In the primary cell wall, RG-II predominantly occurs as a dimer crosslinked by a borate diester [7].

The structural complexity of pectin poses a considerable challenge to the human digestive system, and humans must rely on the concerted actions of CAZymes produced by symbiotic gut bacteria for pectin degradation. Related research with environmental and animal gut bacteria suggest that the erosion of middle lamellae by pectin-degrading bacteria is a necessary prerequisite for initiating PCW degradation, exposing other cell wall polymers and allowing the establishment of a more heterogeneous bacterial colonization along the PCW [10, 11]. So far, such association has been difficult to examine in humans partly due to the technical difficulty of cultivating anaerobic gut bacteria. Our current knowledge is best exemplified by Bacteroides spp., Gram-negative generalist carbohydrate degraders well-studied for their versatile glycan-foraging strategy using gene clusters called polysaccharide utilisation loci (PULs) [8, 12,13,14]. Upon detecting a signalling carbohydrate, a PUL is activated to express surface glycan-binding proteins, outer membrane oligosaccharide transporters, surface/periplasmic CAZymes, and SusC and SusD homologues [15]. This synchronous production of components comprising a complete glycan-foraging unit underpins the sequestration model of Bacteroides which efficiently binds, degrades, sequesters, and transports PDPs into the intracellular space while reducing losses to other members of the HGM [12, 14, 16], although cases of glycan sharing were also reported [17, 18]. Although the important role of Gram-positive Firmicutes as primary degraders that break down dietary glycans to release products to the HGM in a cross-feeding relationship has been proposed [12], the mechanisms by which Firmicutes degrade pectin in the human gut are not well understood. Currently known pectin-degrading Firmicutes species are few in number, namely Eubacterium eligens [19, 20] and Faecalibacterium prausnitzii [21] that possess relatively small repertoires of CAZyme-encoding genes involved in pectin degradation. Previously, we reported the isolation of a novel Firmicutes bacterium from human faeces (M. pectinilyticus 14 T), whose selective growth on pectin was mediated through a tight cell-substrate interaction [22] which is considered as a hallmark trait of primary PCW-degrading bacteria [23]. This raised the possibility that within the human gut there is an insufficiently characterized ecological niche for pectin-degrading specialists, such as M. pectinilyticus, which initiate the cascade of PCW degradation by dissolving the obstructive pectin layers to expose attachment sites for heterogeneous bacterial species, and release oligosaccharides for utilization by secondary feeders of the microbial community. Building on this earlier discovery and recognizing the scarcity of bacterial model systems for studying the colonic pectin degradation, we sought to provide a first insight into the primary PCW degradation by examining the pectin degradation by M. pectinilyticus through combined proteogenomic and biochemical approaches. We generated a high-quality genome of M. pectinilyticus which revealed genetic, metabolic, and glycobiomic specializations for pectin degradation and utilization, as well as genes encoding unique protein features indicative of an extracellular glycan degradation strategy previously unseen in the HGM. To examine the host-bacterial relationships, we assessed the prevalence and abundance of M. pectinilyticus among healthy human subjects, and revealed a novel phylogenetic lineage of Ruminococcaceae currently represented by M. pectinilyticus and its uncultured relatives from gut systems of various terrestrial organisms.


Taxonomic affiliation of M. pectinilyticus and its presence in human studies

We first sought to determine whether M. pectinilyticus or any related uncultured bacteria are true human gut commensals. An unfiltered-BLAST search of the GenBank database identified 77 near full-length 16S rRNA gene sequences (59 from human faeces and 18 from other human/animal sources) with ≥92% sequence identities (query cover ≥80%) to M. pectinilyticus (Supplementary Information S1). These sequences phylogenetically diverged from Clostridium clusters III and IV with a high bootstrap support, forming a novel polyphylectic clade within the family Ruminococcaceae (Fig. 1). The sequences from this clade formed five clusters (arbitrarily named MP-I to MP-V) in the phylogenetic tree, of which M. pectinilyticus and its uncultured relatives from human faeces formed a dominant cluster (MP-I). To further establish M. pectinilyticus as a human gut commensal, we performed quantitative PCR using faecal DNA samples collected from 44 healthy New Zealand volunteers, and detected M. pectinilyticus from 10 donors at a mean relative abundance of ~0.3% (excluding an outlying donor 24) (Supplementary Information S2). To relate the presence of M. pectinilyticus to dietary consumption in donors, subjects were asked to provide four sets of 3-day diet records which were used to calculate the average daily consumption of different food groups (Supplementary Information S3). Among the 10 individuals that tested positive for M. pectinilyticus, 8 reported to consume more than the recommended amount of dietary fibre (>25 g/day for females and >30 g/day for males), and 5 individuals consumed >5 g of pectin per day. Overall, members of the M. pectinilyticus-positive group showed higher median values for fibre and pectin intakes compared to those in the negative group (Supplementary Fig. S1a, b). We further calculated the daily servings of vegetable, fruit, grain, and protein for each donor, but statistically significant correlation to M. pectinilyticus was not observed under these categories (Supplementary Fig. S1c-f).

Fig. 1
figure 1

Phylogenetic analysis reveals M. pectinilyticus is a human gut commensal from a previously unexplored taxonomic branch of Ruminococcaceae. This novel branch of Ruminococcaceae consists of polyphylectic sequence clusters (MP-I to MP-V) in which M. pectinilyticus and closely related uncultured bacterial 16S rRNA gene sequences are phylogenetically placed in the MP-I cluster. The identity of 16S rRNA gene sequences from each MP cluster is given in Supplementary Information S1. Bootstrap values were calculated using 2000 re-samplings

Genome organization and general features

The draft genome of M. pectinilyticus (GenBank accession CP020991) was sequenced and assembled using the Illumina HiSeq 2500 system and SPAdes genome assembler [24]. The final genome assembly consists of a single contig of 2,757,678 bp, only lacking closure from a ~2 kb sequencing gap within a predicted prophage region. Using a combination of Prokka [25] and other protein function prediction tools, a total of 2263 protein-coding sequences (CDS) were annotated, and HMMER-based pipelines were used to assign Clusters of Orthologous Groups (COG) and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway genes (Supplementary Fig. S2). The annotated genome features a complete set of amino acid biosynthesis pathways, along with 46 transfer RNAs for all 20 natural L-amino acids. The genome lacks biosynthesis pathways for biotin, riboflavin, ascorbate, and folate, whereas complete pathways for nicotinate, vitamin B5, vitamin B6, and thiamine are present. No genes involved in flagella assembly or exopolysaccharide biosynthesis are present, in accordance with our previous observations [22]. Genetic elements of four complete or partially degraded prophage genomes are present, as well as two putative integrated, horizontally transferred plasmids (~51 kb and ~73 kb), and CRISPR type I-C and II-A systems.

Glycobiome analysis

We identified glycoside hydrolases (GHs), polysaccharide lyases (PLs), carbohydrate esterases (CEs), non-catabolic glycosyl transferases (GTs), and carbohydrate-binding modules (CBMs) using the CAZy database, finding 91 genes annotated to encode 108 putative CAZyme domains (Supplementary Information S4). The genome encodes 48 putative pectin-degrading CAZyme domains including 20 pectate/pectin lyases and rhamnogalacturonan lyases (PL1 and PL9), one rhamnogalacturonan lyase (PL11), eight pectin methylesterases (CE8), four pectin acetyl esterases (CE12), one polygalacturonase (GH28), one 2-keto-3-deoxynononic acid hydrolase/sialidase (GH33), three β-D-xylosidase/α-L-arabinofuranosidases (GH43), one α-L-arabinofuranosidase/β-D-xylosidase/β-1,4-D-xylanase/endoglucanase (GH51), one α-D-galactosidase/α-D-glucosidase (GH97), one unsaturated homogalacturonyl/rhamnogalacturonyl hydrolase (GH105), one apiosidase (GH140), and one multi-domain CE12/CE8. In addition, a novel CAZyme combination consisting of GH95/CE8 is encoded by B9O19_1299 and B9O19_1681, whose exact role in the pectin degradation is currently unclear. M. pectinilyticus also produces degradative CAZymes related to other PCW polymers, including five poly-specific GH3 family enzymes, three α-L-fucosidases/α-L-galactosidase (GH95), one β-D-glucosidase/β-D-xylosidase (GH116), and one poly-specific GH5 family enzyme. We acknowledge that predicting CAZyme functions and specificities based on informatics analysis has limitations as these enzyme families are often poly-specific, and lack functional characterization [26]. Considering the moderate glycobiome size of M. pectinilyticus compared to other pectinolytic bacteria available in the CAZy database, this bacterium possesses disproportionally large numbers of genes for CEs and PLs predicted to be involved in the initiation of pectin degradation (Fig. 2a). While pectate lyases are abundantly produced by fungal or bacterial plant pathogens (e.g. Dickeya dadantii), it is unusual amongst gut bacteria to derive a larger share of pectinolytic activity from PLs than GHs (Fig. 2b). Abundant production of CEs presumably provides a competitive advantage by a hierarchical removal of methyl- and acetyl-groups to facilitate rapid access by PLs, which in turn cleave HG and RG backbones to generate unsaturated oligomeric end-products of β-elimination [27]. Consistent with this, all PLs and most CE8 and CE12 contain putative signal peptide sequences, suggesting pectin degradation mostly occurs in the extracellular environment. We confirmed this by observing the gradual degradation of citrus and apple pectins in the culture supernatant of M. pectinilyticus using size exclusion chromatography (SEC) (Supplementary Fig. S3). The SEC analysis results showed that M. pectinilyticus carries out an extracellular degradation of high molecular weight pectic carbohydrates while simultaneously releasing degraded oligomers into the culture supernatant. Most pectin-related GH family enzymes, including GH28 and GH105, lack signal peptide sequences, indicating their roles in the intracellular processing of oligomers. Based on the informatics data, we initially predicted the primary target for degradation to be the pectin backbone rather than the side chains, as M. pectinilyticus lacks identifiable β-galactosidases/galactanases and terminal rhamnosidases required for the degradation of RG-I. Furthermore, the comprehensive RG-II degradome recently described from B. thetaiotaomicron [8] is only partially present in M. pectinilyticus (GH33, GH43, and GH140). To test our genome-based predictions, the oligosaccharide and monosaccharide degradation remnants in the 0 h, 48 h, and 72 h culture supernatants of M. pectinilyticus grown on arabinan (sugar beet), arabinogalactan (larch wood), RG-I (potato), and galactan (potato) were examined using high-performance anion-exchange chromatography (HPAEC) with pulsed amperometric detection (PAD). Unexpectedly, M. pectinilyticus degraded RG-I and galactan to generate smaller weight degradation products, whereas arabinan and arabinogalactan did not appear to be degraded (Supplementary Fig. S4 a-d). While M. pectinilyticus grows on RG-I (data not shown), no growth was observed on galactan, arabinan, and arabinogalactan over three successive transfers (data not shown), consistent with our previous data [22]. The degradation profiles from RG-I show the accumulation of oligosaccharides at 48 and 72 h (Supplementary Fig. S4 e), but not of L-Rhap monomers (Supplementary Fig. S4 g). This was consistent with our genome-based prediction that M. pectinilyticus does not produce rhamnosidases which enable the cleavage of terminal L-Rhap. Although M. pectinilyticus degrades potato galactan, the resulting D-Galp monomers accumulate in the culture supernatant without being utilized (Supplementary Fig. S4 f, h), consistent with our previous observation [22]. We attempted to predict carbohydrate recognition sites in silico, but few CBM families could be identified from the CAZymes of M. pectinilyticus, perhaps due to the small number of pectin-binding CBMs in the current CAZy database [28].

Fig. 2
figure 2

The glycobiome of M. pectinilyticus shows an unusual PL- and CE-enriched profile of extracellular CAZymes. a Comparison of PLs, GHs and CEs with pectin-specific activities between pectin-degrading strains isolated from the human gut (M. pectinilyticus, E. eligens, F. prausnitzii, Bacteroides thetaiotaomicron, and Bacteroides ovatus), the animal rumen (Prevotella ruminicola and Butyrivibrio proteoclasticus), and plants infested with soft rot disease (D. dadantii). b Comparison of proportion of pectin-specific CAZymes relative to the total number of degradative CAZymes (PLs, GHs, and CEs) for each strain

PL1 and CE8 CAZyme phylogeny

The unusual PL1 and CE8-dominated CAZyme profile of M. pectinilyticus prompted us to attempt to infer the evolutionary history of these enzymes. We used a BlastP search of the NCBI protein database to find the closest sequence relatives to PL1 and CE8 of M. pectinilyticus. We then extracted catalytic domains from 193 PL1 and 85 CE8 sequences to construct phylogenetic trees using the maximum-likelihood method. The majority of PL1 and CE8 domains of M. pectinilyticus forms species-specific clusters that are separate from other clusters consisting of NCBI sequence relatives (Fig. 3 with extended phylogenetic trees in Supplementary Information S5), suggesting that these enzymes have independently evolved, and may fulfil functions specific to M. pectinilyticus. All CE8 domain sequences of M. pectinilyticus form discrete species-specific clusters, with B9O19_680 and B9O19_869 distantly related to CE8 of Clostridium thermocellum, Acetivibrio cellulolyticus, and Paenibacillus mucilaginosus. A group of six PL1 (B9O19_312, B9O19_929, B9O19_1443, B9O19_1867, B9O19_2006, and B9O19_2160), and another group of two PL1 (B9O19_1239 and B9O19_1241) show significant protein sequence homology (>200 bit-score; e-value <e−50; and query cover >50%) only within the members of the groups, suggesting that these subsets of PL1s may have arisen from gene duplication events. A ~207 kDa protein (B9O19_909) containing PL1, fibronectin type III (Fn3), and SLH repeat domains has no close orthologues in NCBI database, and its PL1 catalytic domain showed <40% peptide sequence similarities (<100 bit-score; and e-value >e−30) to other PL1 domain sequences used for comparison in this study. B9O19_874 and B9O19_1315 are distantly related to CDC19498 of an uncultured Eubacterium species. B9O19_51 is the only PL1 of M. pectinilyticus that contain identifiable CBM domains (CBM13-PL1-CBM13) and shows 72% protein sequence homology with a CBM13-containing PL1 of Clostridium bornimense (CDM70399). B9O19_2062 contains extensions of unknown function at both ends of its catalytic PL1 domain (UNK-PL1-UNK) and is related to OLA16957 of an uncultured Eubacterium sp. B9O19_873 and B9O19_1377 are distantly related PL1 clusters consisting of heterogeneous groups of bacteria. PL1s encoded by B9O19_1128, B9O19_1129, and B9O19_1442 are distantly related to PL1s bearing putative dockerin domains that are often associated with the assembly of clostridial and ruminococcal cellulosomes. However, no dockerin sequences have been identified from any of the CAZymes in M. pectinilyticus, indicating that the few dockerin- and cohesin-like domains present in hypothetical proteins of M. pectinilyticus likely occurred outside of a cellulosomal context. Although non-cellulosomal dockerin/cohesin modules are found in 14% of the known bacterial genomes, their functions remain obscure as most of these sequences occurs integral to hypothetical proteins with unknown functions [29]. All dockerin- and cohesin-containing proteins found in the M. pectinilyticus genome contain secretory signal peptide sequences and/or SLH domains, suggesting these proteins may function at the bacterial cell surface. Further examination of these proteins may reveal potentially novel catalytic and/or carbohydrate-binding functions in that dockerin and/or cohesin modules were previously found appended to CAZyme domains in a single protein among few members of the HGM [29].

Fig. 3
figure 3

PL1 and CE8 domain sequences of M. pectinilyticus form discrete species-specific clusters. The PL1 a and CE8 b catalytic domains were extracted from enzyme sequences, aligned, and used to construct a maximum-likelihood phylogenetic trees. Reference sequences were chosen based on the highest BlastP identity matches to M. pectinilyticus PL1 and CE8 sequences. GenBank accession numbers and bootstrap values from 1000 re-samplings are indicated. Bacterial source of sequences are indicated with colours. Extended phylogenetic trees are given in Supplementary Information S5

Modular organization of SLH domain-containing proteins

The M. pectinilyticus genome encodes 42 extracellular ‘SLH proteins’ predicted to attach to the extracellular surface via one to three SLH modules (Fig. 4). To our knowledge, this is one of the largest number of SLH proteins reported from any obligate anaerobic bacterium (Supplementary Information S6). Among the SLH proteins of M. pectinilyticus, eight are associated with pectin-degrading CAZymes, including PL1, CE8, CE12, and GH97; five contain peptidases; two contain N-acetylmuramoyl-L-alanine amidase; four contain domains with poorly characterized functions; and 23 have few identifiable domains other than SLH modules (Mw of 42–318 kDa). All but one of the SLH proteins possess an N-terminal type I signal peptide; whereas, the sole exception (B9O19_1870) contains a lipoprotein type II signal peptide. CAZymes attached to the bacterial surface via SLH modules have been previously implicated in the attachment and deconstruction of lignocellulose biomass in numerous Firmicutes organisms isolated from environmental samples [30,31,32,33,34,35,36,37,38,39,40], although to our knowledge, this mechanism is very rarely observed in the human gut, and M. pectinilyticus is the first isolate to use it in the strict context of pectin degradation. While some of the M. pectinilyticus SLH proteins have similar domain architectures to the structural protein components of the S-layer in Bacillus anthracis [41], SLH proteins containing CAZyme domains, cohesin- and dockerin-like domains, Fn3, and bacterial immunoglobulin (Big)-like domains are presumably involved in carbohydrate degradation [32, 42,43,44]. Using BlastP, at least 2 SLH protein sequences of M. pectinilyticus have been found in the NIH human stool metagenome databases generated from 12 subjects (out of 85) living in the US (~100% continuous sequence matches along >100 amino acids; e-value <1e−5), affirming that these SLH proteins are abundant within the HGM (Supplementary Information S7).

Fig. 4
figure 4

The 42 SLH module-containing proteins of M. pectinilyticus contain few identifiable domains, including pectin-degrading CAZymes, putative dockerins and cohesins, and FN3- and Big-like domains with presumed functions in carbohydrate-binding and degradation. Protein lengths are drawn to scale. Amino acid positions are indicated with numbers. Predicted protein domains are shown as rectangles, and coloured according to the domain category shown in the key. Where possible, additional domain details are labelled within domain rectangles. Remaining regions, with no known homology to existing protein domains, are represented by lines

Pectin metabolism and organization of pectin-gene clusters

We mapped the M. pectinilyticus genes to the KEGG metabolic pathways. The presence of the KEGG metabolic pathways for utilizing D-GalpA, 5-keto-4-deoxyuronate (DKI), D-glucuronic acid, D-Xylp, and L-Araf supports that monomeric derivatives of pectin and hemicellulose are the primary carbon sources used by M. pectinilyticus (Fig. 5). Fructose is the only non-PCW sugar which M. pectinilyticus utilizes, and for which the strain possesses its only phosphotransferase system (PTS) for the uptake across the cell membrane. Next we looked at the location of some of these genes within the M. pectinilyticus genome. While others are present in the genome as isolated loci, some genes encoding putative pectin-degrading CAZymes are located adjacent to the genes coding for SLH proteins (Fig. 6a, c, d) or as part of the two notable ‘pectin clusters’ consisting of distinct genomic regions of pectin-utilizing functions (Fig. 6b, e). The two most notable pectin clusters were observed between B9O19_864 and B9O19_877, and between B9O19_2005 and B9O19_2015. The former spans a ~22 kb genomic segment and contains genes coding for CAZymes (GH28, GH43, PL1, and CE8); GalpA-metabolising enzymes (uxaA, uxaB, and uxaC); KDG kinase (kdgK); a sugar-binding ABC type transporter unit; and an AraC type transcription regulator. Another cluster of genes between B9O19_2005 and B9O19_2015 forms a ~21 kb genomic segment encoding CAZymes (PL1, GH33, GH51, GH95, and CE8); 2-dehydro-3-deoxy-D-gluconate 5-dehydrogenase (kduD); a phosphoketolase (xfp); a GntR type of transcription regulator; a biotin transporter; and proteins of ambiguous functions. Phosphoketolases are thiamine diphosphate-dependent enzymes responsible for two phosphoroclastic reactions: (1) conversion of xylulose-5-P (X5P) into glyceraldehyde-3-P (G3P) and acetyl-P, and (2) conversion of fructose-6-phosphate (F6P) to erythrose-4-P and acetyl-P. The position of the xfp gene within a pectin cluster is intriguing since X5P is a metabolic derivative of xylose and arabinose – the latter a predominant pentose sugar of pectin that is utilized by M. pectinilyticus. Notably present in Bifidobacteria [45, 46], Lactobacillales [47, 48], and Clostridium acetobutylicum [49, 50], mono- or bifunctional phosphoketolases have been found to shunt X5P and/or F6P through a catabolic alternative of the pentose-phosphate pathway (PPP) by bypassing carbon flux through the lower portion of glycolysis, thus avoiding a complete oxidation of carbon to CO2. As a result, the phophoketolase pathway (PKP) yields fewer ATP than PPP [49], but also has lesser needs for NAD+ regeneration as it adopts an enzymatic ‘shortcut’ through the conversion of X5P to acetate [49, 51]. Consistent with the traditional view of the pentose sugar metabolism in Clostridia [51], M. pectinilyticus also possesses enzymes of the PPP, including transketolase, transaldolase, and epimerase. Theoretically, more than one route of pentose sugar metabolism may exist in M. pectinilyticus, allowing greater metabolic flexibility. The existence of pectin clusters suggests that gene responses may be regulated at a transcriptional level to synchronize the catabolic and metabolic processes of pectin degradation. Further studies are necessary to conclude if M. pectinilyticus uses general regulatory mechanisms such as catabolite repression, the two-component systems, or alternative σ-factors [52] we know to exist within the genome (data not shown) to sense and react to the presence pectic polysaccharides in its extracellular surroundings. Using the ABC transporter database [53] we identified three ABC-related transporters (B9O19_98–B9O19_100, B9O19_870–B9O19_872, and B9O19_1089) presumably involved in carbohydrate-binding and transport. An additional non-ABC type extracellular transporter (B9O19_1628) is present adjacent to genes coding for CE8 domain-containing SLH proteins, possibly suggesting its role in transporting pectic PDPs across the cell wall.

Fig. 5
figure 5

Reconstruction of sugar metabolism by M. pectinilyticus. Colour shadings distinguish major metabolic pathways. Arrows indicate the direction of enzymatic reactions. Locus tag numbers are given. KEGG numbers and enzyme identities of each gene are given in Supplementary Information S8. Question mark indicates missing genes (scs and sdh) in an otherwise complete TCA cycle

Fig. 6
figure 6

Organization of gene clusters related to pectin degradation/utilization functions. Gene orientation is indicated by arrows. Locus tag numbers (B9O19) are indicated. Colours corresponding to the predicted gene product function categories are indicated in the key. Genes corresponding to the proteome products that are upregulated in response to pectins (Fig. 7) are circled. a, c, d Genes encoding pectin-degrading CAZymes that are located adjacent to the genes coding for SLH proteins. b, e ‘Pectin clusters’. CAO, copper amine oxidase-like domain-containing protein

Comparative proteomic profiling

To examine the possible role of SLH proteins in pectin degradation, we used isobaric tags for relative and absolute quantification (iTRAQ) technology to preliminarily measure the differential protein production in M. pectinilyticus in response to pectins from apple and kiwifruit, compared to the cells grown using the simple sugar fructose. Using three biological replicates, we identified 159 proteins that were commonly detected during the growth of M. pectinilyticus on both types of pectins (Fig. 7, extended data in Supplementary Information S9). Several non-catalytic (B9O19_52, B9O19_611, and B9O19_1156) and CE8 domain-containing (B9O19_1627) SLH proteins, elongation factor G (B9O19_1947), and peptidase C1A (B9O19_1475) were upregulated in the presence of either types of pectin. Proteins that were additionally upregulated upon growth on apple pectin include a non-catalytic SLH protein (B9O19_1597), a CE8 domain-containing SLH protein (B9O19_826), ABC-related sugar transporters (B9O19_870 and B9O19_1089), a bifunctional KDPG/KHG aldolase (B9O19_1425), transaldolase (B9O10_1680), and a dockerin-like domain-containing protein (B9O19_1577). Proteins upregulated only upon growth on kiwifruit pectin include a heat shock protein (B9O19_602), an extracellular protein of unknown function (B9O19_682), and an XRE family transcription regulator (B9O19_1520). Several SLH proteins, notably B9O19_1156 and B9O19_1597, were represented by the highest numbers of peptides in all samples. Not all SLH proteins were upregulated in response to pectin (e.g. B9O19_826, B9O19_908, B9O19_909, B9O19_912, B9O19_914, B9O19_1225, B9O19_1337, B9O19_1578, B9O19_1595, and B9O19_1626), and the pattern of expression was not always consistent with both types of pectins. While the exact functions of SLH proteins are currently not understood, it is possible that M. pectinilyticus has varying anchoring requirements for SLH proteins to facilitate the binding and degradation of different pectin structures. Proteins involved in fructose uptake (B9O19_1606 - B9O19_1608), glycolysis (B9O19_1010, B9O19_1941, and B9O19_1942), the PKP (B9O19_2009), TCA cycle (B9O19_1097, B9O19_1902, and B9O19_1903) and acetate/formate production (B9O19_89, B9O19_90, and B9O19_137) were mostly downregulated in pectin-grown cells. Despite the abundant PL sequences identified in the M. pectinilyticus genome, our proteome data did not show upregulation of PLs in response to pectins. It is possible that either these PLs are unrelated to pectin degradation, or CAZymes that are non-covalently attached to the cell wall or lack the cell wall-retention domains such as SLH modules became lost from the whole-cell proteome. The SEC analysis results showed that the degradation of high molecular weight pectic materials occurs at an early stage of growth (48 h), suggesting PLs may be highly expressed during the log/exponential phase. However, due to the difficulties of harvesting a sufficient mass of slowly growing cells (~0.3 OD595 reached after 1-week growth), only cultures that had reached stationary phase were used in this study.

Fig. 7
figure 7

iTRAQ protein quantification ratios between pectin- and fructose-grown cells of M. pectinilyticus. Peptide ratios for apple a and kiwifruit b pectins were determined using fructose values as the baseline denominators. Proteins are considered upregulated (red); downregulated (blue); or not significantly changed (white). Numbers indicate locus tags (B9O19). Extended data are given in Supplementary Information S9


Until recently, most research on colonic pectin degradation was carried out using non-specialist bacteria. To our knowledge, M. pectinilyticus is the first human gut bacterium to show the functional specialization for pectin degradation and utilization. Notably, it is a member of the Ruminococcaceae as are other specialist human gut bacteria capable of initiating the degradation of resistant polysaccharides [54]. M. pectinilyticus takes a ‘quality-not-quantity’ approach by directing a significant proportion of its fibrolytic potential (48 CAZymes out of 108) to focus on pectin degradation. The M. pectinilyticus glycobiome is enriched with pectate/pectin lyases and pectin esterases, contrasting with the GH-rich CAZyme profiles more conventionally observed in human gut bacteria. Pectin lyases are capable of splitting pectin polymers independent of the degree of methylation. However, most microbial pectin lyases have been characterized from fungi [27, 55], with rare exceptions found in endophytic environmental bacteria such as Bacillus/Paenibacillus spp., Pseudomonas spp., and Erwinia spp [56]. The striking over-representation of PLs in the glycobiome of M. pectinilyticus, and their phylogenetic similarity to enzymes from environmental bacteria suggest that M. pectinilyticus should be further examined for potential pectin lyase activities. Pectate lyases, on the other hand, are commonly produced by human gut bacteria for the enteric digestion of pectin [27]. PL family enzymes make up less than 2% of the total glycobiome in the HGM [57], but pectate lyases play key roles in mediating the extracellular degradation of complex pectin in the human colon. In previous studies conducted elsewhere [55, 58], the microbial degradation products of pectin were examined from the human faecal fluid, the colonic contents from rats, and the culture supernatant of B. thetaiotaomicron. In all cases, the resulting D-GalpA oligosaccharides followed the β-elimination pattern of pectate lyases by retaining the unsaturated double bonds at the non-reducing ends of molecules. Extensive methyl- and acetyl-esterification on the pectin backbone is a sterically limiting factor for pectate lyase activities. Pre-processing pectin with methyl- and acetyl-esterases significantly improves substrate accessibility by pectate lyases, and accelerates the enzyme reaction rates [59, 60]. Complementary actions of pectate lyases and pectin esterases would provide M. pectinilyticus the enzymatic flexibility to deal with the varying degrees of esterification observed in different types of the PCW pectins. Partial trimming of RG-I side chains is expected to occur by an unknown extracellular β-D-1,4-galactanase/galactosidase. A relatively small number of GH family enzymes may carry out the downstream processing of intracellular pectin oligomers. Only GH140 is specific to the RG-II degradation, suggesting that it is unlikely M. pectinilyticus can cope with the structural complexity of RG-II.

Inferring from previous studies with environmental bacteria, we predict that the pectin-binding and degradation in M. pectinilyticus are mediated through the SLH protein machinery, which to date has been mostly found in lignocellulolytic systems of various environmental isolates. While organisms with surface-associated glycan-binding proteins benefit from bringing glycan substrates in close contact with degradative enzymes [12], the exact mechanisms by which these SLH proteins may confer M. pectinilyticus with the ability to bind pectin are not yet understood. The SLH proteins are non-covalently tethered to the bacterial surface via the SLH modules which typically contain 50–60 amino acid residues that have binding affinity for the pyruvylated bacterial secondary cell wall [61]. Amongst Firmicutes, the SLH-mediated protein-anchoring motifs are used to tether a modular CAZyme structure consisting of various combinations of CAZymes, and non-catalytic CBMs, Fn3, and Big-like domains to the bacterial cell surface [30,31,32,33,34,35,36,37,38,39,40]. These non-catalytic regions of CAZymes presumably play supporting roles by functioning as CBM or a linker to swing out and orient the catalytic domains with the biomass substrates [32]. Little is known about functionally ambiguous modules such as Fn3 and Big. Fn3 domains have previously demonstrated binding affinity to insoluble lignocellulose biomass [43] while significantly enhancing the activity of the appended CAZyme domains [44]. Big-like domains were found in bacterial proteins involved in cell–cell adhesion and extracellular carbohydrate hydrolysis [44]. The CAZyme-free SLH proteins may serve as scaffolding proteins for assembling multi-protein complex, or independent carbohydrate-binding protein units. For instance, the closest cultured relatives of M. pectinilyticus, such as A. cellulolyticus [62], C. thermocellum, and Clostridium clariflavum [63] use the SLH domain-containing scaffoldin proteins to attach the cellulosome assembly on the bacterial cell surface. Recently, novel types of lignocellulolytic degradation systems consisting of large uncharacterized SLH proteins were described in Paenibacillus curdlanolyticus and Caldicellulosiruptor spp. P. curdlanolyticus produces a 1450 kDa extracellular multi-enzyme complex consisting of 11 xylanase subunits that are assembled onto a non-catalytic SLH S1 protein [64]. In Caldicellulosiruptor saccharolyticus, the largest ORF (Csac_2722) of the genome encodes an independent non-catalytic SLH protein showing binding affinity for Avicel cellulose [42]. The gene expressions of non-catalytic SLH proteins in Caldicellulosiruptor kronotskyensis were upregulated during growth on crystalline cellulose and switchgrass [32]. Lactobacillus acidophilus, which are human commensal Firmicutes, expresses an extracellular cell-attached GH13 pullulanase tethered through an S-layer associated protein (SLAP) domain [65]. The Pfam database contains several cell wall-anchoring motifs associated with S-layers, including SLH domains (PF00395), SLAP (PF03217), cell wall binding repeat 2 (CWB2; PF04122), S-layer like family, C-terminal region (S_layer_C; PF05125), and S-layer like family, N-terminal region (S_layer_N; PF05123). We showed that M. pectinilyticus produces numerous CAZyme-containing and non-catalytic SLH proteins, of which some constitute the largest ORFs of the genome and are upregulated in response to pectins, suggesting that these proteins may play roles in the mechanical and/or catalytic degradation of pectin. S-layer protein-mediated glycan degradation strategies are rarely observed in a human gut bacterium, differentiating M. pectinilyticus and L. acidophilus from the PUL system of Bacteroides, the Gram-positive PUL (gpPUL) system of the Roseburia/Eubacterium rectale group [66], or the cellulosome/amylosome organizations in Ruminococcus champanellensis [67, 68] and Ruminococcus bromii [69].

Consistent with the glycobiome specialized for pectin degradation, M. pectinilyticus possesses a narrow spectrum of metabolic pathways for utilizing uronic acids and pentose sugars of pectin/hemicellulose, and fructose. The M. pectinilyticus genome encodes all key enzymes of the variant Entner-Doudoroff (ED) pathway to catabolize the uronic acid end-products of pectin-degrading lyases, D-GalpA and DKI. Uronic acids are low ATP-yielding substrates as their key metabolic intermediate KDPG is cleaved to form one pyruvate and one G3P through the upper portion of the ED pathway, with only the latter contributing towards ATP production through substrate-level phosphorylation [70,71,72]. In comparison, hexose sugars are split to form G3P and dihydroxyacetone-P (DHAP), from which twice as many net ATPs are yielded through the Embden–Meyerhof–Parnas (EMP) pathway. The ED pathway-dependent ATP production is therefore energy limiting. Anaerobic gut bacteria that must rely on glycolysis to generate ATP in the absence of aerobic respiration often lack the ED pathway [72, 73], and rarely use uronic acids as the main energy source [20]. However, although metabolic alternatives such as the ED pathway and PKP produce less ATPs than their canonical counterparts (EMP and PPP), their metabolic pathways tend to be more streamlined, hence likely incur less protein costs. For example, in Escherichia coli, 3.5-fold less enzyme mass is needed to catabolize glucose through the ED pathway than the EMP pathway to achieve the same carbon flux [71]. Furthermore, as uronic acids are already in a highly oxidized state, cells encounter less need to consume ATP to produce electron sinks (e.g. lactate, ethanol, and butyrate) to regenerate reducing equivalents, sparing more pyruvate for ATP production via acetate synthesis to compensate for the energy loss occurring in the ED pathway [73, 74]. In the case of PKP, there are fewer enzymatic steps (8 versus 13) involved in pentose metabolism than PPP [49], possibly reducing the amount of energy and biomass required for an equivalent carbon flux. The energy saving strategies of M. pectinilyticus may illustrate how this bacterium matches its metabolic capacity to its specialized glycobiome to meet the thermodynamic demands of surviving on low energy-yielding substrates.

The highly specialized glycobiome comprising extracellular secretion of pectate/pectin lyases, RG-lyases, and pectin esterases, together with the adhesion-based colonization of PCW suggests a niche specialization of M. pectinilyticus in proximity to plant particulate matter in the bowel, particularly the HG-dense region of the middle lamellae. However, M. pectinilyticus may not necessarily be an efficient utilizer of the resulting PDPs. As implied by the scarcity of intracellular pectin-degrading CAZymes and carbohydrate-associated transporters; the narrow substrate utilization spectrum; and the presence of PDP remnants in the culture supernatant, M. pectinilyticus likely shares a significant proportion of its PDPs with other members of HGM that are less adapted for pectin degradation.

Despite certain phenotypic similarities to environmental cellulolytic bacteria, M. pectinilyticus focuses on pectin, potentially reflecting adaptation to the human gut environment. There are other indications that this gut bacterium may have evolved from an environmental ancestor, with a notable example being the enzymes of M. pectinilyticus often finding closest sequence homology with enzymes from environmental bacteria. The species-specific clustering of PL1 and CE8 family enzymes suggests that M. pectinilyticus may have evolved as a part of HGM for a sufficiently long time to allow a phylogenetic divergence from its environmental relatives. In M. pectinilyticus, the glycobiome evolution directed towards pectin utilization may have been a logical adaptation strategy to forage a readily available plant glycan from the human colon. M. pectinilyticus is currently the only representative of the genus Monoglobus and the only cultured species of a novel phylogenetic lineage of the Ruminococcaceae. The observation that uncultured bacteria found in other gut environments have similar 16S rRNA gene sequences suggests that additional cultures from this lineage exist and could be obtained by prioritizing microbial cultivation. The availability of more isolates would help to answer questions such as whether the pectin-degrading capacity is a common trait among the members of the lineage, or whether these organisms use a similar SLH protein-based carbohydrate degradation strategy to target a broader range of substrates.

In conclusion, the findings from this study shed new light on the existence of a novel phylogenetic lineage currently typified by a rare pectin-degrading specialist bacterium which possesses putative carbohydrate-associated extracellular proteins of novelty. M. pectinilyticus likely occupies a spatial niche associated with the middle lamellae and PCW, and a functional niche as a primary pectin degrader, illustrating the functional compartmentalization and ecological diversity of the HGM community. The study begun here increases our understanding of microbial pectin degradation in the human colon by presenting a possibility outside of the currently existing paradigms of gut microbial polysaccharide degradation, and also showcases a specialist bacterium which has potentially co-evolved with its host to accommodate the pectin-rich diet of humans.

Materials and methods

Phylogenetic analysis

Multiple sequence alignments were performed on CLUSTALW using default parameters using MEGA7 software [75]. Phylogenetic tree was constructed using neighbour-joining method [76]. Bootstrap values were calculated using 2000 re-samplings to evaluate the support of tree topology. Reference 16S rRNA gene sequences from type strains and cloned 16S rRNA gene sequences from uncultured bacteria were obtained from the GenBank database.

Participant selection

Forty-four healthy subjects living in New Zealand were recruited for faecal sample donation and dietary intake assessment as part of a previous human intervention study [77]. Each participant completed four sets of 3-day diet records over a period of 10 weeks. Dietary analysis was conducted using FoodWorks version 8 software (Xyris Software Pty Ltd). The Australian database in FoodWorks was used (AusBrands and AusFoods 2015 data sources) to conduct nutrient intake and food group analysis. Participants were divided into low, moderate, and high habitual dietary fibre intake groups. The high dietary fibre intake cutoffs were chosen to reflect the New Zealand recommended dietary fibre intake which is >25 g/day for females and >30 g/day for males [78]. The average dietary fibre intake in New Zealand (17.5 g/day for females and 22.1 g/day for males) [79] was chosen as the low dietary fibre intake cutoffs. The amount of pectin consumed per day by each participant was calculated using an established food composition database [80].

Quantitative PCR

Bacterial DNA was extracted from faecal samples using MoBio PowerLyzer® Powersoil DNA® isolation kit as per the manufacturer’s instructions with minor amendments. Faecal samples (0.25 ± 0.025 g) were weighed into PowerLyzer® glass bead tubes. A FastPrep-24™ 5G (MP Biomedicals) was used to homogenise the samples at a speed of 5.5 m/s for four 90 s cycles with a 60 s break between each cycle. The DNA was eluted in 10 mM Tris (pH 8.0). NanoDrop 1000 spectrophotometry was used to quantify the DNA concentration. Primers MP1087F (5′- GAGCGCAACCCTTACTGTCA-3′) and MP1581R (5′-CTCTTACTTCCGCTCTCCGC-3′) were designed using NCBI Primer Blast [81]. Samples and standards were run in triplicate by absolute quantification on Roche Lightcycler® 480 real-time PCR instrument. Lightcycler® 480 SYBR Green I Master Mix was used for specific detection of double-stranded PCR-amplified products. A 20 µl reaction contained 10 µl SYBR® Green I Master Mix; 4 µl of 2.5 µM forward primer; 4 µl of 2.5 µM reverse primer; and 2 µl of template DNA from samples and standards. Negative controls were prepared with PCR-grade sterile water in place of DNA samples. Quantitative PCR was performed using the following conditions: one activation cycle at 95 °C for 10 min; 45 run cycles of denaturation (95 °C for 10 s), annealing (60 °C for 20 s), and extension (72 °C for 20 s); one cycle of melting (95 °C for 30 s, 65 °C for 1 min, followed by 65 °C to 95 °C at 0.1 °C increment per second with continuous fluorescence acquisition); and a cooling cycle at 40 °C. Results were analysed and visualized using Lightcycler® 480 software package (version 1.5). Log10 concentration of M. pectinilyticus and the % abundance of M. pectinilyticus relative to the total bacterial concentration were plotted and participants were separated into M. pectinilyticus-positive and M. pectinilyticus-negative groups at a clear gap in the distribution at 0.01% abundance, 3.5 log10 concentration. A chi-square test was performed to check whether the proportion of high fibre consumers was similar in the M. pectinilyticus-positive and M. pectinilyticus-negative groups (8/10 compared to 14/34). Intakes of dietary fibre and pectin were compared between the M. pectinilyticus-positive and M. pectinilyticus-negative groups using a non-parametric Mann–Whitney U-test. The p-value <0.05 was considered statistically significant.

Genomic sequencing

Routine cultivation of M. pectinilyticus and genomic DNA extraction were carried out as described previously [22]. The genome of M. pectinilyticus was sequenced at Macrogen (South Korea) using Illumina HiSeq 2500. A paired-end TruSeq DNA PCR-Free (350 bp insert) library and 3 kb and 8 kb Nextera mate-pair (gel plus) libraries were generated for this genome. Sequencing data were digitally delivered in FASTQ format. A total of 11 GB of clean sequencing data were obtained, resulting in approximately 4000-fold genome coverage. The quality of raw sequencing data was assessed using FastQC [82]. Adaptor sequences present in paired-end reads were trimmed and quality-filtered using Trimmomatic at default settings [83]. Out of 33,501,036 total reads, 99.74% of paired-end reads survived the quality-filtering and were included in de novo genome assembly. NxTrim was used to trim adaptor sequences from 3 kb to 8 kb mate-pair reads and to select for sets of true mate pairs [84]. Hundred percent of 38,303,662 total reads in 3 kb mate-pair library passed the purity filter set at default parameters and were included in de novo genome assembly. Hundred percent of 36,608,300 total reads in 8 kb mate-pair library were classified as true mate-pairs and were used for additional contig extension. The de novo genome assembly was performed on SPAdes assembler using default parameters [24]. Scaffolds generated using an overlapping K-mer length of 77 were selected for further processing. Additional scaffold extension was carried out on SSPACE using the output data from SPAdes de novo assembly and trimmed sequencing data from 8 kb mate-pair library [85]. A draft genome consisting of three major scaffolds (782,032 bp, 959,713, and 1,007,898 bp in size) was constructed as a result of the initial assembly, indicating the circular genome was fragmented at three sites.

Primer walking

Primers were designed to hybridize at 200–300 bp upstream of the truncation sites on each scaffold. Long-range PCR was carried out to amplify the genome gap sequences using Phusion Green High-Fidelity DNA Polymerase (Thermo® Scientific). Fifty microliters of PCR reaction contained 10 µl of 5X phusion green GC buffer; 1 µl of 10 mM dNTPs; 5 µl of 5 µM forward primer; 5 µl of 5 µM reverse primer; 1 µl of high-quality genomic DNA; 27.5 µl of PCR-grade water; and 0.5 µl of Phusion DNA Polymerase. Optimized PCR cycling conditions recommended by the manufacture were used: initial denaturation at 98 °C for 30 s; 35 cycles of denaturation at 98 °C for 10 s and annealing/extension at 72 °C for 10 min; final extension at 72 °C for 10 min; hold at 4 °C. Blunt-end PCR products were ligated into pCRTM-BluntII-TOPO® vector using Zero Blunt® TOPO® PCR Cloning Kit (invitrogen). Six microliters of ligation reaction contained 4 µl of PCR product; 1 µl of salt solution; and 1 µl of vector plasmid. Incubation was carried out at room temperature for 30 min before proceeding to transformation of competent cells. Extracted plasmids were sequenced using M13 forward and M13 reverse primers. Using newly designed primers, primer walk sequencing of extracted plasmids was continued until forward-walking sequences overlapped with reverse-walking sequences. Sequences were aligned using Geneious software (version 10.0.3) with 65% similarity cost matrix, 12 gap open, and 3 extension penalties.

Genome annotation

A preliminary annotation of the genome was carried out using Prokka [25]. The draft genome was exported in FASTA format. Open reading frames were predicted using Prodigal, a built-in software in Prokka annotation pipeline [86]. Sequences were queried in a hierarchical manner against the default protein database (UniProt) and a series of user-specified databases using BLAST + blastp. By default, a best significant match below an e-value threshold of 10−6 was used to annotate the putative gene products. Protein function prediction was carried out using InterProScan 5 [87]. For the construction of KEGG metabolic pathways, nucleotide sequences of the target enzyme from 10 to 15 species of the family Ruminococcaceae and family Clostridiaceae were manually downloaded from GenBank. The sequences were aligned using Geneious alignment (version 10.0.3). Aligned sequences were exported into Linux environment and converted into a HMMER database using hmmbuild function [88]. Using hmmsearch function, the database was queried against the draft genome of M. pectinilyticus to find the most significant match to the full target sequence below an e-value threshold of 10−5. Genes assigned with enzyme functions were manually mapped into reference metabolic pathways stored in KEGG database [89]. In order to assign each protein-coding gene into an orthologous group, all available HMM models for the target taxonomic group (Bacteria) were manually downloaded from the eggNOG database [90]. Obtained sequences were concatenated into a HMMER database using hmmpress function. Using hmmscan function, all bacterial orthologous groups available in the eggNOG database were queried against the genome database of M. pectinilyticus. The most significant matches with the lowest e-values were used to predict the orthologous groups of each gene. The presence of signal peptides in each gene was identified using SignalP version 4.1 [91]. TMHMM server was used to predict transmembrane helix domains in proteins [92]. tRNA sequences were identified using a combination of Prokka and tRNAscan-SE [93]. Genes coding for CAZymes were identified using the CAZy database. SLH domains were identified by querying the genome of M. pectinilyticus against dbCAN database [94] and then manually verifying the results by BLAST searching against GenBank. For this, default dbCAN parameters (alignment length >80 amino acids, use e-value <10−5, otherwise use e-value <10−3) were used. Each gene entry in the genome of M. pectinilyticus was manually annotated by combining and comparing the annotation results across all databases above. Genes with no apparent matches to any known protein functions or domains were annotated as hypothetical proteins.

Genome curation and depository

The raw sequencing reads were deposited Sequence Read Archive (SRA) under the BioProject accession PRJNA383867, and BioSample accession SAMN06817956. The GenBank accession number for the draft genome of M. pectinilyticus is CP020991.

Carbohydrate analysis

Pectins from apple (93854) and citrus peels (P9135) were purchased from Sigma. Arabinan from sugar beet (P-ARAB), arabinogalactan from larch wood (P-ARGAL), galactan from potato (P-GALPOT), and RG-I from potato (P-RHAM1) were purchased from Megazyme. Hundred microliters of clarified rumen fluid, 100 µl of 5% (w/v) apple or citrus pectins, and 100 µl of 1-week-old inoculum were added in triplicate to 1.7 ml mineral medium. Cultures were grown at 37 °C for 3 days with a constant shaking. Nine hundred microliters of samples was taken out at 48 h and 72 h of incubation. These samples were centrifuged at 12,000 × g for 10 min at room temperature. Polysaccharide substrates were dissolved in 0.1 M NaNO3 (2 mg/ml), allowed to hydrate fully by standing at room temperature overnight and centrifuged (14,000 × g, 10 min) to clarify. The soluble material and samples of spent culture media (100 µL) were injected and eluted with 0.1 M NaNO3 (0.5 mL/min, 60 °C) from three columns (TSK-Gel G5000PWXL, G4000PWXL and G3000PWXL, 300 × 7.8 mm, Tosoh Corp., Tokyo, Japan) connected in series. The eluted material was detected using a refractive index monitor. The system was also calibrated with a series of pullulan molecular weight standards (6–850 kDa; Shodex, Showa Denko K.K. Tokyo, Japan). Spent culture media (100 µL) were also injected and eluted with 0.1 M NaNO3 (0.5 mL/min, 60 °C) from two Superdex Peptide (GE Healthcare) columns in series. The eluted material was detected using a refractive index monitor. The system was also calibrated with a series of pullulan molecular weight standards (6–24 kDa and the trisaccharide raffinose). For detection of oligosaccharide, spent culture media (2 µL) were injected and eluted with a simultaneous gradient of NaOH and sodium acetate (1 mL min−1) from a CarboPac PA-100 (4 × 250 mm) column. For detection of monosaccharides, spent culture media (2 µL) were injected and eluted with a simultaneous gradient of NaOH and sodium acetate (1 mL min−1) from a CarboPac PA-1 (4 × 250 mm) column. The sugars were identified from their elution times relative to standard sugar mixes.

Enzyme phylogenetic analysis

Reference PL1 and CE8 sequences were obtained as the results of BlastP queries against NCBI protein database to find the closest matches to each of PL1 and CE8 sequence from M. pectinilyticus. The catalytic domains within sequences were identified using the dbCAN database [94], and PL1 and CE8 domain sequences were manually extracted using Geneious software (version 10.0.3). Extracted sequences were aligned by ClustalW [95] and used to construct a maximum-likelihood phylogenetic tree using MEGA7 software [75]. Bootstrap values were calculated using 1000 re-samplings to evaluate the support of tree topology.

Metagenome mining

Full-length protein sequences of all SLH proteins from the M. pectinilyticus were manually extracted from the genome database. These SLH protein sequences were used as query sequences to search the metagenome databases constructed from the stool samples of 85 donors living in the US, collected as part of the Human Microbiome Project (HMP) initiated by National Institutes of Health (NIH). Only the data from the first visit was used in this study. The metagenome databases were accessed through the Integrated Microbial Genomes with Microbiome Samples (IMG/M) system of Joint Genome Institute (JGI) funded by the United States Department of Energy [96]. A built-in Blast Genome function of IMG/M system was used to BlastP sequences with an e-value threshold of 1e−5. A stringent cutoff was used to identify true positives which showed ~100% identical amino acid residue matches along the same positions over a relatively long (>100 amino acids) length of aligned protein sequences.

Preparing samples for iTRAQ quantitative protein analysis

In three-independent experiments, M. pectinilyticus was grown using three different types of substrates: 0.5% (w/v) D-fructose, 0.2% (w/v) apple pectin, and 0.2% (w/v) kiwifruit pectin. Cultures were incubated at 37 °C with a constant rotary shaking until stationary phase was reached. Fully grown cultures were centrifuged at 12,000 × g for 10 min to collect cell pellets. The total protein content of each pellet sample was assayed using the EZQ Protein Quantitation Kit (Life Technologies). Aliquots containing 25 µg of protein were taken for each sample and diluted to 100 µl with 50 mM ammonium bicarbonate. Samples were reduced by addition of DTT to 10 mM final concentration and incubated at 56 °C for 10 min. Samples were cooled and alkylated with 50 mM iodoacetamide (GE Healthcare) in the dark at room temperature for 30 min, and digested with 1 µg of sequencing grade trypsin (Promega) in a chilled microwave (CEM) at 40 °C for 2 h at 15 W power, followed by overnight incubation at 37 °C. Digests were acidified with formic acid and desalted on Oasis HLB SPE cartridges (Waters) and dried down in a vacuum centrifuge. Samples were reconstituted with 30 µl of 0.5 M TEAB (Sigma) and labelled with 4-plex iTRAQ labels (Sciex) as per the manufacturer’s instructions. Pools were prepared using equal amounts of each sample, and were then concentrated in a vacuum centrifuge to reduce solvent content, then desalted and concentrated to 40 µl as above.

LC–MS/MS and database search

Samples were injected onto a 0.3 × 10 mm trap column packed with Reprosil C18 media and desalted for 5 min at 2 µl/min before being separated on a 0.075 × 200 mm picofrit column (New Objective) packed in-house with Reprosil C18 media. The following gradient was formed at 300 nl/min using a NanoLC 400 UPLC system (Eksigent): 0 min 10%B; 50 min, 35%B; 52 min, 90%B; 55 min, 90%B; 56 min, 10%B; 60 min, 10%B, where A was 0.1% formic acid in water and B was 0.1% formic acid in acetonitrile. The picofrit spray was directed into a TripleTOF 6600 Quadrupole-Time-of-Flight mass spectrometer (Sciex) scanning from 350 to 1200 m/z for 250 ms, followed by 40 ms MS/MS scans on the 40 most abundant multiply-charged peptides (m/z 60–1200) for a total cycle time of ~2 s. The mass spectrometer and HPLC system were under the control of the Analyst TF 1.7 software package (Sciex). The resulting data from each pool were searched against a database containing the UniProt sequences for pentapetalae from August 2015 (1,895,602 entries) appended to a custom database of M. pectinilyticus entries including common contaminant sequences (2418 entries) using ProteinPilot version 5.0 (Sciex). Raw iTRAQ peak area data were processed through a combination of Paragon™ and Pro Group™ algorithms to reduce the protein inference ambiguities coming from protein modifications and to bundle peptides into winner protein groups. False Discovery Rate analysis was enabled. Search parameters were as follows: Sample Type, iTRAQ 4-plex (Peptide Labelled); Search Effort, Thorough; Cys Alkylation, Iodoacetamide; Digestion, Trypsin: FDR analysis, Yes. Using the peak height of reporter ions as a proxy for marker mass abundance, peptide ratios across 114 (apple), 116 (fructose), and 117 (kiwifruit) samples were determined in log space using 116 (fructose) values as the baseline denominators. After performing bias correction by applying a correction factor of <20% across the samples, an average ratio was calculated for each protein. The p-values were calculated to assess the possibility of random distribution of peptide ratios contributing to protein inference. Peptide and Protein summary files were exported in Excel format for further statistical analysis.

Proteomics statistics

Reliability of protein fold-changes ratios were ensured by using those proteins represented by ≥3 unique peptides with >95% confidence intervals. To ensure the statistical significance of the dataset, unused protein score of >2, and p-value of <0.05 inferred from using ProteinPilot software (version 5.0) were coupled with the quantification data to remove unreliable peptide identification results. Using these criteria, 920 proteins across three biological replicates were selected. Contaminant protein entries such as human keratin introduced through sample handling, and the traces of endogenous plant proteins from apple and kiwifruit pectin were further removed manually. Proteins represented by at least three peptides with 95% confidence, FDR-corrected p-value <0.05, and fold-changes of ≥1.20 or ≤0.83 relative to the fructose control were regarded as differentially expressed with statistical significance. The z-score for each replicate was calculated from the p-value using the NORMINV function in Excel. The mean z-score was calculated as the arithmetic average of the z-scores for the three replicates. A mean z-score value of ≥1.65 indicates that the average protein expression ratios for the three replicates lie outside the normal distribution (outermost 10% of the population). The mean relative protein expression is the geometric mean of the relative expressions in the three replicates.