Polysaccharide utilization loci of North Sea Flavobacteriia as basis for using SusC/D-protein expression for predicting major phytoplankton glycans

Marine algae convert a substantial fraction of fixed carbon dioxide into various polysaccharides. Flavobacteriia that are specialized on algal polysaccharide degradation feature genomic clusters termed polysaccharide utilization loci (PULs). As knowledge on extant PUL diversity is sparse, we sequenced the genomes of 53 North Sea Flavobacteriia and obtained 400 PULs. Bioinformatic PUL annotations suggest usage of a large array of polysaccharides, including laminarin, α-glucans, and alginate as well as mannose-, fucose-, and xylose-rich substrates. Many of the PULs exhibit new genetic architectures and suggest substrates rarely described for marine environments. The isolates’ PUL repertoires often differed considerably within genera, corroborating ecological niche-associated glycan partitioning. Polysaccharide uptake in Flavobacteriia is mediated by SusCD-like transporter complexes. Respective protein trees revealed clustering according to polysaccharide specificities predicted by PUL annotations. Using the trees, we analyzed expression of SusC/D homologs in multiyear phytoplankton bloom-associated metaproteomes and found indications for profound changes in microbial utilization of laminarin, α-glucans, β-mannan, and sulfated xylan. We hence suggest the suitability of SusC/D-like transporter protein expression within heterotrophic bacteria as a proxy for the temporal utilization of discrete polysaccharides.


Introduction
Half of global net primary production is oceanic and carried out mostly by small, unicellular phytoplankton such as diatoms [1]. Polysaccharides account for up to 50% of algal biomass [2] and can be found as intracellular energy storage compounds, as structural components of their cell walls [3], or as secreted extracellular transparent exopolymeric substances [4]. They can be composed of different cyclic sugar monomers linked by either α- or β-glycosidic bonds at different positions and can be substituted by different moieties (e.g., sulfate, methyl, or acetyl groups), making them the most structurally diverse macromolecules on Earth [5].
Many members of the bacterial phylum Bacteroidetes, including marine representatives of the class Flavobacteriia, are specialized in polysaccharide degradation. They feature distinct polysaccharide utilization loci (PULs, [6]), i.e., operons or regulons that encode the protein machinery for binding, degradation and uptake of a type or class of polysaccharides. Polysaccharides are initially bound by outer membrane proteins and cleaved by endo-active enzymes into oligosaccharides suitable for transport through the outer membrane. Oligosaccharides are bound at the interface of SusCD complexes. SusD-like proteins are extracellular lipoproteins and SusC-like proteins constitute integral membrane beta-barrels termed TonB-dependent transporters (TBDTs). Glenwright et al. [7] showed that these two proteins form a 'pedal bin' complex in Bacteroides thetaiotaomicron, with SusD acting as a lid on top of the SusC-like TBDT. Upon binding of a ligand, the SusD lid closes and conformational changes lead to substrate release into the periplasm. There, further saccharification yields sugar monomers, which are taken up into the cytoplasm via dedicated transporters.
Besides the characteristic susCD-like gene pair, Bacteroidetes PULs contain various substrate-specific carbohydrate-active enzymes (CAZymes), such as glycoside hydrolases (GHs), polysaccharide lyases (PLs), carbohydrate esterases (CEs), carbohydrate-binding modules (CBMs), and proteins with auxiliary functions. PULs of human gut Bacteroidetes and their capacity to degrade various land plant polysaccharides have been thoroughly investigated (e.g., ref. [8]), but knowledge on marine polysaccharide degradation is sparse. Many polysaccharides in marine algae differ from those in land plants. Green macroalgae contain ulvans, red macroalgae contain agars, carrageenans and porphyrans, brown algae contain alginates, fucans and laminarin, and diatom microalgae contain chrysolaminarin and sulfated mannans, all of which are presumably absent in land plants [9]. Likewise, many algae feature anionic, sulfated polysaccharides that require sulfatases for degradation.
A systematic inventory of the structural diversity of algal polysaccharides has not yet been achieved, and we do not have a good understanding of the associated diversity of PULs in marine Bacteroidetes. Also, only a few PULs have so far been linked to their polysaccharide substrate. Examples include an agar/porphyran-specific PUL [10] that human gut Bacteroidetes acquired from marine counterparts [11], an alginate-specific PUL in Zobellia galactanivorans DsiJ T [12], alginate- and laminarin-specific PULs in Gramella forsetii KT0803 [13], a similar laminarin-specific PUL in Polaribacter sp. Hel1_33_49 [14], and a complex carrageenan degradation regulon in Z. galactanivorans DsiJ T [15]. Few overarching comparative genomic studies exist [14,16], focusing largely on overall CAZyme repertoires.
Pioneering studies on structural elucidation of polysaccharides from microalgae were performed [17,18], but precise microalgal polysaccharide structures remain mostly unresolved (for review, see ref. [4]), because they require sophisticated methods [19]. PUL analysis of heterotrophic bacteria co-occurring with phytoplankton could serve as an alternative starting point to advance insight into the structures of marine polysaccharides and to understand their microbial decomposition.
Here we present a comparative analysis of PULs from 53 newly sequenced Flavobacteriia isolated from the German Bight, comprising a total of 400 manually determined PULs. Based on these data we investigated whether SusC- and SusD-like sequences can be linked to distinct predicted polysaccharides. Using environmental metaproteome data we show how SusC/D homolog expression may be used to assess the presence of marine polysaccharides during North Sea spring blooms.

Isolation and sequencing of North Sea Flavobacteriia
Flavobacteriia were sampled at the North Sea islands Helgoland and Sylt as described previously ([20,21], Supplementary Table S1). Also included were the previously sequenced Gramella forsetii KT0803 [22], Polaribacter spp. Hel1_33_49 and Hel1_85 [14], and the Formosa spp. Hel1_33_131 and Hel3_A1_48. The remaining 48 genomes were sequenced at the Department of Energy Joint Genome Institute (DOE-JGI, Walnut Creek, CA, USA) in the framework of the Community Sequencing Project No. 998 COGITO (Coastal Microbe Genomic and Taxonomic Observatory). Forty genomes were sequenced using the PacBio RSII platform exclusively, whereas eight isolates were sequenced using a combination of Illumina HiSeq 2000/2500 and PacBio RSII. All these genomes are GOLD certified at level 3 (improved high-quality draft) and are publicly available at the DOE-JGI Genomes OnLine Database (GOLD, [23]) under the Study ID Gs0000079.

Gene and PUL annotation
Initial annotations of the genomes of Polaribacter spp. Hel1_33_49 and Hel1_85 and Formosa spp. Hel1_33_131 and Hel3_A1_48 were performed using the RAST annotation system [24]. All other genomes were annotated using the DOE-JGI Microbial Genome Annotation Pipeline (MGAP, [25]). These annotations were subsequently imported into a GenDB v2.2 annotation system [26] for refinement and additional annotations based on similarity searches against multiple databases as described previously [27].
SusC- and SusD-like proteins were annotated by the DOE-JGI MGAP, which uses the TIGRfam model TIGR04056 to detect SusC-like proteins and the Pfam models 12741, 12771, and 14322 to detect SusD-like proteins. CAZymes were annotated based on HMMer searches against the Pfam v25 [28] and dbCAN 3.0 [29] databases and BLASTp searches [30] against the CAZy database [31]. Genes were only annotated as CAZymes when at least two of the database searches were positive based on family-specific cutoff criteria that were described previously [32]. Selected sulfatases were annotated using the SulfAtlas database v1.0 [33]. Peptidases were annotated using BLASTp searches against the MEROPS 9.13 database [34] using the default settings of E ≤ 10⁻⁴.
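The two-out-of-three consensus rule for CAZyme annotation can be sketched as follows. This is a minimal illustration, not the pipeline used in this study: the `SearchHits` record, the `is_cazyme` function and the example gene identifiers are hypothetical, and each boolean is assumed to already encode whether the respective search passed its family-specific cutoff [32].

```python
from dataclasses import dataclass


@dataclass
class SearchHits:
    """Hypothetical per-gene record: True if the search passed its family-specific cutoff."""
    pfam: bool        # HMMer search against Pfam
    dbcan: bool       # HMMer search against dbCAN
    cazy_blast: bool  # BLASTp search against CAZy


def is_cazyme(hits: SearchHits, min_positive: int = 2) -> bool:
    """Accept a gene as a CAZyme when at least `min_positive` of the three searches agree."""
    return sum([hits.pfam, hits.dbcan, hits.cazy_blast]) >= min_positive


# Illustrative gene identifiers (assumptions, not taken from the genomes in this study)
genes = {
    "gene_0001": SearchHits(pfam=True, dbcan=True, cazy_blast=False),
    "gene_0002": SearchHits(pfam=False, dbcan=True, cazy_blast=False),
    "gene_0003": SearchHits(pfam=True, dbcan=True, cazy_blast=True),
}
annotated = sorted(g for g, h in genes.items() if is_cazyme(h))
print(annotated)
```

Requiring agreement between independent searches trades some sensitivity for fewer false-positive family assignments.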
PULs were manually detected based on the presence of CAZyme clusters, which in most cases also featured co-occurring susCD-like gene pairs as previously suggested [6]. In some cases, the sequence similarity of a TBDT was too low to be considered SusC-like, no SusD homolog was present or the entire susCD-like gene tandem was missing. These operons were still counted as PULs and are regarded as incomplete subtypes [35].
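PUL detection in this study was manual, but the bookkeeping of complete versus incomplete PULs described above can be expressed compactly. The sketch below is only an illustration under stated assumptions: the function name `classify_pul`, its boolean inputs and the returned labels are hypothetical, and what counts as a "CAZyme cluster" is left to the curator.

```python
def classify_pul(has_cazyme_cluster: bool, has_susc_like: bool, has_susd_like: bool) -> str:
    """Label a candidate locus following the criteria described in the text.

    has_cazyme_cluster: curator judged that the locus contains a CAZyme cluster
    has_susc_like:      a TBDT with sufficient similarity to SusC is present
    has_susd_like:      a SusD homolog is present
    """
    if not has_cazyme_cluster:
        return "not counted as PUL"
    if has_susc_like and has_susd_like:
        return "complete PUL"
    # Missing SusC-like TBDT, missing SusD homolog, or missing susCD tandem
    return "incomplete PUL subtype"


print(classify_pul(True, True, True))
print(classify_pul(True, False, True))
print(classify_pul(True, False, False))
```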

Gene expression analyses of Flavobacteriia-rich North Sea bacterioplankton using metaproteomics
During spring phytoplankton blooms of 2009 to 2012, 14 surface seawater biomass samples were collected at the long-term ecological research station 'Kabeltonne' (54°11.3' N, 7°54.0' E) off the German North Sea island Helgoland as previously described in detail [32,36]. Biomass was collected on 0.2 µm pore sized filters after prefiltration with 10 and 3 µm pore sized filters. Metagenome sequencing was done using the 454 FLX Ti platform for 2009 and the Illumina HiSeq 2000 platform for 2010 to 2012 samples [32].
Corresponding metaproteome analyses were performed from biomass obtained from the same water samples. Protein extraction from 0.2 µm filtered bacterioplankton biomass and separation was carried out as described previously [36] with the modification that gel lanes were cut into 10 equal pieces prior to tryptic digestion (1 µg/ml, Promega, Madison WI, USA) and subsequent mass spectrometric detection in an LTQ Orbitrap Velos mass spectrometer (Thermo Fisher, Bremen, Germany). The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository [37]; data set identifiers: PXD008238, 10.6019/PXD008238.
Mass spectrometric data were analyzed using Sequest v27r11 (Thermo Fisher Scientific, San Jose, CA, USA). Searches were carried out against a forward-decoy database of all proteins from all metagenome samples combined. This non-redundant database was constructed from all predicted protein-coding genes of all metagenomes (6,194,278 sequences) using the uclust option of USEARCH v6.1.544 ([38]; options: cluster_fast; nucleotide identity 0.99; maxhits 5; maxrejects 30) and contained 3,212,324 sequences. Common laboratory contaminants were included in all databases. Technical duplicates of each sample were searched together (including all 20 subsamples) to obtain averaged spectral counts. Validation of protein and peptide identifications was performed with Scaffold v4 (Proteome Software Inc, Portland, OR, USA) using the parameters previously described [36], and normalized spectral abundance factors (%NSAF) were calculated [39] to allow for semi-quantitative analyses (Supplementary Table S2). The NSAF quantitation measure is commonly used in non-gel-based label-free shotgun proteomics. In brief, a %NSAF of 1 corresponds to 1% of all mass-adjusted spectral count data in a given proteomic experiment.
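For orientation, the %NSAF measure can be written as %NSAF_i = 100 × (SpC_i / S_i) / Σ_j (SpC_j / S_j), where SpC is the spectral count of a protein and S a size term. The following minimal sketch assumes that S is the protein size supplied by the user (sequence length in the commonly used definition; molecular mass could be substituted). The function name `percent_nsaf` and the example values are illustrative assumptions and do not reproduce the Scaffold-based workflow of this study.

```python
def percent_nsaf(spectral_counts: dict, sizes: dict) -> dict:
    """Return %NSAF per protein for one proteomic experiment.

    spectral_counts: protein id -> (averaged) spectral count
    sizes:           protein id -> size term, e.g. sequence length (assumption)
    """
    # Spectral abundance factor: spectral count adjusted for protein size
    saf = {p: spectral_counts[p] / sizes[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectral counts in this experiment")
    # Normalize to the sum over all proteins and express as percentage
    return {p: 100.0 * value / total for p, value in saf.items()}


# Made-up example: protein B has three times the counts but is three times larger
counts = {"A": 10, "B": 30}
lengths = {"A": 100, "B": 300}
result = percent_nsaf(counts, lengths)
print(result)
```

Because the values are normalized within each experiment, they sum to 100 and can be compared across samples in a semi-quantitative manner.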

Results
High genomic and phylogenetic diversity in isolated marine Flavobacteriia
The 53 flavobacterial isolates cover a broad range of the Flavobacteriia class within the phylogenetic tree based on full-length 16S rRNA genes (Fig. 1). The strains fall into several clusters that can be linked to characteristic genomic features (Supplementary Table S1). Genome sizes ranged from 2.02 Mbp (Formosa sp. Hel3_A1_48) to 5.98 Mbp (Aquimarina sp. MAR_2010_214), with an average of 3.83 Mbp. One of the clusters was dominated by isolates obtained from the retentates of seawater filtered through 20 µm particle nets (8 out of 12; Fig. 1; Supplementary Table S1). These species feature mostly larger genomes (average 4.5 Mbp) and are likely associated with microalgae. Forty-seven of the 53 strains have two to four 16S rRNA operons, with the notable exception of the three Tenacibaculum strains possessing six (strains MAR_2009_124 and MAR_2010_205) and seven (strain MAR_2010_89), respectively.
The capacity of the isolates to degrade polysaccharides varied widely as indicated by the number of degradative CAZymes per Mbp and predicted PULs per genome. On average, we identified 7.5 PULs per genome and 55 degradative CAZymes (Supplementary Table S1). Strains of the putative microalgae-associated cluster stood out with on average 83.3 degradative CAZymes, almost twice as many PULs per genome (14.2) and many sulfatase genes, indicating an extended capacity for the degradation of sulfated polysaccharides (average of 28.2 sulfatases, with a maximum of 95 sulfatases in Zobellia amurskyensis MAR_2009_138). The other strains had an average of 46.8 degradative CAZymes and 5.5 PULs. Eleven isolates possessed fewer than three PULs, contained few (≤ 3) or no sulfatases and were exclusively isolated from surface seawater or pore water. They likely target rather simple, nonsulfated polysaccharides and peptides. This strategy is emphasized by their high peptidase:CAZyme ratio of 1.81, compared with an average ratio of 0.95 for isolates with > 10 PULs. Still, it is noteworthy that numbers of PULs and degradative CAZymes varied considerably, even within isolates of the same genus.

Putative substrate specificities
The 53 genomes revealed a wide range of as yet undescribed PULs. In total, 400 PULs were annotated, 259 of which could be linked to either dedicated polysaccharides or polysaccharide classes by in-depth annotations (Supplementary Table S3).
Variant B is a larger, more variable PUL (Fig. 2b). It shares homology with a PUL in Polaribacter sp. Hel1_33_49 that can be induced by laminarin [14]. This PUL additionally features a predicted GH30 exo-β-1,6-glucanase and at least two GH17 β-1,3-glucan hydrolases with predicted endo- and exo-activities, respectively. The GH30 exo-β-1,6-glucanase removes β-1,6-glucose side chains from laminarin [45]. Although GH16 enzymes can hydrolyze both β-1,3- and β-1,4-linked glucans, GH17 glucan hydrolases are highly specific to undecorated β-1,3-glucans and can have endo- [46] and exo-activity [47]. The β-1,3-glucan endohydrolase thus likely cleaves laminarin into oligosaccharides, which may be further degraded into glucose by the β-1,3-glucan exohydrolase. Variants C and D PULs are likewise predicted to be capable of laminarin degradation based on gene content but have not been described before (Fig. 2c and d). They feature an additional putative GH5 glucan hydrolase with a carbohydrate-binding domain that binds β-1,3- and β-1,4-glucans (CBM6c, [48]). They furthermore contain GH16 and GH30 family enzymes as described in variant B, but no GH17 enzymes.
In total, 62% ( In contrast, laminarin PULs were far less prevalent in isolates obtained from the >20 µm retentate (2/12). Laminarins are composed of a β-1,3-glucan backbone ramified by β-1,6 and, less frequently, β-1,2-linked glucose side chains [49]. The backbone length and ramification degree vary in different species. Laminarin of brown algae is capped at the reducing end by a 1-linked D-mannitol [50]. Only three isolates with laminarin PULs, namely the Polaribacter spp. Hel1_85 and KT25b and Gramella sp. MAR_2010_102, also possessed an annotated mannitol-2-dehydrogenase. It is possible that this enables utilization of brown algal laminarin. However, free mannitol is a more likely substrate. Growth on free mannitol has for example been demonstrated in the marine flavobacterium Z. galactanivorans [51]. Studies on Ectocarpus siliculosus have shown that brown algae can store substantial amounts of free mannitol as a compatible osmolyte [52]. Furthermore, it has recently been shown that free mannitol is likewise frequently found in various planktonic microalgae [53]. Interestingly, diatoms seem to have lost their ability to synthesize mannitol, although exceptions exist [53]. The fact that phytoplankton blooms in the southern North Sea are usually diatom-dominated would hence explain why mannitol-2-dehydrogenase genes were rarely found in our isolates. Consequently, the majority of isolates with laminarin PULs seem to only target diatom-type non-mannitol-capped chrysolaminarins, indicating that these are the major available laminarins in the southern North Sea.
Fig. 2 Conserved PULs known (a, b) and predicted (c, d) to target laminarin

Trees of SusC- and SusD-like proteins reveal substrate-specific clusters
We computed trees for all SusC- and SusD-like protein sequences of the 400 isolate PULs and obtained pronounced clusters for many of the predicted polysaccharide substrates (Fig. 6). For clarity, functionally heterogeneous or undefined clusters are depicted as gray triangles (complete trees: Supplementary files 1, 2). Well-defined clusters in both trees included the structurally simple polysaccharides laminarin, α-1,4-glucans and alginate. For example, SusD-like proteins of laminarin- Table S4), whereas identity to SusD-like sequences from other PULs within the same respective genome was only 10-25% (data not shown).
SusC/D-like proteins from conserved PULs for these structurally simple substrates were more closely related than those from more variable PULs targeting structurally more diverse substrate classes such as FCSPs or xylose-rich substrates (Supplementary Table S4). This is visible in the trees by shorter and longer respective branch lengths (Fig. 6). Some substrates formed multiple clusters, for example xylose-rich substrates. This might indicate either rather different xylose-containing substrates or multiple ways of attack and uptake for a given class of xylose-containing substrate.
The topologies of the SusC- and SusD-like protein trees were notably congruent regarding branching patterns of the identified substrate-specific clusters. Only the pectin cluster was located at a distinctly different position. SusC- and SusD-like proteins from the same PULs exhibited a strong tendency to occur in corresponding substrate-specific clusters in both trees. This applied to > 70% of the SusC and SusD sequences within identified substrate-specific clusters (Supplementary Figures S1A and B).
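The correspondence between the two trees can be quantified as the share of susCD gene pairs whose SusC-like and SusD-like proteins fall into the same substrate-specific cluster. The sketch below is a simplified illustration and not the procedure used to derive the > 70% figure: the function `congruent_fraction`, the pair identifiers and the cluster labels are hypothetical, and cluster membership is assumed to have been read off the trees beforehand.

```python
from typing import Dict, Optional


def congruent_fraction(susc_cluster: Dict[str, Optional[str]],
                       susd_cluster: Dict[str, Optional[str]]) -> float:
    """Fraction of susCD pairs assigned to the same substrate cluster in both trees.

    Both dictionaries map a susCD pair id to a substrate cluster label,
    or to None if the protein lies outside all defined clusters.
    Only pairs with a defined cluster in both trees are considered.
    """
    shared = [
        pair for pair in susc_cluster
        if susc_cluster[pair] is not None and susd_cluster.get(pair) is not None
    ]
    if not shared:
        return 0.0
    same = sum(1 for pair in shared if susc_cluster[pair] == susd_cluster[pair])
    return same / len(shared)


# Made-up cluster assignments for four susCD pairs
susc = {"pair1": "laminarin", "pair2": "alginate", "pair3": "xylan", "pair4": None}
susd = {"pair1": "laminarin", "pair2": "alginate", "pair3": "pectin", "pair4": "alginate"}
fraction = congruent_fraction(susc, susd)
print(f"{fraction:.2f}")
```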

SusC/D-like protein expression of bacterioplankton during phytoplankton blooms supports temporal variations of polysaccharide abundances in situ
SusC/D-like proteins range among the highest expressed proteins in bacterioplankton metaproteomes from productive oceans [36,75,76]. Likewise, studies on flavobacterial isolates have identified SusC/D-like proteins as the highest expressed proteins within PULs that are furthermore coregulated with other PUL-encoded proteins including CAZymes [13,14]. SusC/D expression thus represents a suitable proxy for overall PUL expression. We monitored bacterioplankton during spring phytoplankton blooms in the southern North Sea, with weekly sampling in 2009 and approximately monthly sampling in 2010 to 2012 [32,36]. At 14 selected time points we analyzed the free-living 0.2-3 µm bacterioplankton using shotgun metaproteomics (total: 23,917 identified proteins), and detected high numbers of expressed SusC/D-like proteins in metaproteomes across all sampled years (Supplementary Table S2).
To identify potential substrates, we aligned all expressed SusC/D-like sequences (SusC: 390; SusD: 118) to the SusC/D trees constructed from isolate PULs. Isolate sequences with highest similarities (≥ 40%) to expressed sequences are indicated in Fig. 6. Further semi-quantitative analyses were confined to SusC/D-like proteins where at least one related homolog reached expression levels of ≥ 0.05 %NSAF, i.e. 0.05% of all mass-adjusted spectral counts (see Materials and methods; Fig. 7).
Fig. 6 Trees of all PUL-associated SusC- (a) and SusD-like (b) proteins of the Flavobacteriia isolates showing functional, substrate-specific clustering. Protein sequences were aligned using the MAFFT G-INS-i algorithm and trees were calculated using FastTree 2.1.5 approximate-maximum likelihood (SusC-like: 370; SusD-like: 362). Substrate predictions are depicted in colors. Proteins with expressed homologs in North Sea bacterioplankton blooms of more than 40% sequence identity are marked with asterisks (and number of homologs if x > 1). Corresponding figures labeled with protein sequence identifiers, originating species and PUL-associated CAZymes are provided as supplementary material
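The two selection steps applied to expressed SusC/D-like homologs (marking isolate sequences with ≥ 40% identity, and restricting semi-quantitative analysis to homologs reaching ≥ 0.05 %NSAF in at least one sample) can be sketched as below. The record type `ExpressedHomolog`, the function names and all example values are hypothetical and serve only to illustrate the thresholds stated in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExpressedHomolog:
    """Hypothetical record for one expressed SusC/D-like protein from the metaproteomes."""
    protein_id: str
    best_isolate_hit: str      # most similar isolate SusC/D-like sequence
    identity_pct: float        # amino-acid identity to that isolate sequence
    nsaf_pct: List[float]      # %NSAF in each analyzed sample


def marked_in_tree(h: ExpressedHomolog, min_identity: float = 40.0) -> bool:
    """Isolate hit is marked in the tree when identity reaches the threshold."""
    return h.identity_pct >= min_identity


def quantified(h: ExpressedHomolog, min_nsaf: float = 0.05) -> bool:
    """Homolog enters semi-quantitative analysis when any sample reaches the threshold."""
    return max(h.nsaf_pct, default=0.0) >= min_nsaf


# Made-up example records
homologs = [
    ExpressedHomolog("mp_001", "isolate_susC_12", 68.0, [0.01, 0.13, 0.02]),
    ExpressedHomolog("mp_002", "isolate_susD_07", 28.0, [0.05, 0.06, 0.00]),
    ExpressedHomolog("mp_003", "isolate_susD_31", 45.0, [0.01, 0.02, 0.00]),
]
marked = [h.protein_id for h in homologs if marked_in_tree(h)]
analysed = [h.protein_id for h in homologs if quantified(h)]
print(marked, analysed)
```

Keeping the two criteria separate makes it visible when a well-expressed homolog has only a weak match to any isolate sequence, in which case substrate assignment remains tentative.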

Laminarin
Homologs to laminarin-binding SusC-like proteins were detected amidst the 2009 and 2010 phytoplankton blooms, with one homolog reaching a notable maximum of 0.13 %NSAF on May 4th, 2010 (Fig. 7b). Respective SusD-like homologs were detected in the same years and highest expression was observed at the same date in 2010 (0.07 %NSAF, Fig. 7d). Amino-acid identities of expressed SusC homologs and laminarin PUL SusC-like proteins from isolates ranged from 48 to 68%, and for SusD homologs from 40 to 78% (Supplementary Table S4). Our data suggest that laminarin occurred at the bloom peaks in 2009 and 2010 and directly thereafter. This is supported by detection of expressed GH3 β-glucosidases and GH16 β-glucanases in 2009 [36] and, to a lesser degree, in 2010 (Supplementary Table S2). Chrysolaminarin is produced by microalgae such as Thalassiosira nordenskioeldii diatoms or representatives of Phaeocystis haptophytes [77,78], which both were among the dominating microalgae in 2009 and 2010 [32].

Alpha-1,4-glucan
Respective SusD-like proteins were most abundantly detected in 2010, peaking on April 8th (0.11 %NSAF, Fig. 7d), but also in 2009 and 2011. Sequence identities to isolate α-1,4-glucan PUL SusD-like proteins ranged from 43 to 46% (Supplementary Table S4). These data indicate that α-1,4-glucans, potentially starch or glycogen, represented a recurring substrate from 2009 to 2011 during early to late phytoplankton bloom stages.
Fig. 7 a, b Trees of expressed SusC and SusD-like proteins identified in 3-0.2 µm bacterioplankton during North Sea spring phytoplankton blooms in 2009-2012 using proteomics. The most closely related SusC/D-like sequences from North Sea Flavobacteriia isolates in this study were integrated in the tree. Protein names correspond to sequence identifier and isolate name. Sequences were aligned using the MAFFT G-INS-i algorithm. The tree was calculated using FastTree 2.1.5 approximate-maximum likelihood. c, d Corresponding expression levels as Normalized Spectral Abundance Factors (%NSAF) for the four consecutive blooms. Metaproteomic samples were classified as pre-, early-, mid-, and late-bloom based on chlorophyll a concentrations during the spring phytoplankton blooms. Expression levels are highlighted by green color

Sulfated β-xylan
One SusC-like and two SusD-like proteins likely targeting a sulfated β-xylan were expressed in the mid and late stages of the phytoplankton bloom of 2009, peaking at 0.08 %NSAF for SusC-like proteins and 0.07 %NSAF for SusD-like proteins. Their identities to homologs of the sulfated β-xylan PUL of Formosa sp. Hel3_A1_48 (Fig. 5b, PUL136, Supplementary Table S3) were 53% and 49-53%, respectively (Supplementary Table S4).

Beta-mannan
Homologs with high identities to SusC/D-like proteins occurring in a predicted β-mannan PUL from Muricauda sp. Table S4). The predicted β-mannan PUL of Muricauda sp. MAR_2010_75 harbors two pairs of SusC/D-like proteins. The one with expressed in situ homologs did not cluster with those from other β-mannan PULs in our SusC/D trees. Hence the two SusC/D-like pairs might target different oligosaccharides. As some PULs can be induced by substrates other than those that they degrade, it is possible that the substrate that led to the upregulation of the in situ homologs was not a β-mannan. Proteomic studies of this PUL in Muricauda sp. MAR_2010_75 are required to clarify regulation of this PUL and to interpret the in situ data.

Arabinan
SusD-like proteins potentially targeting an arabinan were expressed at late phytoplankton bloom stages during all four years with at least 0.05 %NSAF. However, their identities to the SusD-like protein of a predicted arabinan PUL from Muricauda sp. MAR_2010_75 were only 26-30% (Supplementary Figure S2I, PUL267, Supplementary Tables S3 and S4).
In summary, comparative analyses of SusC/D homolog expression are indicative of a successive utilization of different polysaccharides over the course of phytoplankton blooms. This agrees with successive changes in the microbial community composition during bloom events that we reported earlier on [32,36].

Discussion
PUL function predictions in this study are based on sequence similarity analyses and thus cannot rival time-consuming laboratory-based functional studies in terms of accuracy. Knowledge on polysaccharides from marine algae, in particular from microalgae, is still sparse and thus false predictions are possible. Still, the holistic approach to analyze the PUL spectrum of a large number of isolates from a single habitat allows identification of recurrent and thus important PULs as targets for future functional studies and to build testable hypotheses on possible substrates.
We observed diverse polysaccharide degradation capacities among North Sea Flavobacteriia with no distinct correlation to taxonomy. Even isolates from identical genera often featured notably diverging PUL repertoires and genome sizes (e.g., Polaribacter, Maribacter, and Cellulophaga), substantiating earlier data [14]. Our findings suggest that a species' PUL repertoire is more dependent on its distinct ecological niche, whereas its phylogeny is of secondary importance. This corroborates the hypothesis that PULs are exchanged between Flavobacteriia through horizontal gene transfer [10].
The isolates' PUL repertoires showcase that abundant, structurally simple substrates such as laminarin, α-1,4-glucans, and alginate are targeted by likewise conserved and frequent PULs. These substrates are likely so common that preserving the respective catabolic machinery is favorable for many marine Flavobacteriia. Diatom-derived chrysolaminarin has been estimated to amount to 5-15 petagrams of organic carbon annually [78] and accordingly laminarin-specific PULs were frequent in our surface water isolates. The four predicted laminarin PUL variants we identified might indicate that different laminarin types [49] are targeted by different PULs or that some of these PULs act as helper modules in laminarin degradation, as many species feature more than one laminarin PUL type (e.g., Formosa spp. Hel1_33_131 and Hel3_A1_48, Gramella sp. MAR_2010_102). Variant B contains predicted endo- and exo-acting β-1,3-glucan hydrolases (GH17) highly specific to laminarin degradation [45]. Variants A, C, and D only contain GH16 endo-1,3(4)-β-glucanases and may not be restricted to laminarin, but are potentially capable of degrading further mixed-linkage β-1,3/1,4-glucans, as recently shown for a similar conserved PUL in human gut Bacteroidetes [44]. Clustering of the SusC/D sequences of variants A, B and D in the SusC/D trees supports that they bind the same substrate (Supplementary Figure S1A, B). Those of variant C, however, are located elsewhere, indicating that this PUL might indeed have an alternate function. Functional studies on model strains containing variant C (e.g., Formosa sp. Hel3_A1_48) and D (e.g., Gramella sp. MAR_2010_102) will be necessary to ultimately elucidate the functions of these PULs.
Alginate and α-1,4-glucan degradation capacities were prevalent in the isolates obtained from the > 20 µm retentate, which might be microalgae-associated, but were also common in many seawater isolates. Overall, laminarin, α-1,4-glucan, and alginate PULs are fairly conserved and make up over a quarter of all PULs in the isolates (115/400), suggesting that these are abundant polysaccharide substrates in North Sea coastal habitats that many microbes can consume and likely compete for.
A major result of this study is the substrate-specific clustering of both SusC- and SusD-like proteins. The strong tendencies of SusC and SusD homologs to occur in corresponding substrate-specific clusters in both trees, resulting in similar tree topologies, suggest coevolution of these two proteins. This hypothesis is corroborated by recent X-ray crystallography findings showing complex formation of two SusC- and SusD-like proteins of B. thetaiotaomicron [7]. Clustering was more pronounced for structurally conserved, simple polysaccharides than for the heterogeneous and partially new substrates described in this study. This is expected, as heterogeneous substrates are attacked at multiple points resulting in a variety of structurally different oligosaccharides for uptake. Furthermore, broad substrate classes that currently can only be defined as, e.g., FCSPs or xylose-containing substrates might actually represent multiple chemically rather different substrates. Hence, improvement of functional clustering is to be expected once more detailed knowledge on algal polysaccharide structures is available.
We here provide first metaproteomic data indicating that high-resolution expression analysis of SusC/D homologs may be used for monitoring changes in microbial polysaccharide degradation activity. This provides a proxy for which polysaccharides are important at a given time and place in marine carbon cycling. Considering our still incomplete knowledge, only expressed SusC/D homologs exhibiting a high level of sequence identity to functionally annotated or characterized SusC/D sequences should be considered. Absence of such expressed homologs, however, does not preclude that a respective substrate may be targeted by an as yet unknown SusC/D system. This current limitation notwithstanding, our approach provides a new method to identify environmentally relevant polysaccharide substrates that due to their structural complexity are still difficult to identify by direct chemical analysis.

Compliance with ethical standards
Conflict of interest The authors declare no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.