The enormous chemical diversity and strain variability of prokaryotic protein glycosylation make its large-scale exploration exceptionally challenging. Therefore, despite the universal relevance of protein glycosylation across all domains of life, the understanding of its biological significance and of the evolutionary forces shaping oligosaccharide structures remains highly limited. Here, we report a newly established mass binning glycoproteomics approach that establishes the chemical identity of the carbohydrate components and performs untargeted exploration of prokaryotic oligosaccharides directly from large-scale proteomics data. We demonstrate our approach by exploring an enrichment culture of the globally relevant anaerobic ammonium-oxidizing bacterium Ca. Kuenenia stuttgartiensis. By doing so, we resolve a remarkable array of oligosaccharides, which are produced by two seemingly unrelated biosynthetic routes and which modify the same surface-layer protein simultaneously. More intriguingly, the investigated strain also modulates highly specialized sugars, supposedly in response to its energy metabolism (the anaerobic oxidation of ammonium), which depends on the acquisition of substrates of opposite charges. Ultimately, we provide a systematic approach for the compositional exploration of prokaryotic protein glycosylation and reveal a remarkable example of the evolution of complex oligosaccharides in bacteria.
Post-translational modifications modulate protein properties in response to the environment at very short time scales. Among these, protein glycosylation is fundamental to all three domains of life and is one of the most abundant modifications of the many reported to date [2, 3]. Moreover, glycoproteins are also found in the protective envelopes of most pathogenic viruses. Carbohydrates, the building blocks of glycans, are the least conserved class of molecules and, when linked to proteins, considerably increase proteome diversity. Consequently, due to glycosylation and the many other types of modifications, the number of proteoforms is orders of magnitude larger than what can be translated from the genomic sequence of an organism alone [4, 5].
Apart from well-investigated pathogens or easily culturable model archaea, such as C. jejuni or H. volcanii, our understanding of the biological significance and evolutionary forces shaping glycan structures is highly limited. This lack of understanding is largely due to the general paucity of large-scale approaches capable of resolving the chemical diversity expressed by prokaryotes. In eukaryotes, protein glycosylation utilizes a well-defined number of some 10 monosaccharides (SI-DOC-Table-S1). Thereby, the existence of distinct glycosylation systems and oligosaccharide structures evolved as ubiquitous for many molecular processes, such as protein folding, stability and immunity [5,6,7]. In contrast, biosynthetic routes in prokaryotes have a much higher diversity [5, 8]. Oligosaccharide chains show large variations in regard to sugar and linkage chemistry across species and strains [8, 9]. Interestingly, many functions protein glycosylation serves in eukaryotes, such as for protein folding, quality control or intracellular trafficking, do not apply to prokaryotes [10, 11]. Moreover, glycan biosynthetic routes involve many enzymatic steps and are, particularly for prokaryotes, an energetically costly process. Therefore, those structures must serve different, but likely equally important roles [12, 13].
Prokaryotes account for a substantial proportion of Earth’s biomass, drive many global processes such as the nitrogen cycle or methane production, and directly impact human health (e.g., via the gut microbiome) [14, 15]. One group of these globally widespread microorganisms is the anaerobic ammonium-oxidizing (anammox) bacteria, a phylogenetically deep-branching group within the Planctomycetes phylum. Anammox bacteria play a key role in the biogeochemical nitrogen cycle (e.g., producing over two thirds of atmospheric N2) and are the cornerstone of new, sustainable and resource-efficient biotechnologies (e.g., for wastewater treatment) [16,17,18]. Members of the Planctomycetes phylum display cellular characteristics that are unique and strikingly homologous to those found in eukaryotes, making them of paramount interest for investigating fundamental aspects of bacterial cell physiology [19, 20]. Only the (finally) confirmed existence of a peptidoglycan layer supported their classification as Gram-negative bacteria [21, 22]. Pioneering work by van Teeseling et al. furthermore revealed the existence of a glycosylated cell surface layer in the planktonically growing anammox bacterium Ca. Kuenenia stuttgartiensis [23, 24]. Moreover, Boleij and co-workers reported a glycosylated surface layer protein in Ca. Brocadia sapporoensis, a member of a closely related genus. Many archaea and bacteria express surface layer proteins, which are often further modified through post-translational modifications such as glycosylation [26, 27]. Among many potential roles, glycans may modulate the surface layer properties with regard to structure, hydrophilicity, provision of functional groups and shielding, or may increase protection [10, 28,29,30].
Interestingly, ammonia-oxidizing archaea (AOA) were found to express highly acidic surface layer proteins (SLPs), which are thought to support the acquisition of positively charged ammonium, the substrate that is further converted into nitrite through the nitrification process. Given the unique cellular appearance of anammox bacteria and the analogy to AOA, both of which depend on the uptake of ammonium, the question arises as to the molecular-level characteristics and potential roles of the glycosylated surface layer recently discovered in anammox bacteria.
Historically, protein glycosylation in prokaryotes was associated predominantly with pathogens (and commensal bacteria), many of which display complex oligosaccharide structures, including highly specialized sugars such as nonulosonic acids (NulOs; the NulOs found in animals are commonly referred to as sialic acids). These sugars, in addition to providing cellular protection, have been shown to play key roles in pathogenicity and host invasion, for example in C. jejuni, one of the most common causes of food-related infections globally [33,34,35]. Nevertheless, protein glycosylation is also found in non-pathogenic bacteria and archaea, where it commonly modifies surface layer proteins (SLPs), pili or flagella [9, 10, 35]. This can introduce remarkable properties, as shown for the pilus glycosylation of the thermoacidophile Sulfolobus islandicus, which provides the physical resistance needed to survive under extreme conditions. Most likely owing to the lack of large-scale data, there was a long-standing belief that glycosylation in (non-pathogenic) prokaryotes mediates structural, rather than functional, properties. However, an increasing volume of literature demonstrates that glycans play multiple roles in all types of bacterial systems [37, 38].
From an analytical perspective, the enormous chemical and structural diversity expressed in prokaryotes makes large-scale exploration exceptionally challenging. High-performance (glycan database utilizing) approaches [39,40,41] cannot be readily applied to prokaryotic glycoproteomes with unknown oligosaccharide chemistry. Therefore, the glycoproteomic exploration of novel prokaryotic strains often still employs protein fractionation and carbohydrate-specific colorimetric staining . The target proteins and associated carbohydrate structure(s) are then commonly identified following proteolytic digestion via mass spectrometric approaches. Thereby, peptides are sequenced and investigated for spectra sharing peptide and oligosaccharide fragments [23, 25, 32, 42]. This approach takes advantage of the commonly observed monosaccharide fragments (oxonium ions) to differentiate unmodified peptides from glycopeptide spectra. However, this multi-step, manual procedure requires extensive experience and is extremely time consuming. Furthermore, the oxonium markers, supporting (automated) identification for eukaryotic proteins, are not necessarily part of the prokaryotic carbohydrate chains. Therefore, this strategy is heavily operator biased and may leave a large number of oligosaccharide modifications unidentified.
Recently, open modification search approaches, which match fragment ions to a protein sequence database without prior consideration of the intact peptide mass, have advanced the search for unexpected peptide modifications [43,44,45,46]. Very recently, this approach has been combined with glycopeptide enrichment to analyze a range of pathogenic bacteria. Here, the enrichment improves spectral coverage, reduces the search space and qualifies observed mass shifts as carbohydrate-like modifications. Moreover, an additional ion mobility interface (FAIMS) showed promising outcomes even without prior enrichment procedures. These developments have significantly advanced the identification of oligosaccharide-related modifications in prokaryotic proteomes. However, the large-scale discovery of prokaryotic protein glycosylation still depends on specific enrichment procedures (by HILIC or specialized equipment), extensive fragmentation experiments (to ensure peptide sequence coverage), and the availability of a proteome sequence database. The number of sequencing spectra and the database volume, furthermore, impact computational effort and sensitivity. In addition, common large-scale glycoproteomics approaches have been established for application to pure cultures. Most importantly, however, those approaches do not establish the strain-specific sugar components or additional oligosaccharide sequence information. Therefore, untargeted large-scale exploration that provides additional chemical and compositional information to support the physicochemical interpretation of the oligosaccharide chains, in particular when exploring non-pure or enrichment cultures, remains a bottleneck in microbial proteome research.
We established a systematic procedure to analyze yet-unexplored prokaryotic protein glycosylation directly from large-scale proteomics data. By this means, we first determine the strain- (or enrichment)-specific sugar components and subsequently the corresponding protein-linked oligosaccharide chains of individual strains in the culture (community). When investigating an enrichment culture of the globally relevant anammox bacteria, the new approach resolved a remarkably complex array of surface-layer oligosaccharides generated by two seemingly unrelated biosynthetic pathways simultaneously. The physicochemical interpretation of the observed array ultimately suggests an evolutionary link to charged surface layer proteins (SLPs) and the requirements of the metabolic lifestyle in anammox bacteria.
Results and discussion
The general microbial glycoproteomics approach
Establishing strain-specific monosaccharide components from proteomics data, the first step in our approach, takes advantage of the uniform fragment-ion patterns obtained when mass binning thousands of individual peptide sequencing spectra from shotgun proteomic data (Fig. 1a–b, SI-DOC-Fig-S1 and SI-EXCEL-T1). The mass patterns are highly comparable across species because they reflect the general amino acid repertoire used throughout all domains of life. The low mass range is dominated by mostly intense fragments of the canonical set of amino acids and combinations thereof (mono-, di- and trimers). Carbohydrate-related signals, however, typically lie outside the amino acid fragment mass space due to their distinct chemical composition (SI-DOC-Fig-S1–2, SI-EXCEL-T1–2). Previously, mass binning had been proposed to search for stable amino acid modifications. Here, we explore the out-of-range peaks from high-resolution data in search of (novel) carbohydrate-related signals. Candidates are matched against a constructed sugar composition database containing more than 3300 theoretical chemical sugar compositions, and are further examined for additional water loss and for co-occurrence within sequencing spectra (SI-DOC-Fig-S1–2, SI-EXCEL-T3–5). To provide maximum differentiation from closely related compositions of peptide fragments, this step employs shotgun proteomic data for which the fragmentation spectra were acquired at very high mass resolution (140 K at m/z 200). Moreover, to provide in-depth chemical information on novel sugar derivatives, selected components (such as NulOs) were analyzed by additional in-source fragmentation experiments (SI-DOC-Fig-S3).
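The mass binning step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the bin width of 0.001 m/z, the upper m/z limit and the toy peak lists are assumptions made only for illustration. The sketch counts, for each narrow m/z bin, how many fragmentation spectra contain a peak in that bin, so that recurring low-mass fragments stand out.

```python
from collections import Counter


def bin_fragment_masses(spectra, bin_width=0.001, max_mz=400.0):
    """Count, per m/z bin, the number of spectra that contain a peak in that bin.

    spectra: list of peak lists, each peak being a (mz, intensity) tuple.
    bin_width and max_mz are illustrative assumptions, not values from the paper.
    """
    counts = Counter()
    for peaks in spectra:
        # A set ensures that each bin is counted at most once per spectrum.
        bins = {round(mz / bin_width) for mz, _ in peaks if mz <= max_mz}
        counts.update(bins)
    return counts


def frequent_bins(counts, n_spectra, min_fraction=0.01, bin_width=0.001):
    """Return (bin centre m/z, fraction of spectra) for bins seen in at least
    min_fraction of all spectra, sorted by m/z."""
    result = []
    for bin_index, n in sorted(counts.items()):
        fraction = n / n_spectra
        if fraction >= min_fraction:
            result.append((bin_index * bin_width, fraction))
    return result


# Toy input: three spectra with hypothetical (m/z, intensity) peaks.
toy_spectra = [
    [(204.0866, 1e5), (138.0550, 5e4), (175.1190, 2e4)],
    [(204.0868, 8e4), (175.1189, 1e4)],
    [(175.1191, 3e4)],
]
toy_counts = bin_fragment_masses(toy_spectra)
for mz, fraction in frequent_bins(toy_counts, len(toy_spectra), min_fraction=0.6):
    print(f"{mz:.3f}  present in {fraction:.0%} of spectra")
```

In a real data set, the bins that recur frequently but fall outside the amino-acid-derived fragment mass space would be the candidates passed on to the sugar composition database.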
Because prokaryotes often produce species-unique carbohydrate derivatives, this strategy is highly advantageous over relying only on the oxonium ion markers known from eukaryotic glycoproteins (e.g., HexNAc, Hex or NeuAc).
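To make the matching of candidate peaks against theoretical sugar compositions concrete, the following Python sketch computes the m/z of singly charged oxonium ions from their elemental composition and compares an observed peak within a ppm tolerance. The three compositions, the 5 ppm tolerance and all function names are illustrative assumptions; the database described in the text contains more than 3300 compositions and is not reproduced here.

```python
# Monoisotopic atomic masses (standard values) and the electron mass, in Da.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
ELECTRON = 0.00054858


def cation_mz(composition):
    """m/z of a singly charged cation whose formula is given as {element: count}."""
    return sum(MONO[el] * n for el, n in composition.items()) - ELECTRON


# Illustrative oxonium ion compositions (assumed subset, for demonstration only).
CANDIDATES = {
    "Hex oxonium (C6H11O5+)": {"C": 6, "H": 11, "O": 5},
    "Hep oxonium (C7H13O6+)": {"C": 7, "H": 13, "O": 6},
    "HexNAc oxonium (C8H14NO5+)": {"C": 8, "H": 14, "N": 1, "O": 5},
}


def match_peak(observed_mz, candidates=CANDIDATES, tol_ppm=5.0):
    """Return (name, theoretical m/z, ppm error) for candidates within tol_ppm."""
    hits = []
    for name, composition in candidates.items():
        theoretical = cation_mz(composition)
        ppm = (observed_mz - theoretical) / theoretical * 1e6
        if abs(ppm) <= tol_ppm:
            hits.append((name, theoretical, ppm))
    return hits


for name, theoretical, ppm in match_peak(204.0870):
    print(f"{name}: theoretical m/z {theoretical:.4f}, error {ppm:+.1f} ppm")
```

The narrow tolerance is the reason the text stresses very high fragment mass resolution: sugar and peptide fragments of similar nominal mass can only be separated when the mass error is small.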
The second step, termed “parent offset binning”, establishes the oligosaccharide chains that modify individual proteins (Fig. 1c). To this end, the peak masses of fragmentation spectra that contain carbohydrate components (and that therefore likely derive from oligosaccharide-modified peptides) are subtracted from the parent peptide mass. This generates mass deltas with the parent peptide mass at zero. Fragmentation spectra from glycopeptides will therefore (repeatedly) show deltas consistent with the mass of the peptide-linked oligosaccharide chain(s). This process takes advantage of the predominant fragmentation of the carbohydrate chain over the peptide backbone when performing fragmentation by collision-induced dissociation, such as by higher-energy collisional dissociation (HCD) specific to Orbitrap mass spectrometers [50,51,52,53].
Finally, binning the parent offset mass deltas that originate from spectra containing the same carbohydrate fragments provides a histogram of the oligosaccharide chain mass(es) linked to the peptides. This procedure relies on the presence of the Y0 ion in the fragmentation spectrum (the unmodified, intact peptide fragment peak). Owing to the strong fragmentation of the carbohydrate chains, this peak is very commonly observed in glycopeptide (CID/HCD) fragmentation spectra [50, 52, 53].
Moreover, owing to the oligosaccharide backbone fragmentation, binned spectra contain additional oligosaccharide fragments, therefore providing sequence and in some cases even linkage-type (N/O) information [50, 53]. Although prokaryotic protein glycosylation shows a large species and strain variability, the structural heterogeneity within one proteome is generally comparatively low. Hence, the binning approach is very useful, particularly for prokaryotic glycoproteomes. In addition, establishing the sugar components or oligosaccharide profiles does not require a proteome sequence database. Therefore, the data processing pipeline requires only a few minutes of processing time, e.g., when applied to single-run QE Orbitrap shotgun proteomics data. Finally, the established oligosaccharide profiles provide a database for recently developed high-performance glycopeptide annotation tools [40, 41], or can simply be integrated as variable modifications into (multi-round) search approaches to identify modified proteins and target strains (Fig. 1d). While only the intact oligosaccharide mass will provide a confident match during this procedure, determining the free peptide mass in sugar-fragment-containing spectra further enables differentiation between the intact oligosaccharide chain and additional fragments. Ultimately, the variable modification search provides the common statistical parameters, such as peptide scores and false discovery rates. Most importantly, however, the determined chemical and compositional information guides the physicochemical interpretation of the oligosaccharide chains and the exploration of biosynthetic routes or phylogenetic relations.
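The parent offset binning described above can be sketched in a few lines of Python. This is a simplified illustration rather than the published implementation: it assumes that all fragment peaks are singly charged, uses a hypothetical input format (dictionaries with `precursor_mz`, `charge` and `peaks`), and the bin width and marker tolerance are arbitrary choices. For every spectrum that contains a chosen carbohydrate marker ion, each fragment m/z is subtracted from the singly protonated parent mass, and the resulting deltas are binned across spectra.

```python
from collections import Counter

PROTON = 1.007276  # mass of a proton in Da


def parent_offset_histogram(spectra, marker_mz, bin_width=0.01, marker_tol=0.005):
    """Histogram of (parent [M+H]+ minus fragment m/z) over marker-containing spectra.

    spectra: list of dicts with keys 'precursor_mz', 'charge', 'peaks' (list of m/z).
    Returns a Counter mapping integer bin index (delta / bin_width) to the number
    of spectra showing that delta. Fragments are assumed to be singly charged.
    """
    histogram = Counter()
    for spectrum in spectra:
        peaks = spectrum["peaks"]
        # Keep only spectra that contain the carbohydrate marker ion.
        if not any(abs(mz - marker_mz) <= marker_tol for mz in peaks):
            continue
        z = spectrum["charge"]
        parent_mh = spectrum["precursor_mz"] * z - (z - 1) * PROTON
        # Each delta bin is counted once per spectrum.
        deltas = {round((parent_mh - mz) / bin_width) for mz in peaks if mz < parent_mh}
        histogram.update(deltas)
    return histogram


def glycopeptide(y0_mh, glycan_mass, marker_mz, charge=2):
    """Build a toy spectrum holding only the marker ion and the Y0 peak."""
    precursor_mz = (y0_mh + glycan_mass + (charge - 1) * PROTON) / charge
    return {"precursor_mz": precursor_mz, "charge": charge, "peaks": [marker_mz, y0_mh]}


HEXNAC_RESIDUE = 203.0794  # monoisotopic residue mass of a single HexNAc
HEXNAC_OXONIUM = 204.0866  # corresponding oxonium marker ion

toy = [
    glycopeptide(1000.5000, HEXNAC_RESIDUE, HEXNAC_OXONIUM),
    glycopeptide(1500.7500, HEXNAC_RESIDUE, HEXNAC_OXONIUM),
    {"precursor_mz": 650.35, "charge": 2, "peaks": [175.1190, 800.40]},  # no marker
]
offsets = parent_offset_histogram(toy, HEXNAC_OXONIUM)
top_bin, n_spectra = offsets.most_common(1)[0]
print(f"most frequent offset: {top_bin * 0.01:.2f} Da in {n_spectra} spectra")
```

Because the two toy peptides differ in mass but carry the same modification, only the delta between the parent mass and the Y0 peak recurs across spectra, which is what makes the oligosaccharide mass stand out in the histogram.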
An unexpectedly complex array of sugars
To verify the developed “sugar-miner” approach for establishing sugar components directly from shotgun proteomics data, we processed a set of well-characterized control samples. In so doing, every control proteome sample showed sugar components reflecting the known oligosaccharide structures of the respective species (Fig. 2a, SI-EXCEL-T6). Moreover, the theoretically constructed carbohydrate composition space, used to match carbohydrate fragments in the mass binning data, only marginally overlapped with peptide related fragments (e.g., those found in the E. coli comparator proteome). One of the few observed coincidences, however, included the compositional overlap of peptide-related fragments with the sugar composition of bacillosamine (diNAcBac) (Fig. 2b, SI-EXCEL-T1–2, SI-DOC-Fig-S1). Furthermore, we processed all prokaryotic control samples also through the developed “parent offset binning” procedure—which provides the oligosaccharide chains—to confirm the generic nature of this approach (SI-DOC-Fig-S4–9).
When exploring the sugar components found in the Ca. Kuenenia stuttgartiensis enrichment proteome, we observed a surprisingly diverse repertoire of carbohydrate fragments, including yet-undescribed nonulosonic acid derivatives as well as seven-carbon sugars rarely observed in glycoproteins (Fig. 2a, c, SI-EXCEL-T6, SI-DOC-Table-S2–6 and SI-DOC-Fig-S10). In addition, when investigating the related oligosaccharide profiles using the above-described parent offset binning approach, we observed two completely unrelated types of oligosaccharide chains (Fig. 3a–d, SI-DOC-Fig-S11–15). One type of oligosaccharide resembled the recently described N-acetylhexosamine core, albeit containing nonulosonic acids not resolved in an earlier study (complex type structures; “X-type”). The second type of oligosaccharide structure consisted of homogeneous heptose chains (oligo-heptosidic; “O-type”). More surprisingly, when integrating the oligosaccharide masses into a (multi-round) variable modification search using a database constructed from metagenomic data, both types of oligosaccharide structures were exclusively matched to the same surface layer protein (SLP) of Ca. Kuenenia stuttgartiensis (Fig. 4a–b, SI-EXCEL-T7-10, SI-DOC-Table-S7–8 and SI-DOC-Fig-S16–27), which was further confirmed by manual investigation of selected spectra (SI-DOC-Fig-S18–20). Moreover, while both oligosaccharides appeared to be O-linked, and the complex type glycan likely follows the attachment motif GT/S (SI-DOC-Fig-S21), no potential glycosylation motif could be deduced from the amino acid sequences of the oligo-heptosidic modified peptides. However, to unambiguously identify glycosylation motifs, further fragmentation experiments using electron-transfer dissociation (ETD) are required, because the employed higher-energy collisional dissociation (HCD) does not provide information on the modified amino acid residue(s).
Interestingly, to the best of the authors’ knowledge, a comparable complexity of unrelated glycans targeting the same SLP simultaneously has previously been observed only for archaea [9, 27]. Whether these glycans indeed involve different oligosaccharyl-transferases (O-Tases), only one O-Tase, or an additional consecutive transfer can only be determined by performing mutagenesis or heterologous expression experiments. This is particularly important because it has been shown previously that glycans with rather different compositions can be transferred via a single O-Tase to the same target protein [55, 56].
Furthermore, when comparing these results to a laboratory enrichment of Ca. Brocadia sapporoensis, another anammox species within the Candidatus Brocadiaceae family, a similar carbohydrate profile was observed, albeit lacking the nonulosonic acid-related fragments found in Ca. Kuenenia (SI-EXCEL-T6 and T11, SI-DOC-Fig-S22). In addition, the observed sugar components established at least 4 different types of oligosaccharide chains (SI-DOC-Fig-S23–26, SI-DOC-Table-S3). One was identical to a recently reported HexNAc core oligosaccharide (204 m/z), another oligosaccharide structure contained characteristic hexose/heptose residues (163/193 m/z), a third included characteristic methyl-deoxy hexose residues (161 m/z), and the fourth was based on a yet-unidentified derivative (232 m/z). When integrating the established oligosaccharide masses into the metaproteomic analysis using a specifically constructed metagenomic database, only the recently reported HexNAc core type oligosaccharide (204 m/z) could be assigned to Ca. Brocadia sapporoensis, where it exclusively modified the putative SLP, as also described in an earlier study. The other types of oligosaccharide chains could be assigned to different proteins from Ignavibacteria bacterium OLB4 and Ignavibacteria bacterium UTCHB3, respectively, both of which are commonly observed community members in anammox enrichment cultures [57, 58] (Fig. 4a–b, SI-EXCEL-T12–14, SI-DOC-Table-S9).
The established oligosaccharide chains for Ca. Kuenenia stuttgartiensis were also confirmed by orthogonal HILIC glycopeptide enrichment procedures combined with an open modification search (using Byonic), as proposed very recently (SI-DOC-Fig-S15). Moreover, we investigated an isolate of the Ca. Brocadia sapporoensis SLP separately, to confirm that it is modified by only one type of oligosaccharide (204 m/z). To this end, we performed a conventional SDS-PAGE followed by in-gel proteolytic digestion and mass spectrometric analysis. The shotgun proteomic data were then processed by the developed pipeline. This showed exactly the same HexNAc-type oligosaccharide (204 m/z) as identified from the large-scale data and confirmed the lack of the oligo-heptosidic chains found in Ca. Kuenenia stuttgartiensis (SI-DOC-Fig-S26). In summary, both strains, Ca. Kuenenia stuttgartiensis and Ca. Brocadia sapporoensis, share the HexNAc core type oligosaccharides (X-type), but they differ with regard to the presence of terminal nonulosonic acids and of the second, oligo-heptosidic, chains. Nonetheless, the physiological importance of this extensive and complex surface glycosylation discovered for Ca. Kuenenia stuttgartiensis remains to be further investigated experimentally.
Almost all archaea and many bacteria are entirely covered by surface layer proteins (SLPs). Moreover, SLPs frequently show either acidic or basic isoelectric points due to their propensity for charged amino acids. Li et al. demonstrated that the negatively charged SLPs of ammonia-oxidizing archaea (AOA) support the acquisition of ammonium (NH4+), which supposedly makes AOA competitive in ecosystems with low ammonium concentrations. Moreover, Li et al. showed that the opposite case of a positively charged surface layer would create a nutrient barrier, preventing NH4+ from passing through the surface pores into the pseudo-periplasmic space [31, 59].
Ca. Kuenenia stuttgartiensis possesses a very comparable (hexagonally arranged) SLP array covering the entire bacterial cell, and the specific SLP has a particularly acidic (predicted) isoelectric point of ~4.25 and a net charge of −60 at physiological pH (Fig. 5a, SI-DOC-Fig-S27–31, SI-EXCEL-T15–18). Intriguingly, however, anammox bacteria not only depend on the acquisition of ammonium but also require negatively charged nitrite. Our findings raise the question of why Ca. Kuenenia stuttgartiensis possesses such a highly acidic SLP, and whether Ca. Kuenenia stuttgartiensis (or anammox bacteria in general) evolved additional modulations to avoid interference with substrate acquisition. This is of particular importance as nitrite is commonly the limiting substrate in engineered and natural ecosystems.
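For readers unfamiliar with how a predicted isoelectric point and net charge are obtained from a protein sequence, the following Python sketch applies the Henderson–Hasselbalch relation to the ionizable side chains and termini. The text does not state which tool or pKa scale was used for the reported values, so the pKa table below is an assumed, textbook-style set and the toy sequences are purely illustrative. The sketch considers the amino acid sequence only and ignores glycans and other modifications.

```python
# Assumed, illustrative pKa values; published scales differ from one another.
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
PKA_N_TERMINUS = 9.0
PKA_C_TERMINUS = 2.0


def net_charge(sequence, ph):
    """Net charge of an unmodified peptide or protein at the given pH."""
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERMINUS))
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERMINUS - ph))
    for residue in sequence:
        if residue in PKA_POSITIVE:
            positive += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            negative += 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return positive - negative


def isoelectric_point(sequence, tolerance=1e-4):
    """pH at which the net charge is zero, found by bisection on [0, 14].

    The net charge decreases monotonically with pH, so bisection converges.
    """
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


acidic_toy = "DDEE"  # hypothetical acidic peptide
basic_toy = "KKRR"   # hypothetical basic peptide
print(f"{acidic_toy}: charge at pH 7 = {net_charge(acidic_toy, 7.0):+.2f}")
print(f"{basic_toy}: charge at pH 7 = {net_charge(basic_toy, 7.0):+.2f}")
```

Applied to a full SLP sequence, a strongly negative value at physiological pH reflects an excess of aspartate and glutamate over lysine and arginine residues, which is the property discussed in the text.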
Interestingly, Hu et al. recently revealed the ability of anammox bacteria to grow on neutral nitric oxide (NO) instead of nitrite. As NO was likely the first oxidized nitrogen form present on early Earth, it is tempting to speculate that the acquisition of the highly acidic SLPs (analogous to AOA, which depend only on positively charged ammonium) preceded the ability of anammox bacteria to utilise negatively charged nitrite. This seems further corroborated by the recent finding that anammox bacteria can also oxidize ammonium in the presence of an electrode as electron acceptor, which again eliminates the nitrite dependency. Nevertheless, this would not explain how Ca. Kuenenia stuttgartiensis maintained the highly acidic surface layer while also depending on negatively charged nitrite. In this respect, the development of the dense layer of (charge-balanced) oligosaccharides discovered here may have provided sufficient shielding of the highly acidic protein layer (Fig. 5b).
However, the investigated Ca. Kuenenia stuttgartiensis strain produces complex oligosaccharides, which also contain nonulosonic acids. Those sugars are commonly associated with enhancing cellular protection through surface diversification (e.g., in response to bacteriophage recognition) [12, 62, 63]. Yet, nonulosonic acids are highly acidic sugars and support a negative surface charge [12, 64]. Thus, although those sugars may be advantageous for cellular protection, it seems counterintuitive to invest cellular energy into an (even more) negatively charged surface layer.
Nevertheless, carboxylic acids can be chemically modified or balanced through basic counterparts, for example, through esterification of the carboxylic acid groups or by free amines, which are otherwise present in alkylated forms [65, 66]. Surprisingly, the chemical composition of the nonulosonic acids, together with additional in-source fragmentation and labeling experiments, indeed indicates the presence of an unmasked amine (SI-DOC-Fig-S32 and S3). Those basic groups unequivocally have the potential to counterbalance the neighboring, highly acidic carboxylic acids (SI-DOC-Fig-S27). While zwitterionic sugar modifications have been detected (e.g., recently in glycans of non-vertebrates), nonulosonic acids with free amines have been observed only rarely but were described, for example, in nerve and cancer cells under certain conditions [64, 66].
To place our findings in the context of the broader anammox physiology, we also investigated the Ca. Brocadia sapporoensis surface layer protein (SLP). This revealed a substantially smaller and less acidic surface layer protein (predicted pI ~5.4), which furthermore is modified by only one type of oligosaccharide (Fig. 5b, SI-DOC-Fig-S33–36). It should be mentioned that similar types of oligosaccharides have already been observed between taxonomically more distant strains [38, 68, 69]. However, the observed physiological differences may contribute to the reported divergences between strains of Ca. Kuenenia and Ca. Brocadia, for example with regard to substrate affinity (i.e., the ability to thrive at low nutrient concentrations) or the tendency to grow in free-living planktonic form.
Ultimately, the closer investigation of the complex array of oligosaccharides provides new perspectives towards interpreting the appearance of the cell surface layer of anammox bacteria or, in particular, of the unique cell surface layer of Ca. Kuenenia stuttgartiensis. However, the hypothesized roles require further experiments to evidence the proposed shielding of the acidic surface layer protein, or to confirm any of the other (likely multiple) biological roles of the glycan array beyond cellular protection.
We established a universal procedure to explore prokaryotic protein glycosylation from non-pure cultures. The approach provides insights into the chemical identity of novel sugar components and identifies the related protein-linked oligosaccharide chains from large-scale (meta)proteomics data directly. By applying the approach to an anammox bacteria enrichment, we resolve a remarkably complex array of surface layer oligosaccharides. The identified glycans are produced by two apparently independent biosynthetic routes and densely cover a very acidic surface layer protein. Moreover, the investigated anammox strain accomplished charge-balancing of the highly specialized nine-carbon sugars. The molecular mechanisms by which both types of oligosaccharides are transferred to the same surface layer protein, however, remain subject for further studies. Ultimately, the physicochemical interpretation of the discovered spectrum of oligosaccharides suggests a broader link between the development of complex oligosaccharides, charged surface layer proteins and the metabolic lifestyle in anammox bacteria.
Materials and methods
Sample sources microbes
Anammox lab-scale enrichment cultures
Ca. Kuenenia stuttgartiensis was enriched as planktonic cells (~90% relative abundance based on metagenomic data) at 30 °C in a continuous-flow bioreactor equipped with a custom-made microfiltration module (pore size of 0.1 μm) as described elsewhere. The reactor was fed with mineral medium supplemented with 45 mM of ammonium and nitrite. Nitrite concentrations in the effluent were always below the detection limit (nitrite test strips MQuant, Merck, Darmstadt, Germany). Anoxic conditions were maintained via continuous sparging with Ar/CO2 (95%/5% v/v) at a rate of 10 ml/min. The reactor hydraulic and solids retention times were ~1.9 and 10.5 days, respectively, and the resulting steady-state OD600 was 1.0–1.1. The pH was controlled at 7.3 with a 1 M KHCO3 solution. The flocculent Ca. Brocadia sapporoensis enrichment was maintained at 30 °C under anoxic conditions using an identical continuous-flow bioreactor with a custom-made microfiltration module (pore size of 0.1 μm) fed with a concentrated medium of 60 mM ammonium and nitrite, as originally described by Lotti et al., 2014.
Escherichia coli K12 MG1655 was obtained from NCCB, The Netherlands. Haloferax volcanii DSMZ 3757 (cultured at high salinity, >3.5 M NaCl) was obtained from The Leibniz Institute DSMZ, Germany. Campylobacter jejuni 9141 was obtained from Erasmus MC, The Netherlands, and Saccharomyces cerevisiae CEN.PK113-7D was obtained from an in-house collection. The control strains were cultured and harvested as described by Kleikamp et al., 2020.
Mass spectrometry based proteomics
Cell lysis and protein extraction
A modified protocol from Kleikamp et al. was used to prepare whole protein extracts. Briefly, 25 mg biomass (wet weight) was collected in an Eppendorf tube and solubilized in a suspension solution consisting of 200 µL B-PER reagent (78243, Thermo Scientific) and 200 µL TEAB buffer (50 mM TEAB, 1% (w/w) NaDOC, adjusted to pH 8.0) including 0.2 µL protease inhibitor (P8215, Sigma Aldrich). Furthermore, 0.1 g of glass beads (acid-washed, ~100 µm diameter, G4649-10G, Sigma Aldrich) was added and cells were disrupted using 3 cycles of bead beating on a vortex for 30 s, with cooling on ice for 30 s in between cycles. Next, a freeze/thaw step was performed by freezing the suspension at −80 °C for 15 min and thawing under shaking at elevated temperature using an Eppendorf incubator (ThermoMixer). The cell debris was pelleted by centrifugation using a bench top centrifuge at max speed, under cooling, for 10 min. The supernatant was transferred to a new Eppendorf tube and kept at 4 °C until further processing. Protein was precipitated by adding 1 volume of TCA (trichloroacetic acid) to 4 volumes of supernatant. The solution was incubated at 4 °C for 10 min and the precipitate was subsequently pelleted at 14,000 rpm for 10 min. The obtained protein precipitate was washed twice using 250 µL ice-cold acetone. The protein pellet was dissolved in 100 µL of 200 mM ammonium bicarbonate containing 6 M urea to a final concentration of ~100 µg/µL. To 100 µL protein solution, 30 µL of a 10 mM DTT solution was added and incubated at 37 °C for 1 h. Next, 30 µL of a freshly prepared 20 mM IAA solution was added and incubated in the dark for 30 min. The solution was diluted to below 1 M urea using 200 mM bicarbonate buffer and an aliquot of ~25 µg protein was digested using sequencing grade trypsin (V511A, Promega) at 37 °C overnight (trypsin to protein ratio of ~1:50).
Finally, the protein digests were desalted using an Oasis HLB 96-well plate (WAT058951, Waters) according to the manufacturer's protocol. The purified peptide eluate was dried using a speed-vac concentrator. The protocol used for ZIC-HILIC extraction of glycopeptides can be found in the supplementary materials.
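The urea dilution step above implies a minimum buffer volume, which can be checked with a short calculation. The following Python sketch is illustrative only: it assumes that volumes are additive, that the DTT, IAA and dilution buffer contain no urea, and the function name is hypothetical.

```python
def urea_dilution(sample_ul, urea_molar, reagents_ul, target_molar):
    """Return (urea concentration before dilution, buffer volume needed in µL).

    Assumes additive volumes and urea-free reagents and dilution buffer.
    """
    volume_before = sample_ul + reagents_ul          # after DTT and IAA additions
    urea_before = urea_molar * sample_ul / volume_before
    final_volume = urea_molar * sample_ul / target_molar
    return urea_before, final_volume - volume_before


# 100 µL protein solution in 6 M urea, plus 30 µL DTT and 30 µL IAA
urea_now, buffer_ul = urea_dilution(100.0, 6.0, 30.0 + 30.0, 1.0)
print(f"urea after reduction/alkylation: {urea_now:.2f} M")
print(f"buffer to reach 1 M urea: {buffer_ul:.0f} µL")
```

Under these assumptions the urea concentration is 3.75 M after reduction and alkylation, and more than 440 µL of buffer is needed to bring it below 1 M.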
Whole cell lysate shotgun (meta)proteomics and targeted experiments
The vacuum-dried peptide fractions were resuspended in H2O containing 3% acetonitrile and 0.1% formic acid under careful vortexing. An aliquot of each sample corresponding to ~250 ng protein digest was analyzed using a one-dimensional shotgun proteomics approach. Briefly, samples were injected into a nano-liquid-chromatography system consisting of an EASY nano LC 1200, equipped with an Acclaim PepMap RSLC RPC18 separation column (50 µm × 150 mm, 2 µm and 100 Å), and a QE plus Orbitrap mass spectrometer (Thermo Scientific, Germany). Unless otherwise specified, the flow rate was maintained at 300 nL/min over a linear gradient using H2O containing 0.1% formic acid as solvent A, and 80% acetonitrile in H2O and 0.1% formic acid as solvent B. Solvent gradients and acquisition modes used for the individual experiments are detailed in the following. (A) High-resolution MS2 mass binning experiments: The peptides were analyzed using a gradient from 4% to 30% solvent B over 32.5 min, and finally to 70% solvent B over 12.5 min. The Orbitrap was operated in data-dependent acquisition (DDA) mode acquiring peptide signals from 500 to 1500 m/z at 70 K resolution with an AGC target of 3e6. The top 10 signals were isolated with a 1.6 m/z window and fragmented using an NCE of 30. The MS2 AGC target was set to 2e5, at a max IT of 100 ms, a fixed first mass of 120, and a resolution of 140 K. Dynamic exclusion was set to 20 s. Mass peaks with an unassigned charge state, or with charge states of 1 or ≥7, were excluded from fragmentation. (B) Whole cell lysate shotgun glycoproteomics: The peptides were analyzed using a gradient from 5% to 30% solvent B over 85 min, and finally to 75% B over 25 min. The Orbitrap was operated in data-dependent acquisition (DDA) mode acquiring peptide signals from 550 to 1500 m/z at 70 K resolution with an AGC target of 3e6. The top 10 signals were isolated with a 2.0 m/z window and fragmented using an NCE of 28. 
The MS2 AGC target was set to 2e5, at a max IT of 75/54 ms, a fixed first mass of 120, an isolation offset of 0.1 m/z, and a resolution of 17 K. Dynamic exclusion was set to 20 s. Mass peaks with an unassigned charge state, or with charge states of 1 or ≥6, were excluded from fragmentation. (C) Whole cell lysate shotgun proteomics: The peptides were analyzed using a gradient from 5% to 30% solvent B over 85 min, and finally to 75% B over 25 min. The Orbitrap was operated in data-dependent acquisition (DDA) mode acquiring peptide signals from 385 to 1250 m/z at 70 K resolution with an AGC target of 3e6. The top 10 signals were isolated with a 2.0 m/z window and fragmented using an NCE of 28. The MS2 AGC target was set to 2e5, at a max IT of 75/54 ms, a fixed first mass of 120, an isolation offset of 0.1 m/z, and a resolution of 17 K. Dynamic exclusion was set to 60 s. Mass peaks with an unassigned charge state, or with charge states of 1 or ≥6, were excluded from fragmentation. (D) Analysis of in-gel digested proteins: The peptides were analyzed using a gradient from 5% to 25% solvent B over 25 min, and finally to 60% solvent B over 10 min. The flow rate was maintained at 350 nL/min. The Orbitrap was operated in data-dependent acquisition (DDA) mode acquiring peptide signals from 400 to 1400 m/z at 70 K resolution with an AGC target of 3e6. The top 10 signals were isolated with a 1.6 m/z window and fragmented using an NCE of 28. The MS2 AGC target was set to 5e4, at a max IT of 150 ms, a fixed first mass of 120, an isolation offset of 0.1 m/z, and a resolution of 17 K. Dynamic exclusion was set to 60 s. Mass peaks with an unassigned charge state, or with charge states of 1 or ≥6, were excluded from fragmentation. (E) In-source fragmentation experiments: The peptides were analyzed using a gradient from 6% to 30% solvent B over 40 min, and finally to 60% B over 15 min. The flow rate was maintained at 350 nL/min. The Orbitrap was operated in positive ionization mode acquiring signals alternating between PRM and Full MS-SIM mode. 
The PRM mode was performed at an in-source CID of 75 eV, isolating the target sugar fragments with an isolation window of 0.4 m/z, at 0.1 m/z isolation offset and a loop count of 9. Fragmentation was performed using an NCE of 25, acquiring fragments at a resolution of 70 K, using an AGC target of 5e5 and a max IT of 150 ms. The fixed lowest mass was set to 50 m/z. The Full MS-SIM mode was operated with an in-source CID of 75 eV, acquiring full scan mass spectra at 70 K resolution, at an AGC target of 3e6 and a max IT of 60 ms, over a mass range of 140–1400 m/z. PRM carbohydrate fragment targets for the Ca. Kuenenia stuttgartiensis enrichment were set to 193, 175, 147, 204, 218, 261, 275, 163, 407; for the Ca. Brocadia sapporoensis enrichment to 147, 204, 218, 232, 334, 133, 352; and for the mammalian protein control sample to 147, 204, 292, 163. SDS-PAGE, glycostaining and in-gel proteolytic digestion are described in the supplementary materials.
PEAKS database search
Whole cell lysate shotgun proteomics raw data (see B/C) were analyzed using PEAKS Studio X (Bioinformatics Solutions Inc., Canada) against databases constructed by metagenomics as described below. The database search employed a two-round strategy, in which the first round was used to construct a focused protein sequence database, allowing peptide spectrum matches up to 5% false discovery rate and protein matches without unique peptide assignments. The database search was performed allowing 50 ppm mass error and 0.01 Da fragment error tolerance, considering 2 missed cleavages, oxidation/deamidation as variable modifications and carbamidomethylation as fixed modification. The cRAP protein sequences were downloaded from ftp://ftp.thegpm.org/fasta/cRAP. The second-round search was performed including the identified oligosaccharide masses as variable modifications, allowing up to 2 variable modifications per peptide. Peptide spectra were filtered at 0.1% false discovery rate and reported protein identifications required ≥1 unique peptide, where protein identifications with ≥2 unique peptides were considered significant.
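The permissive first-round and strict second-round thresholds can be illustrated with a generic target-decoy filter. This Python sketch is not the procedure implemented in PEAKS Studio, whose internal FDR estimation is not described here; the function name, the score values and the simple decoys/targets estimate are assumptions made for illustration.

```python
def filter_by_fdr(psms, max_fdr):
    """Keep target PSMs down to the deepest rank where decoys/targets <= max_fdr.

    psms: iterable of (score, is_decoy) tuples; higher scores are better.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = cutoff = 0
    for position, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = position  # deepest rank that still satisfies the limit
    return [psm for psm in ranked[:cutoff] if not psm[1]]


# Hypothetical scores: (score, is_decoy)
example = [(98, False), (95, False), (90, True), (88, False), (80, False)]
round_one = filter_by_fdr(example, 0.05)    # permissive, builds the focused database
round_two = filter_by_fdr(example, 0.001)   # strict, used for reporting
```

The first round trades specificity for coverage so that the focused database is not missing true proteins; the final identifications are controlled only by the second, stricter filter.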
Verification of glycopeptide spectrum matches
Matlab R2017b was further used to score glycopeptide spectrum matches for the additional presence of expected oxonium ions or for whether glycan structures had been identified in the same scans by parent ion offset binning. Only proteins/species that provided spectra combining a variable modification search identification with oxonium ion and structural identifications in the same spectrum were considered confirmed matches. The BYONIC open modification search procedure is described in the supplementary materials.
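The three-way confirmation rule described above can be sketched as follows. The original scoring was done in Matlab and its code is not shown here, so this Python version is an illustration only: the 10 ppm tolerance, the function names and the choice of three common oxonium ions (HexNAc, Hex, Neu5Ac) are assumptions, not the strain-specific fragment list used in the study.

```python
# Commonly used singly charged oxonium ions (m/z); illustrative subset only.
OXONIUM_IONS = {"HexNAc": 204.0867, "Hex": 163.0601, "Neu5Ac": 292.1027}


def find_oxonium_ions(peaks_mz, targets=OXONIUM_IONS, tol_ppm=10.0):
    """Return {name: closest observed m/z} for every target found within tol_ppm."""
    found = {}
    for name, target in targets.items():
        tolerance = target * tol_ppm * 1e-6
        hits = [mz for mz in peaks_mz if abs(mz - target) <= tolerance]
        if hits:
            found[name] = min(hits, key=lambda mz: abs(mz - target))
    return found


def is_confirmed(has_variable_mod_id, peaks_mz, has_offset_bin_id):
    """A match counts only if all three lines of evidence agree for one spectrum."""
    return bool(has_variable_mod_id
                and find_oxonium_ions(peaks_mz)
                and has_offset_bin_id)
```

For example, `is_confirmed(True, [147.1128, 204.0868], True)` would accept the spectrum, whereas the same spectrum without an offset-binning identification would be rejected.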
Identification of strain specific carbohydrate fragments was established by the following steps: (A) mass binning of very high-resolution shotgun proteomics data; (B) establishing a theoretical chemical composition space for carbohydrate fragments; (C) annotation of (non-peptide) mass peaks with possible carbohydrate compositions; (D) verifying the annotations by investigating the presence of water-loss clusters, and by determining the co-occurrence (correlation) of the cluster peaks across the entire proteomics run. More specifically: (A) Mass spectrometric shotgun raw data acquired at very high resolution (140 K) were converted using peak picking “vendor” into Mascot Generic File (MGF) format considering only second-level scans using the msConvertGUI tool (ProteoWizard). For the comparator E. coli K12 sample an additional absolute int. threshold peak filter of 25 K was applied. The MGF files were imported into the Matlab environment using the Matlab “textscan” function. The mass peaks from all second-level (MS2) scans were further combined into a single matrix, and masses in the range from 110–325 m/z were binned into 0.0001 m/z windows using the “histcounts” function. The obtained raw traces were further corrected for mass drifts by alignment to known amino acid fragment peaks (147.1128; 175.1190; 201.1234; 215.1390; 228.1343; 258.1448; 292.1292 m/z) using the “msalign” function, and further normalised to 100 using the “msnorm” function. The thereby generated raw traces were converted into (centroided) peak lists using the “mspeaks” function, employing a heightfilter of 0.02% (relative to the largest peak). The same procedure (except using a relative heightfilter of 0.1%) was applied simultaneously to the E. coli K12 non-glycosylated comparator strain dataset. (B) An empirical carbohydrate fragment composition space was constructed considering the elemental composition space C5-14H4-28N0-2O2-12S0-1. 
The compositions were further filtered for realistic structures by evaluating C/H and CO/N ratios, the degree of unsaturation (DBEs), the mass defect and the min/max absolute masses. For details see SI-EXCEL-T3–5 and SI-DOC-Fig-S2. (C) To identify possible carbohydrate signals in the sample data, the established sample peak list was matched with the empirical composition space at a mass tolerance of 0.75 ppm. Mass peaks also present in the non-glycosylated E. coli K12 comparator at a relative level >0.5 were considered as amino-acid-related. (D) The established carbohydrate fragment candidates were further evaluated for the presence of water-loss clusters (−18.01 mass deltas between fragments), commonly observed when fragmenting carbohydrate compounds. To ensure that the assigned water-loss clusters/pairs are part of the same parent structure, the clusters were evaluated for co-occurrence within the same scans across the shotgun proteomics dataset. For this, the conventional mass resolution shotgun dataset (of the same sample) was converted to the open mzXML data format using the msConvertGUI tool (ProteoWizard), considering first- and second-level scans. The mzXML files were imported into the Matlab environment using the “mzxmlread” function. The obtained “mzxmlstruct” structure was processed using “mzxml2peaks” and “arrayfun” to extract first- and second-level spectral information. By doing so, an extracted ion chromatogram was collected (within a ±7.5 ppm mass window) for every carbohydrate candidate, containing information about the occurrence across the scans. To evaluate the degree of co-occurrence (=correlation) between individual carbohydrate fragments, the matrix was further converted into a “correlation matrix”, by dividing the number of scans shared between fragments by the total occurrence. A correlation of 1 indicates that all scans are shared, whereas a correlation of 0 means that two carbohydrate fragments do not share any scans. 
To avoid accumulation of background signals, a minimum intensity of 5E4 was required. Only water-loss pairs/clusters with a correlation >0 between, and >0.5 within the entire water-loss cluster were considered for further processing. In addition, the carbohydrate signals required a minimum relative intensity (0.01) or a minimum number of counts (10) across the entire dataset. The established carbohydrate compounds were exported to an Excel table.
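Steps (A) to (D) above can be sketched compactly in Python with NumPy. This is not the authors' Matlab implementation: the function names are hypothetical, the composition space is enumerated without the plausibility filters detailed in the supplementary materials, fragments are assumed to be singly charged cations (elemental mass minus one electron), and the co-occurrence score is taken as shared scans divided by all scans containing either fragment, which is one reading of the definition above.

```python
from itertools import product

import numpy as np

# Monoisotopic masses (u); the electron mass is subtracted for singly charged cations.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462, "S": 31.97207117}
ELECTRON = 0.00054858
WATER = 2 * MONO["H"] + MONO["O"]  # 18.01056 u, the water-loss delta


def bin_masses(mz_values, lo=110.0, hi=325.0, width=1e-4):
    """Step A: count MS2 fragment masses per m/z bin (NumPy analogue of histcounts)."""
    n_bins = int(round((hi - lo) / width))
    return np.histogram(mz_values, bins=n_bins, range=(lo, hi))


def composition_space():
    """Step B: enumerate C5-14 H4-28 N0-2 O2-12 S0-1 (no plausibility filters)."""
    ranges = (range(5, 15), range(4, 29), range(0, 3), range(2, 13), range(0, 2))
    for counts in product(*ranges):
        mass = sum(n * MONO[el] for n, el in zip(counts, "CHNOS")) - ELECTRON
        yield dict(zip("CHNOS", counts)), mass


def annotate(peak_mz, tol_ppm=0.75):
    """Step C: list every composition within tol_ppm of an observed peak."""
    return [comp for comp, mass in composition_space()
            if abs(peak_mz - mass) / mass * 1e6 <= tol_ppm]


def water_loss_pairs(peaks, tol_ppm=0.75):
    """Step D, part 1: (heavier, lighter) peak pairs separated by one water."""
    pairs = []
    for heavy in peaks:
        for light in peaks:
            if abs((heavy - light) - WATER) <= heavy * tol_ppm * 1e-6:
                pairs.append((heavy, light))
    return pairs


def co_occurrence(scans_a, scans_b):
    """Step D, part 2: shared scans divided by all scans containing either fragment."""
    a, b = set(scans_a), set(scans_b)
    return len(a & b) / len(a | b) if a | b else 0.0
```

As a usage example, `annotate(204.08665)` includes the composition C8H14NO5 of the HexNAc oxonium ion, and `water_loss_pairs([204.08665, 186.07609, 168.06552])` links the three peaks into one water-loss cluster. The intensity and count thresholds described above would be applied to the peak list before these steps.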
Establishing the oligosaccharide profiles that modify the proteins was performed by the following steps: (A) identifying scans in a shotgun proteomics run that contain the identified carbohydrate fragments; (B) calculating the “parent ion offset” numbers; and (C) binning the offset numbers to identify the reoccurring mass deltas (=oligosaccharide chains). More specifically: (A) Mass spectrometric raw files of conventional shotgun proteomics analysis runs were converted into mzXML files using the msConvertGUI tool (ProteoWizard). Files were further imported into the Matlab environment using the “mzxmlread” function. Furthermore, a table of the exact masses of the identified carbohydrate fragments, as established by the previous sugar-mining step, was provided as an Excel table and imported into the Matlab environment using the “xlsread” function. The constructed “mzxmlstruct” structure was further processed using the “mzxml2peaks” and “arrayfun” functions to extract first- and second-level spectral information. A matrix was created containing mass/charge (m/z) values, ion intensities, scan numbers (and related parameters) of mass peaks matching the identified carbohydrate fragments, within a tolerance of ±7.5 ppm. (B) The parent ion offset was further calculated for every second-level (MS2) spectrum containing a particular carbohydrate fragment. Briefly, for a particular fragment, all scans containing the carbohydrate fragment were collected. Scans with a unique parent ion mass (2 digits) were processed one by one, using a Matlab “for” loop. First, the complete second-level (MS2) spectrum was extracted. Peaks with a mass delta to neighboring masses indicating a charge state >1 were deconvoluted to singly charged analogs. The scan was only further processed when the carbohydrate intensity was above the specified intensity threshold. The complete mass peak list was then subtracted from its (singly charged) parent ion mass. 
The thereby generated (negative) mass numbers (=parent offsets) were collected in a separate matrix. By assuming a minimum peptide mass of 500 Da (or roughly 5 amino acids), offset numbers <(500 – (parent mass)) were excluded. The generation of the parent offset numbers was repeated for every scan containing a particular carbohydrate fragment. The collected parent offset numbers (for a particular carbohydrate fragment) were finally trimmed to the specified mass range (0–2000 Da, unless otherwise specified), converted into absolute values, binned into 0.01 Da windows, and visualized using the “histogram” function. Alternatively, the parent ion offset binning of the complete shotgun proteomics dataset was performed using exactly the same approach as described for spectra filtered for the occurrence of a certain carbohydrate fragment. Furthermore, the intensity-normalized low mass bins for spectra containing a certain oligosaccharide chain were generated by binning the low mass range, after normalizing every mass peak within a scan to the total peak intensity (100). Peaks with a relative abundance below 0.5 were not further considered. High-resolution mass binning to obtain the accurate mass of the oligosaccharide modification was performed by binning a focused mass range from 350–425, or 1150–1250 m/z, respectively, using a bin size of 0.01 units. Oligosaccharide variable modification masses for PEAKS database search were obtained by annotating the most abundant isotope within an oligosaccharide isotope cluster (after mass binning at a bin size of 0.05 m/z, from 150–2000 Da, and normalization) using the “mspeaks” function. Methods used for metabolomics experiments of released nonulosonic acids and activated sugars can be found in the supplementary materials, SI-DOC-Table-S6 and SI-DOC-Fig-S10.
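The parent ion offset binning described above can be sketched as follows. This Python/NumPy version is a simplified illustration, not the authors' Matlab code: the scan dictionary layout and function names are hypothetical, fragment peaks are assumed to be already deconvoluted to singly charged form, and the intensity threshold and the de-duplication of parent masses are omitted.

```python
import numpy as np

PROTON = 1.00727647  # mass of a proton (u)


def singly_charged(mz, charge):
    """Convert an observed precursor m/z at the given charge to its [M+H]+ value."""
    return mz * charge - (charge - 1) * PROTON


def parent_offsets(scan, min_peptide_mass=500.0):
    """Fragment minus parent mass, keeping fragments of at least min_peptide_mass."""
    parent = singly_charged(scan["precursor_mz"], scan["charge"])
    peaks = np.asarray(scan["peaks_mz"], dtype=float)
    offsets = peaks - parent                      # negative for true fragments
    keep = (offsets <= 0) & (peaks >= min_peptide_mass)
    return offsets[keep]


def offset_histogram(scans, sugar_mz, tol_ppm=7.5, lo=0.0, hi=2000.0, width=0.01):
    """Bin absolute parent offsets of all scans that contain the sugar fragment."""
    collected = []
    for scan in scans:
        peaks = np.asarray(scan["peaks_mz"], dtype=float)
        if np.any(np.abs(peaks - sugar_mz) <= sugar_mz * tol_ppm * 1e-6):
            collected.append(np.abs(parent_offsets(scan)))
    values = np.concatenate(collected) if collected else np.array([])
    n_bins = int(round((hi - lo) / width))
    return np.histogram(values, bins=n_bins, range=(lo, hi))
```

Peptides of different sequence that carry the same oligosaccharide produce fragments at the same distance from their parent mass, so the glycan mass accumulates in one bin while peptide-specific fragments spread across the histogram.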
DNA from the Ca. Brocadia sapporoensis enrichment culture was extracted using the DNeasy UltraClean Microbial Kit (Qiagen, The Netherlands). Following extraction, DNA was checked for quality by gel electrophoresis and by using a Qubit 4 Fluorometer (Thermo Fisher Scientific, USA). Metagenomic sequencing was performed by Novogene Ltd. (Hong Kong, China). Briefly, for library construction, a total amount of 1 μg DNA per sample was used as input material. Sequencing libraries were generated using the NEBNext Ultra DNA Library Prep Kit (NEB #E7645, USA) following the manufacturer’s recommendations. The DNA sample was fragmented by sonication to a size of 350 bp, then DNA fragments were end-polished, A-tailed, and ligated with a full-length adapter for further PCR amplification. PCR products were purified (AMPure XP system) and libraries were analyzed for their size distribution using an Agilent 2100 Bioanalyzer, and quantified using real-time PCR. The clustering of the index-coded samples was performed on a cBot Cluster Generation System according to the manufacturer’s instructions. After cluster generation, the library preparations were sequenced to generate paired-end reads (HiSeq sequencing platform, Illumina Inc., US).
Sampling from the Ca. Kuenenia stuttgartiensis enrichment culture and sequencing are described in Lawson et al., 2020.
Metagenome assembly and binning
Raw reads were quality checked with FastQC v0.11.7 (www.bioinformatics.babraham.ac.uk/projects/fastqc/), and low-quality reads were trimmed using Trimmomatic v0.39 with the default settings for paired-end reads. Trimmed reads were assembled using metaSPAdes v3.13.0 with default settings, resulting in 51,275 scaffolds of ≥1 kb. Metagenome binning was performed using three different binning algorithms: BusyBee Web, MaxBin 2.0 v2.2.4 and MetaBAT2 v2.12.1. The three bin sets were supplied to DAS Tool v1.1.0 for consensus binning to obtain the final optimized bins, which resulted in 47 metagenome-assembled genomes (MAGs). Genome bins were assessed for completeness and contamination using CheckM v1.0.12. As a result, 27 high-, 19 medium-, and 1 low-quality MAGs in accordance with the minimum information about a metagenome-assembled genome (MIMAG) standards were reconstructed. MAGs were classified taxonomically using GTDB-Tk v1.0.2 and the Genome Taxonomy Database (release 89). The reconstructed genomes were annotated through the NCBI Prokaryotic Genome Annotation Pipeline. Annotation of the protein-coding genes was performed using the GhostKOALA tool (accessed October 2019) for Kyoto Encyclopedia of Genes and Genomes (KEGG) enzyme codes and supported with BLASTp (E value <1e-20) searches against the NCBI non-redundant protein database. Phylogenetic analysis, proteome isoelectric point calculation and proteome sequence homology determination are outlined in the supplementary materials.
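The MIMAG quality tiers mentioned above follow published completeness and contamination thresholds, which can be expressed as a small classifier. This Python sketch is illustrative: the function name and the `rna_complete` flag (standing in for the rRNA and tRNA gene requirements of high-quality drafts) are hypothetical, and bins at or above 10% contamination are simply reported as unclassified.

```python
from collections import Counter


def classify_mag(completeness, contamination, rna_complete=True):
    """Return 'high', 'medium', 'low' or 'unclassified' for one genome bin.

    completeness, contamination: CheckM-style percentages (0-100).
    rna_complete: hypothetical flag for the rRNA/tRNA criteria of high-quality drafts.
    """
    if contamination >= 10:
        return "unclassified"
    if completeness > 90 and contamination < 5 and rna_complete:
        return "high"
    if completeness >= 50:
        return "medium"
    return "low"


# Hypothetical (completeness, contamination) values for three bins
bins = [(95.2, 1.1), (72.0, 4.0), (30.0, 2.0)]
summary = Counter(classify_mag(c, x) for c, x in bins)
```

Applied to the CheckM output for all bins, a tally of this kind yields the number of high-, medium- and low-quality MAGs.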
The mass spectrometry proteomics raw data have been deposited in the ProteomeXchange consortium database with the dataset identifier PXD021600. Raw sequencing data are available through the NCBI Sequence Read Archive (SRA) under accession number: SRR12344472. The MAGs are available at GenBank under accession numbers JACFMP000000000 to JACFOJ000000000. The BioProject accession number is PRJNA647942.
Prabakaran S, Lippens G, Steen H, Gunawardena J. Post‐translational modification: nature’s escape from genetic imprisonment and the basis for dynamic information encoding. Wiley Interdiscip Rev Syst Biol Med. 2012;4:565–83.
Khoury GA, Baliban RC, Floudas CA. Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database. Sci Rep. 2011;1:90.
den Ridder M, Daran-Lapujade P, Pabst M. Shot-gun proteomics: why thousands of unidentified signals matter. FEMS Yeast Res. 2020;20:foz088.
Spoel SH. Orchestrating the proteome with post-translational modifications. Oxford University Press UK. 2018;19:4499–4503.
Varki A, Cummings RD, Esko JD, Freeze HH, Stanley P, Bertozzi CR, et al. Essentials of glycobiology. 3rd edition. (Cold Spring Harbor Laboratory Press, New York, 2015–2017).
Varki A. Evolutionary forces shaping the Golgi glycosylation machinery: why cell surface glycans are universal to living cells. Cold Spring Harb Perspect Biol. 2011;3:a005462.
Varki A, Lowe JB. Biological roles of glycans. In: Varki A. Essentials of glycobiology. 2nd edition (Cold Spring Harbor Laboratory Press, New York, 2009). pp 75–88.
Herget S, Toukach PV, Ranzinger R, Hull WE, Knirel YA, Von der Lieth C-W. Statistical analysis of the Bacterial Carbohydrate Structure Data Base (BCSDB): characteristics and diversity of bacterial carbohydrates in comparison with mammalian glycans. BMC Struct Biol. 2008;8:1–20.
Schäffer C, Messner P. Emerging facets of prokaryotic glycosylation. FEMS Microbiol Rev. 2017;41:49–91.
Eichler J, Koomey M. Sweet new roles for protein glycosylation in prokaryotes. Trends Microbiol. 2017;25:662–72.
Eichler J. Extreme sweetness: protein glycosylation in archaea. Nat Rev Microbiol. 2013;11:151.
Kleikamp HB, Lin YM, McMillan DG, Geelhoed JS, Naus-Wiezer SN, Van Baarlen P, et al. Tackling the chemical diversity of microbial nonulosonic acids–a universal large-scale survey approach. Chem Sci. 2020;11:3074–80.
Boleij M, Kleikamp H, Pabst M, Neu TR, Van Loosdrecht MC, Lin Y. Decorating the anammox house: sialic acids and sulfated glycosaminoglycans in the extracellular polymeric substances of anammox granular sludge. Environ Sci Technol. 2020;54:5218–26.
Bucci M. A gut reaction. Nat Chem Biol. 2020;16:363-.
Conrad R. The global methane cycle: recent advances in understanding the microbial processes involved. Environ Microbiol Rep. 2009;1:285–92.
Lam P, Lavik G, Jensen MM, van de Vossenberg J, Schmid M, Woebken D, et al. Revising the nitrogen cycle in the Peruvian oxygen minimum zone. Proc Natl Acad Sci. 2009;106:4752–7.
Strous M, Pelletier E, Mangenot S, Rattei T, Lehner A, Taylor MW, et al. Deciphering the evolution and metabolism of an anammox bacterium from a community genome. Nature. 2006;440:790.
Kartal B, Kuenen JV, Van Loosdrecht M. Sewage treatment with anammox. Science. 2010;328:702–3.
van Niftrik L, Jetten MS. Anaerobic ammonium-oxidizing bacteria: unique microorganisms with exceptional properties. Microbiol Mol Biol Rev. 2012;76:585–96.
Fuerst JA, Sagulenko E. Beyond the bacterium: planctomycetes challenge our concepts of microbial structure and function. Nat Rev Microbiol. 2011;9:403.
Van Teeseling MC, Mesman RJ, Kuru E, Espaillat A, Cava F, Brun YV, et al. Anammox Planctomycetes have a peptidoglycan cell wall. Nat Commun. 2015;6:6878.
Jeske O, Schüler M, Schumann P, Schneider A, Boedeker C, Jogler M, et al. Planctomycetes do possess a peptidoglycan cell wall. Nat Commun. 2015;6:7116.
van Teeseling MC, Maresch D, Rath CB, Figl R, Altmann F, Jetten MS, et al. The S-layer protein of the anammox bacterium Kuenenia stuttgartiensis is heavily O-glycosylated. Front Microbiol. 2016;7:1721.
van Teeseling MC, de Almeida NM, Klingl A, Speth DR, den Camp HJO, Rachel R, et al. A new addition to the cell plan of anammox bacteria:“Candidatus Kuenenia stuttgartiensis” has a protein surface layer as the outermost layer of the cell. J Bacteriol. 2014;196:80–9.
Boleij M, Pabst M, Neu TR, van Loosdrecht MC, Lin Y. Identification of glycoproteins isolated from extracellular polymeric substances of full-scale anammox granular sludge. Environ Sci Technol. 2018;52:13127–35.
Gerbino E, Carasi P, Mobili P, Serradell M, Gómez-Zavaglia A. Role of S-layer proteins in bacteria. World J Microbiol Biotechnol. 2015;31:1877–87.
Sleytr UB, Schuster B, Egelseer E-M, Pum D. S-layers: principles and applications. FEMS Microbiol Rev. 2014;38:823–64.
Schuster B, Sleytr UB. Relevance of glycosylation of S-layer proteins for cell surface properties. Acta biomaterialia. 2015;19:149–57.
Tamir A, Eichler J. N-Glycosylation is important for proper Haloferax volcanii S-layer stability and function. Appl Environ Microbiol. 2017;83:e03152-16.
Wang F, Cvirkaite-Krupovic V, Kreutzberger MA, Su Z, de Oliveira GA, Osinski T, et al. An extensively glycosylated archaeal pilus survives extreme conditions. Nat Microbiol. 2019;4:1401–10.
Li P-N, Herrmann J, Tolar BB, Poitevin F, Ramdasi R, Bargar JR, et al. Nutrient transport suggests an evolutionary basis for charged archaeal surface layer proteins. ISME J. 2018;12:2389–402.
Posch G, Pabst M, Brecker L, Altmann F, Messner P, Schäffer C. Characterization and scope of S-layer protein O-glycosylation in Tannerella forsythia. J Biol Chem. 2011;286:38714–24.
Benz I, Schmidt MA. Never say never again: protein glycosylation in pathogenic bacteria. Mol Microbiol. 2002;45:267–76.
Sekot G, Posch G, Messner P, Matejka M, Rausch-Fan X, Andrukhov O, et al. Potential of the Tannerella forsythia S-layer to delay the immune response. J Dent Res. 2011;90:109–14.
Szymanski CM, Burr DH, Guerry P. Campylobacter protein glycosylation affects host cell interactions. Infect Immun. 2002;70:2242.
Drickamer K, Taylor ME. Evolving views of protein glycosylation. Trends Biochem Sci. 1998;23:321–4.
Koomey M. O-linked protein glycosylation in bacteria: snapshots and current perspectives. Curr Opin Struct Biol. 2019;56:198–203.
Wang N, Anonsen JH, Hadjineophytou C, Reinar WB, Børud B, Vik Å, et al. Allelic polymorphisms in a glycosyltransferase gene shape glycan repertoire in the O-linked protein glycosylation system of Neisseria. Glycobiology. 2021;31:477–91.
Stadlmann J, Taubenschmid J, Wenzel D, Gattinger A, Dürnberger G, Dusberger F, et al. Comparative glycoproteomics of stem cells identifies new players in ricin toxicity. Nature. 2017;549:538–42.
Polasky DA, Yu F, Teo GC, Nesvizhskii AI. Fast and comprehensive N- and O-glycoproteomics analysis with MSFragger-Glyco. Nat Methods. 2020;17:1125–32.
Lu L, Riley NM, Shortreed MR, Bertozzi CR, Smith LM. O-Pair Search with MetaMorpheus for O-glycopeptide characterization. Nat Methods. 2020;17:1133–8.
Fulton KM, Li J, Tomas JM, Smith JC, Twine SM. Characterizing bacterial glycoproteins with LC-MS. Expert Rev Proteom. 2018;15:203–16.
Ahrné E, Müller M, Lisacek F. Unrestricted identification of modified proteins using MS/MS. Proteomics. 2010;10:671–86.
Bern M, Kil YJ, Becker C. Byonic: advanced peptide and protein identification software. Curr Protoc Bioinforma. 2012;40:13.20.1–13.20.14.
Na S, Bandeira N, Paek E. Fast multi-blind modification search through tandem mass spectrometry. Mol Cell Proteomics. 2012;11:1–13.
Devabhaktuni A, Lin S, Zhang L, Swaminathan K, Gonzalez CG, Olsson N, et al. TagGraph reveals vast protein modification landscapes from large tandem mass spectrometry datasets. Nat Biotechnol. 2019;37:469–79.
Izaham ARA, Scott NE. Open database searching enables the identification and comparison of bacterial glycoproteomes without defining glycan compositions prior to searching. Mol Cell Proteom. 2020;19:1561–74.
Ahmad Izaham AR, Ang C-S, Nie S, Bird LE, Williamson NA, Scott NE. What are we missing by using hydrophilic enrichment? improving bacterial glycoproteome coverage using total proteome and FAIMS analyses. J Proteome Res. 2020;20:599–612.
Kelstrup CD, Frese C, Heck AJ, Olsen JV, Nielsen ML. Analytical utility of mass spectral binning in proteomic experiments by SPectral Immonium Ion Detection (SPIID). Mol Cell Proteom. 2014;13:1914–24.
Wuhrer M, Catalina MI, Deelder AM, Hokke CH. Glycoproteomics based on tandem mass spectrometry of glycopeptides. J Chromatogr B 2007;849:115–28.
Hoffmann M, Marx K, Reichl U, Wuhrer M, Rapp E. Site-specific O-glycosylation analysis of human blood plasma proteins. Mol Cell Proteom. 2016;15:624–41.
Singh C, Zampronio CG, Creese AJ, Cooper HJ. Higher energy collision dissociation (HCD) product ion-triggered electron transfer dissociation (ETD) mass spectrometry for the analysis of N-linked glycoproteins. J proteome Res. 2012;11:4517–25.
Hoffmann M, Pioch M, Pralow A, Hennig R, Kottler R, Reichl U, et al. The fine art of destruction: a guide to in‐depth glycoproteomic analyses—exploiting the diagnostic potential of fragment ions. Proteomics 2018;18:1800282.
Kosma P, Wugeditsch T, Christian R, Zayni S, Messner P. Glycan structure of a heptose-containing S-layer glycoprotein of Bacillus thermoaerophilus. Glycobiology. 1995;5:791–6.
Faridmoayer A, Fentabil MA, Haurat MF, Yi W, Woodward R, Wang PG, et al. Extreme substrate promiscuity of the Neisseria oligosaccharyl transferase involved in protein O-glycosylation. J Biol Chem. 2008;283:34596–604.
Harding CM, Nasr MA, Scott NE, Goyette-Desjardins G, Nothaft H, Mayer AE, et al. A platform for glycoengineering a polyvalent pneumococcal bioconjugate vaccine using E. coli as a host. Nat Commun. 2019;10:1–11.
Speth DR, Guerrero-Cruz S, Dutilh BE, Jetten MS. Genome-based microbial ecology of anammox granules in a full-scale wastewater treatment system. Nat Commun. 2016;7:11172.
Lawson CE, Wu S, Bhattacharjee AS, Hamilton JJ, McMahon KD, Goel R, et al. Metabolic network analysis reveals microbial community interactions in anammox granules. Nat Commun. 2017;8:1–12.
Straka LL, Meinhardt KA, Bollmann A, Stahl DA, Winkler M-K. Affinity informs environmental cooperation between ammonia-oxidizing archaea (AOA) and anaerobic ammonia-oxidizing (Anammox) bacteria. ISME J. 2019;13:1997–2004.
Hu Z, Wessels HJ, van Alen T, Jetten MS, Kartal B. Nitric oxide-dependent anaerobic ammonium oxidation. Nat Commun. 2019;10:1–7.
Shaw DR, Ali M, Katuri KP, Gralnick JA, Reimann J, Mesman R, et al. Extracellular electron transfer-dependent anaerobic oxidation of ammonium by anammox bacteria. Nat Commun. 2020;11:1–12.
Lewis AL, Desa N, Hansen EE, Knirel YA, Gordon JI, Gagneux P, et al. Innovations in host and microbial sialic acid biosynthesis revealed by phylogenomic prediction of nonulosonic acid structure. Proc Natl Acad Sci. 2009;106:13552–7.
Fernández L, Rodríguez A, García P. Phage or foe: an insight into the impact of viral predation on microbial communities. ISME J. 2018;12:1171–9.
Wang J, Cheng B, Li J, Zhang Z, Hong W, Chen X, et al. Chemical remodeling of cell‐surface sialic acids through a palladium‐triggered bioorthogonal elimination reaction. Angew Chem Int Ed. 2015;54:5364–8.
Pabst M, Fischl RM, Brecker L, Morelle W, Fauland A, Köfeler H, et al. Rhamnogalacturonan II structure shows variation in the side chains monosaccharide composition and methylation status within and across different plant species. Plant J. 2013;76:61–72.
Popa I, Pons A, Mariller C, Tai T, Zanetta J-P, Thomas L, et al. Purification and structural characterization of de-N-acetylated form of GD3 ganglioside present in human melanoma tumors. Glycobiology. 2007;17:367–73.
Paschinger K, Wilson IB. Anionic and zwitterionic moieties as widespread glycan modifications in non-vertebrates. Glycoconj J. 2020;37:27–40.
Nothaft H, Scott NE, Vinogradov E, Liu X, Hu R, Beadle B, et al. Diversity in the protein N-glycosylation pathways within the Campylobacter genus. Mol Cell Proteom. 2012;11:1203–19.
Hadjineophytou C, Anonsen JH, Wang N, Ma KC, Viburiene R, Vik Å, et al. Genetic determinants of genus-level glycan diversity in a bacterial protein glycosylation system. PLoS Genet. 2019;15:e1008532.
Oshiki M, Satoh H, Okabe S. Ecology and physiology of anaerobic ammonium oxidizing bacteria. Environ Microbiol. 2016;18:2784–96.
Kartal B, Geerts W, Jetten MS. Cultivation, detection, and ecophysiology of anaerobic ammonium-oxidizing bacteria. Methods in enzymology. 486. Elsevier; 2011. p. 89–108.
Lotti T, Kleerebezem R, Lubello C, Van Loosdrecht M. Physiological and kinetic characterization of a suspended cell anammox culture. Water Res. 2014;60:1–14.
Kleikamp HB, Pronk M, Tugui C, da Silva LG, Abbas B, Lin YM, et al. Database-independent de novo metaproteomics of complex microbial communities. Cell Syst. 2021;12:375–83.
Köcher T, Pichler P, Swart R, Mechtler K. Analysis of protein mixtures from whole-cell extracts by single-run nanoLC-MS/MS using ultralong gradients. Nat Protoc. 2012;7:882.
Lawson CE, Nuijten GH, de Graaf RM, Jacobson TB, Pabst M, Stevenson DM, et al. Autotrophic and mixotrophic metabolism of an anammox bacterium revealed by in vivo 13 C and 2 H metabolic network mapping. ISME J. 2021;15:673–87.
Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 2014;30:2114–20.
Nurk S, Meleshko D, Korobeynikov A, Pevzner PA. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 2017;27:824–34.
Laczny CC, Kiefer C, Galata V, Fehlmann T, Backes C, Keller A. BusyBee Web: metagenomic data analysis by bootstrapped supervised binning and annotation. Nucleic Acids Res. 2017;45:W171–W9.
Wu Y-W, Simmons BA, Singer SW. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics. 2016;32:605–7.
Kang DD, Froula J, Egan R, Wang Z. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ. 2015;3:e1165.
Sieber CM, Probst AJ, Sharrar A, Thomas BC, Hess M, Tringe SG, et al. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat Microbiol. 2018;3:836–43.
Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25:1043–55.
Bowers RM, Kyrpides NC, Stepanauskas R, Harmon-Smith M, Doud D, Reddy T, et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat Biotechnol. 2017;35:725–31.
Tatusova T, DiCuccio M, Badretdin A, Chetvernin V, Ciufo S, Li W. Prokaryotic genome annotation pipeline. In: The NCBI Handbook. 2nd edition. (National Center for Biotechnology Information, US, 2013). pp 131–45.
Kanehisa M, Sato Y, Morishima K. BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences. J Mol Biol. 2016;428:726–31.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–10.
We acknowledge Laura van Niftrik for reading the manuscript and providing constructive feedback. We would further like to acknowledge Claire Chassagne for discussions on surface charges, Guylaine Nuijten and Katinka van de Pas-Schoonen for anammox biomass sampling, technical assistance and reactor care, and Ben Abbas for support with DNA extraction. The authors acknowledge the SIAM consortium and the TU Delft for startup funding. Additionally, SL was supported by an NWO VIDI grant (016.Vidi.189.050), and ML was supported by a Marie Skłodowska-Curie Individual Fellowship (752992) and a VENI grant from the Dutch Research Council (NWO, VI.Veni.192.252).
The authors declare no competing interests.
Cite this article
Pabst, M., Grouzdev, D.S., Lawson, C.E. et al. A general approach to explore prokaryotic protein glycosylation reveals the unique surface layer modulation of an anammox bacterium. ISME J 16, 346–357 (2022). https://doi.org/10.1038/s41396-021-01073-y