Glyco-Decipher enables glycan database-independent peptide matching and in-depth characterization of site-specific N-glycosylation

Glycopeptides with unusual glycans or poor peptide backbone fragmentation in tandem mass spectrometry are unaccounted for in typical site-specific glycoproteomics analysis and thus remain unidentified. Here, we develop a glycoproteomics tool, Glyco-Decipher, to address these issues. Glyco-Decipher conducts glycan database-independent peptide matching and exploits the fragmentation pattern of shared peptide backbones in glycopeptides to improve the spectrum interpretation. We benchmark Glyco-Decipher on several large-scale datasets, demonstrating that it identifies more peptide-spectrum matches than Byonic, MSFragger-Glyco, StrucGP and pGlyco 3.0, with a 33.5%-178.5% increase in the number of identified glycopeptide spectra. The database-independent and unbiased profiling of attached glycans enables the discovery of 164 modified glycans in mouse tissues, including glycans with chemical or biological modifications. By enabling in-depth characterization of site-specific protein glycosylation, Glyco-Decipher is a promising tool for advancing glycoproteomics analysis in biological research.


The spectrum identification rate of Glyco-Decipher was compared with that of pGlyco 2.0, the work that first introduced the mouse tissue dataset.
(b) High m/z section (ranging from 900 to 2,000) of the glycopeptide spectra shown in (a).
(a) Score distribution of peptide-spectrum matches in spectrum expansion. Left: score distribution of all target-decoy PSMs obtained in spectrum expansion. Right: score distribution of PSMs after score filtering and core structure peak matching. The PSM score threshold was derived from the e-value filtration method. Source data are provided as a Source Data file.
(b) The GPSM initially identified by in silico deglycosylation (top) and the GPSM that passed the score threshold in the spectrum expansion (bottom).
(c) Normalized peaks of the peptide part of the intact glycopeptides shown in (b). The peptide fragmentation pattern was retained after spectrum expansion: the y4 ion is still the most abundant ion, followed by the y6, y7 and b2 ions.
(a) Score distribution of peptide-spectrum matches in spectrum expansion. Left: score distribution of all target-decoy PSMs obtained in spectrum expansion. Right: score distribution of PSMs after score filtering and core structure peak matching. The PSM score threshold was derived from the e-value filtration method. Source data are provided as a Source Data file.
(b) The GPSM initially identified by in silico deglycosylation (top) and the GPSM that passed the score threshold in the spectrum expansion (bottom).
(c) Normalized peaks of the peptide part of the intact glycopeptides shown in (b). The peptide fragmentation pattern was retained after spectrum expansion: the (y11+HexNAc) ion is always the most abundant ion, followed by the (y15+HexNAc) ion.
Supplementary Figure 9 Examples of GPSMs with peptide backbone "GGNVTLPCK" identified from in silico deglycosylation and spectrum expansion.
(a) Score distribution of peptide-spectrum matches in spectrum expansion. Left: score distribution of all target-decoy PSMs obtained in spectrum expansion. Right: score distribution of PSMs after score filtering and core structure peak matching. The PSM score threshold was derived from the e-value filtration method. Source data are provided as a Source Data file.
(b) The GPSM initially identified by in silico deglycosylation (top) and the GPSM that passed the score threshold in the spectrum expansion (bottom).
(b) Percentage of PSMs in which the core structure Y ions could be matched, based on the results specifically identified by MSFragger (including different identifications of common spectra and different spectra).
CRF indicates the Y1 ion with cross ring fragmentation.
Y1 (peptide + HexNAc) ions are usually of high intensity in low-energy collision-induced dissociation (CID) of glycopeptides 1 . It is also reported that the optimal range of collision energy for Y1 ion production is between 15% and 25% 2 or under 35% 3 in stepped-energy higher-energy collisional dissociation (HCD), which overlaps substantially with the energy range used in this dataset 2 .
The peptide identified by Glyco-Decipher, "VNSTELFHVER", is the C-terminal part of the sequence "FSVRVNSTELFHVER", which was identified by MSFragger. Ten y ions (y1-y10) are shared by the two sequences and were matched by both Glyco-Decipher and MSFragger. However, glycan ions, including the Y0 ion (peptide) and the Y1 ion (peptide + HexNAc), were incorrectly matched as peptide fragment ions by MSFragger because of its wider MS1 mass window in open search mode, which resulted in the incorrect peptide identification and the lack of corresponding core structure ions in the spectrum.
Glycans with terminal LacDiNAc are produced in significant amounts by HEK293T cells 6, 7 , which were used to express the SARS-CoV-2 Spike protein in the original study 5 .
In silico deglycosylation enables sensitive identification of peptide backbones of intact glycopeptides.
A dataset containing 25 raw files acquired by LC-MS/MS (liquid chromatography-tandem mass spectrometry) analysis of intact glycopeptides from digests of five mouse tissues (brain, heart, kidney, liver and lung) 2 was used to evaluate the peptide backbone identification performance of Glyco-Decipher. Among 1,386,844 acquired MS2 spectra, 1,282,263 spectra were determined to be glycopeptide spectra containing more than 2 oxonium ions. These spectra were subjected to penta-saccharide core structure matching, and 556,363 spectra were matched with at least 3 of the core structure peaks in MS2. We first investigated the number of spectra generated when different numbers of matched core structures were considered in the generation of in silico deglycosylated spectra (Supplementary Fig. 45). Nearly twice as many deglycosylated spectra were generated when up to the top 5 candidate core structures were considered. In terms of peptide backbone identification, the majority of results (>90%) were obtained from the deglycosylated spectra corresponding to the top-scored core structure with the highest Y1 (Y-HexNAc) intensity (Supplementary Fig. 45b). As other core structures yielded few peptide identifications, only the spectra generated from the top-scored core structure with the highest Y1 intensity were retained for peptide backbone identification in Glyco-Decipher unless otherwise stated.
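The core structure matching step described above can be illustrated with a minimal sketch. It is not the Glyco-Decipher implementation: the ion set (Y0 to Y5 of the HexNAc2Hex3 core), the 20 ppm tolerance, and all function names are assumptions made for illustration. Only the monoisotopic masses are standard values.

```python
# Illustrative sketch only; names, tolerance and ion set are assumptions.
HEXNAC = 203.079373  # monoisotopic residue mass of HexNAc, Da
HEX = 162.052824     # monoisotopic residue mass of Hex, Da
PROTON = 1.007276    # mass of a proton, Da

# (number of HexNAc, number of Hex) for Y0..Y5 of the N-glycan core (assumed ion set)
CORE_COMPOSITIONS = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3)]


def core_y_ions(peptide_mass, charge=1):
    """Return theoretical m/z values of Y0..Y5 for a neutral peptide mass."""
    return [
        (peptide_mass + n * HEXNAC + h * HEX + charge * PROTON) / charge
        for n, h in CORE_COMPOSITIONS
    ]


def peptide_mass_from_y1(y1_mz, charge=1):
    """Derive the neutral peptide mass implied by a candidate Y1 peak."""
    return y1_mz * charge - charge * PROTON - HEXNAC


def count_core_matches(peptide_mass, observed_mz, charge=1, tol_ppm=20.0):
    """Count how many core Y ions have an observed peak within tol_ppm."""
    matched = 0
    for theo in core_y_ions(peptide_mass, charge):
        tol = theo * tol_ppm * 1e-6
        if any(abs(obs - theo) <= tol for obs in observed_mz):
            matched += 1
    return matched
```

Under these assumptions, a spectrum would be kept as a core structure candidate when `count_core_matches(...)` returns at least 3, mirroring the "at least 3 of the core structure peaks" criterion in the text; the peptide mass returned by `peptide_mass_from_y1` is what a subsequent peptide search on the deglycosylated spectrum would use.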
MAGIC 9 , which also adopts the in silico deglycosylation strategy, was employed for
(a) Numbers of in silico deglycosylated spectra generated when the top 1/3/5 matched core structure(s) were considered in spectrum generation.
(b) Rank distribution of the matched core structure in identification results from in silico deglycosylated spectra. The effect of Y1 intensity was also considered in spectrum generation: "consider >0.9Y1" means that only one core structure was retained when a Y1 peak with relative intensity over 0.9 was matched in the core structure results; the retained core structure is the one with the highest Y1 peak intensity.
Fig. 49a). In terms of glycosites, 3,884 glycosites were identified; 2,320 of them were also annotated in UniProt, and most of these were evidenced by sequence analysis (Supplementary Fig. 49b).
In the identified 215,010 glycopeptide spectra, 75.2% (161,690) of these spectra were
To gain deeper insight into glycan occupancy, we depicted the co-occurrence of glycan items at the site level (Supplementary Fig. 52a). After removing low co-occurrence glycans (those occurring fewer than 10 times with any other glycan), 281 glycans that tend to appear with others were retained (Supplementary Fig. 52b). Oligo-mannose glycans and their ammonium-adducted counterparts tend to appear with other glycans on the same site, whereas many complex/hybrid glycans rarely occupy the same site with others.
(a) All 943 glycan items were considered in the co-occurrence investigation.
(b) 281 high co-occurrence glycans were investigated after removing items that occur fewer than 10 times with any other glycan.
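The site-level co-occurrence filtering described above can be sketched as follows. This is a hedged illustration rather than the analysis code behind the figure: it assumes that "co-occurrence" is counted as the number of glycosites carrying both glycans, and that a glycan is retained if it co-occurs at least 10 times with at least one other glycan. The function names and the input layout (a mapping from site to glycan collection) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(site_glycans):
    """Count, for each unordered glycan pair, the number of sites carrying both.

    site_glycans: mapping of glycosite identifier -> iterable of glycan names
    (hypothetical input layout).
    """
    counts = Counter()
    for glycans in site_glycans.values():
        # sorted() gives each pair a canonical order, so (a, b) == (b, a)
        for a, b in combinations(sorted(set(glycans)), 2):
            counts[(a, b)] += 1
    return counts


def high_cooccurrence_glycans(counts, min_count=10):
    """Keep glycans that co-occur at least min_count times with some other glycan."""
    keep = set()
    for (a, b), n in counts.items():
        if n >= min_count:
            keep.update((a, b))
    return keep
```

With this reading, a glycan seen on many sites but always alone is dropped, which is consistent with the observation that many complex/hybrid glycans rarely share a site with others.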