Fast and comprehensive N- and O-glycoproteomics analysis with MSFragger-Glyco


Recent advances in methods for enrichment and mass spectrometric analysis of intact glycopeptides have produced large-scale glycoproteomics datasets, but interpreting these data remains challenging. We present MSFragger-Glyco, a glycoproteomics mode of the MSFragger search engine, for fast and sensitive identification of N- and O-linked glycopeptides and open glycan searches. Reanalysis of recent N-glycoproteomics data resulted in annotation of 80% more glycopeptide spectrum matches (glycoPSMs) than previously reported. In published O-glycoproteomics data, our method more than doubled the number of glycoPSMs annotated when searching the same glycans as the original search, and yielded 4- to 6-fold increases when expanding searches to include additional glycan compositions and other modifications. Expanded searches also revealed many sulfated and complex glycans that remained hidden to the original search. With greatly improved spectral annotation, coupled with the speed of index-based scoring, MSFragger-Glyco makes it possible to comprehensively interrogate glycoproteomics data and illuminate the many roles of glycosylation.
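To make the idea of searching glycans as mass offsets concrete, the sketch below computes the mass shift that a glycan composition adds to a peptide and matches an observed shift against a short candidate list. It is an illustration only, not the MSFragger-Glyco implementation: the function names, the candidate list and the 0.01 Da tolerance are assumptions chosen for the example, and the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da): the mass each unit adds to a glycan
# after loss of water on glycosidic bond formation. Standard values.
RESIDUE_MASS = {
    "HexNAc": 203.079373,
    "Hex": 162.052824,
    "Fuc": 146.057909,
    "NeuAc": 291.095417,
    "Sulfo": 79.956815,  # SO3, for sulfated glycans
}


def glycan_mass_offset(composition):
    """Return the mass shift (Da) for a composition such as {"HexNAc": 2, "Hex": 5}."""
    return sum(RESIDUE_MASS[unit] * count for unit, count in composition.items())


def match_delta(delta_mass, candidates, tol_da=0.01):
    """Return names of candidate glycans whose mass is within tol_da of delta_mass.

    `candidates` maps a label to a composition dict. The tolerance is a
    hypothetical default for this sketch, not a recommended search setting.
    """
    return [
        name
        for name, composition in candidates.items()
        if abs(glycan_mass_offset(composition) - delta_mass) <= tol_da
    ]


# Hypothetical candidate list: one N-glycan and one O-glycan composition.
CANDIDATES = {
    "HexNAc(2)Hex(5)": {"HexNAc": 2, "Hex": 5},
    "HexNAc(1)Hex(1)NeuAc(1)": {"HexNAc": 1, "Hex": 1, "NeuAc": 1},
}

if __name__ == "__main__":
    # Observed difference between precursor mass and unmodified peptide mass.
    print(match_delta(1216.42, CANDIDATES))
```

Expanding a search to more compositions, as in the expanded O-glycan searches described above, corresponds in this sketch to adding entries to the candidate list.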

Fig. 1: Workflow of MSFragger-Glyco.
Fig. 2: Comparison of mass-offset-type and variable-modification-type searches for N-linked glycopeptides.
Fig. 3: Comparison of MSFragger-Glyco and original analysis for N-glycan datasets.
Fig. 4: Expanded O-glycan searches across tissues.

Data availability

N- and O-linked glycoproteomics raw data were downloaded from the PRIDE Archive [53] and ProteomeXchange [54] under accession numbers PXD011533 (Riley et al. [10], N-glycan data), PXD009476 (Yang et al. [12], O-glycan data), and PXD005411, PXD005412, PXD005413, PXD005553 and PXD005555 (Liu et al. [8], N-glycan data). Processed search results (raw data, MSFragger output files and processed peak tables) that support the findings of this study are available in PRIDE (accession number PXD021196).

Code availability

The MSFragger-Glyco program was developed in the cross-platform Java language, and incorporated in the MSFragger search engine starting with v.3.0, which can be accessed at


Acknowledgements

This work was funded in part by NIH grant nos. R01-GM-094231 and U24-CA210967.

Author contributions

D.A.P., F.Y. and G.C.T. developed the algorithm. D.A.P. analyzed the data. A.I.N. conceived and supervised the project. D.A.P. and A.I.N. wrote the manuscript with input from all authors.

