O-Pair Search with MetaMorpheus for O-glycopeptide characterization


We report O-Pair Search, an approach to identify O-glycopeptides and localize O-glycosites. Using paired collision- and electron-based dissociation spectra, O-Pair Search identifies O-glycopeptides via an ion-indexed open modification search and localizes O-glycosites using graph theory and probability-based localization. O-Pair Search reduces search times more than 2,000-fold compared with current O-glycopeptide processing software, while also defining confidence levels for O-glycosite localization and generating more O-glycopeptide identifications. Beyond the mucin-type O-glycopeptides discussed here, O-Pair Search also accepts user-defined glycan databases, making it compatible with many types of O-glycosylation. O-Pair Search is freely available within the open-source MetaMorpheus platform at https://github.com/smith-chem-wisc/MetaMorpheus.
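To give a rough sense of what an ion-indexed open modification search involves, the sketch below builds a fragment-ion index over a few toy peptides (in the spirit of the fragment-index approach of MSFragger, ref. 17) and scores a spectrum without any precursor-mass filter. It is not the O-Pair Search implementation, which is written in C# inside MetaMorpheus. The peptides, the 0.02 Da bin width, the `min_matches` threshold and all function names are hypothetical, and the example assumes, as a simplification, that the collision-based spectrum contains mostly glycan-free b/y backbone fragments. The leftover precursor mass shift it reports is the quantity a later step would try to explain with compositions from the glycan database.

```python
from collections import defaultdict

# Monoisotopic residue masses (Da) for the amino acids used in this toy example.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "R": 156.10111,
}
PROTON = 1.007276
WATER = 18.010565
HEXNAC = 203.07937  # monoisotopic mass of one HexNAc residue
BIN_WIDTH = 0.02  # Da; hypothetical bin width for the fragment index


def neutral_mass(peptide):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fragment_ions(peptide):
    """Singly charged b and y ion m/z values of the unmodified peptide backbone."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(peptide)):
        ions.append(sum(masses[:i]) + PROTON)          # b_i
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y_(n-i)
    return ions


def build_index(peptides):
    """Fragment-ion index: m/z bin -> set of peptide indices with a fragment there."""
    index = defaultdict(set)
    for pid, peptide in enumerate(peptides):
        for mz in fragment_ions(peptide):
            index[round(mz / BIN_WIDTH)].add(pid)
    return index


def open_search(peaks, precursor_mass, peptides, index, min_matches=3):
    """Rank peptides by matched fragment peaks without any precursor-mass filter.

    Returns (peptide, matched_peaks, mass_shift) tuples, best first; mass_shift is
    the part of the precursor mass that the unmodified peptide leaves unexplained.
    """
    counts = defaultdict(int)
    for mz in peaks:
        b = round(mz / BIN_WIDTH)
        # Also look in neighbouring bins so peaks near a bin edge still match.
        hits = index.get(b - 1, set()) | index.get(b, set()) | index.get(b + 1, set())
        for pid in hits:
            counts[pid] += 1
    results = [
        (peptides[pid], n, precursor_mass - neutral_mass(peptides[pid]))
        for pid, n in counts.items()
        if n >= min_matches
    ]
    return sorted(results, key=lambda r: r[1], reverse=True)


peptides = ["PATSAPDTR", "GVTSAPDTR", "LLNEK"]
index = build_index(peptides)
# Simulated collision spectrum: glycan-free backbone fragments of PATSAPDTR,
# while the precursor itself carries one HexNAc.
peaks = fragment_ions("PATSAPDTR")
precursor = neutral_mass("PATSAPDTR") + HEXNAC
for peptide, n, shift in open_search(peaks, precursor, peptides, index):
    print(f"{peptide}: {n} matched peaks, mass shift {shift:.4f} Da")
```

With these toy inputs, PATSAPDTR ranks first with all 16 peaks matched and a mass shift of about 203.08 Da (one HexNAc), while GVTSAPDTR, which shares seven y ions, ranks second with a larger unexplained shift. Because the index lookup never consults the precursor mass, any glycan mass can be carried by the candidate; that is what makes the search "open".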

Fig. 1: O-Pair Search through MetaMorpheus for fast and confident identification of O-glycopeptides.
Fig. 2: Performance of O-Pair Search for O-glycopeptide characterization.

Data availability

The data used in this manuscript are available through the ProteomeXchange Consortium via the PRIDE partner repository (ref. 48) with the dataset identifier PXD017646 (ref. 15) and via MassIVE with the identifier MSV000083070 (ref. 9). Results for the urinary O-glycopeptide dataset processed with Byonic and Protein Prospector were downloaded from ref. 8.

Code availability

O-Pair Search is available in MetaMorpheus (v.0.0.307 for HCD–EThcD data and v.0.0.308 for HCD–HCD and HCD–sceHCD data); it is open source and freely available at https://github.com/smith-chem-wisc/MetaMorpheus under a permissive license. All source code was written in Microsoft C# with .NET Core 3.1 using Visual Studio.


References

1. Abrahams, J. L. et al. Recent advances in glycoinformatic platforms for glycomics and glycoproteomics. Curr. Opin. Struct. Biol. 62, 56–69 (2020).
2. You, X., Qin, H. & Ye, M. Recent advances in methods for the analysis of protein O-glycosylation at proteome level. J. Sep. Sci. 41, 248–261 (2018).
3. Suttapitugsakul, S., Sun, F. & Wu, R. Recent advances in glycoproteomic analysis by mass spectrometry. Anal. Chem. 92, 267–291 (2020).
4. Riley, N. M. & Coon, J. J. The role of electron transfer dissociation in modern proteomics. Anal. Chem. 90, 40–64 (2018).
5. Reily, C., Stewart, T. J., Renfrow, M. B. & Novak, J. Glycosylation in health and disease. Nat. Rev. Nephrol. 15, 346–366 (2019).
6. Brockhausen, I. & Stanley, P. in Essentials of Glycobiology (eds Varki, A. et al.) Ch. 10 (Cold Spring Harbor Laboratory Press, 2017).
7. Darula, Z. & Medzihradszky, K. F. Analysis of mammalian O-glycopeptides—we have made a good start, but there is a long way to go. Mol. Cell. Proteomics 17, 2–17 (2018).
8. Pap, A., Klement, E., Hunyadi-Gulyas, E., Darula, Z. & Medzihradszky, K. F. Status report on the high-throughput characterization of complex intact O-glycopeptide mixtures. J. Am. Soc. Mass Spectrom. 29, 1210–1220 (2018).
9. Darula, Z., Pap, Á. & Medzihradszky, K. F. Extended sialylated O-glycan repertoire of human urinary glycoproteins discovered and characterized using electron-transfer/higher-energy collision dissociation. J. Proteome Res. 18, 280–291 (2019).
10. Pap, A., Tasnadi, E., Medzihradszky, K. F. & Darula, Z. Novel O-linked sialoglycan structures in human urinary glycoproteins. Mol. Omics 16, 156–164 (2020).
11. Khoo, K. H. Advances toward mapping the full extent of protein site-specific O-GalNAc glycosylation that better reflects underlying glycomic complexity. Curr. Opin. Struct. Biol. 56, 146–154 (2019).
12. Mao, J. et al. A new searching strategy for the identification of O-linked glycopeptides. Anal. Chem. 91, 3852–3859 (2019).

13. Izaham, A. R. A. & Scott, N. E. Open database searching enables the identification and comparison of bacterial glycoproteomes without defining glycan compositions prior to searching. Mol. Cell. Proteomics https://doi.org/10.1074/mcp.TIR120.002100 (2020).
14. Huang, J. et al. Development of a computational tool for automated interpretation of intact O-glycopeptide tandem mass spectra from single proteins. Anal. Chem. 92, 6777–6784 (2020).
15. Riley, N. M., Malaker, S. A., Driessen, M. & Bertozzi, C. R. Optimal dissociation methods differ for N- and O-glycopeptides. J. Proteome Res. 19, 3286–3301 (2020).
16. Solntsev, S. K., Shortreed, M. R., Frey, B. L. & Smith, L. M. Enhanced global post-translational modification discovery with MetaMorpheus. J. Proteome Res. 17, 1844–1851 (2018).
17. Kong, A. T., Leprevost, F. V., Avtonomov, D. M., Mellacheruvu, D. & Nesvizhskii, A. I. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat. Methods 14, 513–520 (2017).
18. Liu, X. et al. Identification of ultramodified proteins using top-down tandem mass spectra. J. Proteome Res. 12, 5830–5838 (2013).
19. Frank, A. M., Pesavento, J. J., Mizzen, C. A., Kelleher, N. L. & Pevzner, P. A. Interpreting top-down mass spectra using spectral alignment. Anal. Chem. 80, 2499–2505 (2008).
20. Pevzner, P. A., Dančík, V. & Tang, C. L. Mutation-tolerant protein identification by mass spectrometry. J. Comput. Biol. 7, 777–787 (2001).
21. Park, J. et al. Informed-Proteomics: open-source software package for top-down proteomics. Nat. Methods 14, 909–914 (2017).
22. Taus, T. et al. Universal and confident phosphorylation site localization using phosphoRS. J. Proteome Res. 10, 5354–5362 (2011).
23. Olsen, J. V. et al. Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell 127, 635–648 (2006).
24. Smith, L. M. et al. A five-level classification system for proteoform identifications. Nat. Methods 16, 939–940 (2019).
25. Marx, H. et al. A large synthetic peptide and phosphopeptide reference library for mass spectrometry-based proteomics. Nat. Biotechnol. 31, 557–564 (2013).

26. Halim, A. et al. Assignment of saccharide identities through analysis of oxonium ion fragmentation profiles in LC–MS/MS of glycopeptides. J. Proteome Res. 13, 6024–6032 (2014).
27. Polasky, D. A., Yu, F., Teo, G. C. & Nesvizhskii, A. I. Fast and comprehensive N- and O-glycoproteomics analysis with MSFragger-Glyco. Nat. Methods https://doi.org/10.1038/s41592-020-0967-9 (2020).
28. Bern, M., Kil, Y. J. & Becker, C. Byonic: advanced peptide and protein identification software. Curr. Protoc. Bioinformatics 40, 13.20.1–13.20.14 (2012).
29. Bern, M., Cai, Y. & Goldberg, D. Lookup peaks: a hybrid of de novo sequencing and database search for protein identification by tandem mass spectrometry. Anal. Chem. 79, 1393–1400 (2007).
30. Malaker, S. A. et al. The mucin-selective protease StcE enables molecular and functional analysis of human cancer-associated mucins. Proc. Natl Acad. Sci. USA 116, 7278–7287 (2019).
31. Choo, M. S., Wan, C., Rudd, P. M. & Nguyen-Khuong, T. GlycopeptideGraphMS: improved glycopeptide detection and identification by exploiting graph theoretical patterns in mass and retention time. Anal. Chem. 91, 7236–7244 (2019).
32. Klein, J. & Zaia, J. Relative retention time estimation improves N-glycopeptide identifications by LC–MS/MS. J. Proteome Res. 19, 2113–2121 (2020).
33. Khatri, K., Klein, J. A. & Zaia, J. Use of an informed search space maximizes confidence of site-specific assignment of glycoprotein glycosylation. Anal. Bioanal. Chem. 409, 607–618 (2017).
34. Liu, M. Q. et al. pGlyco 2.0 enables precision N-glycoproteomics with comprehensive quality control and one-step mass spectrometry for intact glycopeptide identification. Nat. Commun. 8, 438 (2017).
35. Lee, L. Y. et al. Toward automated N-glycopeptide identification in glycoproteomics. J. Proteome Res. 15, 3904–3915 (2016).
36. The, M., MacCoss, M. J., Noble, W. S. & Käll, L. Fast and accurate protein false discovery rates on large-scale proteomics data sets with Percolator 3.0. J. Am. Soc. Mass Spectrom. 27, 1719–1727 (2016).
37. Chalkley, R. J., Medzihradszky, K. F., Darula, Z., Pap, A. & Baker, P. R. The effectiveness of filtering glycopeptide peak list files for Y ions. Mol. Omics 16, 147–155 (2020).

38. Baker, P. R., Trinidad, J. C. & Chalkley, R. J. Modification site localization scoring integrated into a search engine. Mol. Cell. Proteomics https://doi.org/10.1074/mcp.M111.008078 (2011).
39. Park, G. W. et al. Classification of mucin-type O-glycopeptides using higher-energy collisional dissociation in mass spectrometry. Anal. Chem. 92, 9772–9781 (2020).
40. Xu, G., Goonatilleke, E., Wongkham, S. & Lebrilla, C. B. Deep structural analysis and quantitation of O-linked glycans on cell membrane reveal high abundances and distinct glycomic profiles associated with cell type and stages of differentiation. Anal. Chem. 92, 3758–3768 (2020).
41. Wenger, C. D. & Coon, J. J. A proteomics search algorithm specifically designed for high-resolution tandem mass spectra. J. Proteome Res. 12, 1377–1386 (2013).
42. Khan, A. & Mathelier, A. Intervene: a tool for intersection and visualization of multiple gene or genomic region sets. BMC Bioinformatics 18, 287 (2017).
43. Lang, T. et al. Searching the evolutionary origin of epithelial mucus protein components—mucins and FCGBP. Mol. Biol. Evol. 33, 1921–1936 (2016).
44. Shin, J. et al. Use of composite protein database including search result sequences for mass spectrometric analysis of cell secretome. PLoS ONE 10, e0121692 (2015).
45. Uhlen, M. et al. Tissue-based map of the human proteome. Science 347, 1260419 (2015).
46. Bateman, A. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506–D515 (2019).
47. Park, J. H. et al. Proteomic analysis of host cell protein dynamics in the culture supernatants of antibody-producing CHO cells. Sci. Rep. 7, 44246 (2017).
48. Perez-Riverol, Y. et al. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 47, D442–D450 (2019).



Acknowledgements

We appreciate discussions with Z. Rolfs, R.J. Millikin and other Smith group members that helped improve software analysis speed and address challenges in implementing ideas. This work was supported by National Institutes of Health (NIH) grant no. R35 GM126914 awarded to L.M.S. and grant no. R01 CA200423 awarded to C.R.B., and by the Howard Hughes Medical Institute. N.M.R. was funded through an NIH Predoctoral to Postdoctoral Transition Award (grant no. K00 CA212454-03).

Author information




L.L. and N.M.R. contributed equally to this work. L.L. conceived the project and software design, wrote software, analyzed data and wrote the paper. N.M.R. conceived the project and software design, advised on software development, analyzed most of the data and wrote the paper. M.R.S. designed software and supervised the project. C.R.B. and L.M.S. supervised the project. All authors discussed results and edited the paper.

Corresponding author

Correspondence to Lloyd M. Smith.

Ethics declarations

Competing interests

C.R.B. is a cofounder and Scientific Advisory Board member of Lycia Therapeutics, Palleon Pharmaceuticals, Enable Bioscience, Redwood Biosciences (a subsidiary of Catalent) and InterVenn Biosciences, and a member of the Board of Directors of Eli Lilly & Company.

Additional information

Editor recognition statement: Arunima Singh was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–17, Notes 1–4 and Note Fig. 1, and Tables 1–3.

Reporting Summary

Supplementary Data 1

O-Pair Search analysis of mucin standards.

Supplementary Data 2

O-Pair Search analysis of urinary O-glycopeptides.

Supplementary Data 3

Protein database and glycan database.

About this article

Cite this article

Lu, L., Riley, N.M., Shortreed, M.R. et al. O-Pair Search with MetaMorpheus for O-glycopeptide characterization. Nat Methods 17, 1133–1138 (2020). https://doi.org/10.1038/s41592-020-00985-5