Abstract
We report O-Pair Search, an approach to identify O-glycopeptides and localize O-glycosites. Using paired collision- and electron-based dissociation spectra, O-Pair Search identifies O-glycopeptides via an ion-indexed open modification search and localizes O-glycosites using graph theory and probability-based localization. O-Pair Search reduces search times more than 2,000-fold compared to current O-glycopeptide processing software, while defining O-glycosite localization confidence levels and generating more O-glycopeptide identifications. Beyond the mucin-type O-glycopeptides discussed here, O-Pair Search also accepts user-defined glycan databases, making it compatible with many types of O-glycosylation. O-Pair Search is freely available within the open-source MetaMorpheus platform at https://github.com/smith-chem-wisc/MetaMorpheus.
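As a rough illustration of the ion-indexed open modification search mentioned above, the sketch below indexes the fragment ions of unmodified peptides, retrieves candidates by counting shared fragment bins, and then checks whether the unexplained precursor mass matches an entry in a glycan database. It is a toy example under stated assumptions, not the MetaMorpheus implementation: the function names, the 0.02 Da bin width, the use of singly charged b-ions only and the two-entry glycan table are all hypothetical, and it assumes that collision-based fragments largely appear without the glycan attached. Glycosite localization is not covered.

```python
from collections import defaultdict

# Monoisotopic residue masses (Da) for a few amino acids.
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
           "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496}
WATER, PROTON = 18.01056, 1.00728
# Toy glycan database (hypothetical subset): composition -> mass (Da).
GLYCANS = {"HexNAc(1)": 203.07937, "HexNAc(1)Hex(1)": 365.13220}
BIN = 0.02  # fragment bin width in Da (assumed value)


def peptide_mass(seq):
    """Neutral monoisotopic mass of the unmodified peptide."""
    return sum(RESIDUE[a] for a in seq) + WATER


def b_ions(seq):
    """Singly charged b-ion m/z values of the unmodified peptide."""
    out, total = [], 0.0
    for a in seq[:-1]:
        total += RESIDUE[a]
        out.append(total + PROTON)
    return out


def build_index(peptides):
    """Map each fragment bin to the ids of peptides with an ion in that bin."""
    index = defaultdict(set)
    for pid, seq in enumerate(peptides):
        for mz in b_ions(seq):
            index[round(mz / BIN)].add(pid)
    return index


def open_search(peaks, precursor_mass, peptides, index, tol=0.01):
    """Rank peptides by shared fragment bins, then keep those whose
    precursor mass difference matches a glycan in the database."""
    counts = defaultdict(int)
    for mz in peaks:
        b = round(mz / BIN)
        hits = set()
        for k in (b - 1, b, b + 1):  # neighbouring bins: coarse tolerance
            hits |= index.get(k, set())
        for pid in hits:
            counts[pid] += 1
    results = []
    for pid, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        delta = precursor_mass - peptide_mass(peptides[pid])
        for name, gmass in GLYCANS.items():
            if abs(delta - gmass) <= tol:
                results.append((peptides[pid], name, n))
    return results


# Usage: a synthetic spectrum built from the b-ions of one peptide
# carrying a single HexNAc on the precursor.
peptides = ["GASPVTLK", "AGSTPVLK", "PLSVK"]
index = build_index(peptides)
spectrum = b_ions("GASPVTLK")
precursor = peptide_mass("GASPVTLK") + GLYCANS["HexNAc(1)"]
results = open_search(spectrum, precursor, peptides, index)
```

Because the index is built once from unmodified peptides, each spectrum is compared only against peptides that share fragment bins with it, and the glycan mass is inferred afterwards from the precursor mass difference rather than enumerated up front. In this toy case the isobaric peptide AGSTPVLK is also returned, but with fewer matched fragments.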
Data availability
The data used in this manuscript are available through the ProteomeXchange Consortium via the PRIDE partner repository (ref. 48) with the dataset identifier PXD017646 (ref. 15) and via MassIVE with identifier MSV000083070 (ref. 9). Processed data using Byonic and Protein Prospector for the urinary O-glycopeptide dataset were downloaded from ref. 8.
Code availability
O-Pair Search is available in MetaMorpheus (v.0.0.307 for HCD–EThcD data and v.0.0.308 for HCD–HCD and HCD–sceHCD data), and is open source and freely available at https://github.com/smith-chem-wisc/MetaMorpheus under a permissive license. All source code was written in C# with .NET Core 3.1 using Visual Studio.
References
Abrahams, J. L. et al. Recent advances in glycoinformatic platforms for glycomics and glycoproteomics. Curr. Opin. Struct. Biol. 62, 56–69 (2020).
You, X., Qin, H. & Ye, M. Recent advances in methods for the analysis of protein O-glycosylation at proteome level. J. Sep. Sci. 41, 248–261 (2018).
Suttapitugsakul, S., Sun, F. & Wu, R. Recent advances in glycoproteomic analysis by mass spectrometry. Anal. Chem. 92, 267–291 (2020).
Riley, N. M. & Coon, J. J. The role of electron transfer dissociation in modern proteomics. Anal. Chem. 90, 40–64 (2018).
Reily, C., Stewart, T. J., Renfrow, M. B. & Novak, J. Glycosylation in health and disease. Nat. Rev. Nephrol. 15, 346–366 (2019).
Brockhausen, I. & Stanley, P. in Essentials of Glycobiology (eds Varki, A. et al.) Ch. 10 (Cold Spring Harbor Laboratory Press, 2017).
Darula, Z. & Medzihradszky, K. F. Analysis of mammalian O-glycopeptides—we have made a good start, but there is a long way to go. Mol. Cell. Proteomics 17, 2–17 (2018).
Pap, A., Klement, E., Hunyadi-Gulyas, E., Darula, Z. & Medzihradszky, K. F. Status report on the high-throughput characterization of complex intact O-glycopeptide mixtures. J. Am. Soc. Mass Spectrom. 29, 1210–1220 (2018).
Darula, Z., Pap, Á. & Medzihradszky, K. F. Extended sialylated O-glycan repertoire of human urinary glycoproteins discovered and characterized using electron-transfer/higher-energy collision dissociation. J. Proteome Res. 18, 280–291 (2019).
Pap, A., Tasnadi, E., Medzihradszky, K. F. & Darula, Z. Novel O-linked sialoglycan structures in human urinary glycoproteins. Mol. Omics 16, 156–164 (2020).
Khoo, K. H. Advances toward mapping the full extent of protein site-specific O-GalNAc glycosylation that better reflects underlying glycomic complexity. Curr. Opin. Struct. Biol. 56, 146–154 (2019).
Mao, J. et al. A new searching strategy for the identification of O-linked glycopeptides. Anal. Chem. 91, 3852–3859 (2019).
Izaham, A. R. A. & Scott, N. E. Open database searching enables the identification and comparison of bacterial glycoproteomes without defining glycan compositions prior to searching. Mol. Cell. Proteomics https://doi.org/10.1074/mcp.TIR120.002100 (2020).
Huang, J. et al. Development of a computational tool for automated interpretation of intact O-glycopeptide tandem mass spectra from single proteins. Anal. Chem. 92, 6777–6784 (2020).
Riley, N. M., Malaker, S. A., Driessen, M. & Bertozzi, C. R. Optimal dissociation methods differ for N- and O-glycopeptides. J. Proteome Res. 19, 3286–3301 (2020).
Solntsev, S. K., Shortreed, M. R., Frey, B. L. & Smith, L. M. Enhanced global post-translational modification discovery with MetaMorpheus. J. Proteome Res. 17, 1844–1851 (2018).
Kong, A. T., Leprevost, F. V., Avtonomov, D. M., Mellacheruvu, D. & Nesvizhskii, A. I. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat. Methods 14, 513–520 (2017).
Liu, X. et al. Identification of ultramodified proteins using top-down tandem mass spectra. J. Proteome Res. 12, 5830–5838 (2013).
Frank, A. M., Pesavento, J. J., Mizzen, C. A., Kelleher, N. L. & Pevzner, P. A. Interpreting top-down mass spectra using spectral alignment. Anal. Chem. 80, 2499–2505 (2008).
Pevzner, P. A., Dančík, V. & Tang, C. L. Mutation-tolerant protein identification by mass spectrometry. J. Comput. Biol. 7, 777–787 (2000).
Park, J. et al. Informed-Proteomics: open-source software package for top-down proteomics. Nat. Methods 14, 909–914 (2017).
Taus, T. et al. Universal and confident phosphorylation site localization using phosphoRS. J. Proteome Res. 10, 5354–5362 (2011).
Olsen, J. V. et al. Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell 127, 635–648 (2006).
Smith, L. M. et al. A five-level classification system for proteoform identifications. Nat. Methods 16, 939–940 (2019).
Marx, H. et al. A large synthetic peptide and phosphopeptide reference library for mass spectrometry-based proteomics. Nat. Biotechnol. 31, 557–564 (2013).
Halim, A. et al. Assignment of saccharide identities through analysis of oxonium ion fragmentation profiles in LC–MS/MS of glycopeptides. J. Proteome Res. 13, 6024–6032 (2014).
Polasky, D. A., Yu, F., Teo, G. C. & Nesvizhskii, A. I. Fast and comprehensive N- and O-glycoproteomics analysis with MSFragger-Glyco. Nat. Methods https://doi.org/10.1038/s41592-020-0967-9 (2020).
Bern, M., Kil, Y. J. & Becker, C. Byonic: advanced peptide and protein identification software. Curr. Protoc. Bioinformatics 40, 13.20.1–13.20.14 (2012).
Bern, M., Cai, Y. & Goldberg, D. Lookup peaks: a hybrid of de novo sequencing and database search for protein identification by tandem mass spectrometry. Anal. Chem. 79, 1393–1400 (2007).
Malaker, S. A. et al. The mucin-selective protease StcE enables molecular and functional analysis of human cancer-associated mucins. Proc. Natl Acad. Sci. USA 116, 7278–7287 (2019).
Choo, M. S., Wan, C., Rudd, P. M. & Nguyen-Khuong, T. GlycopeptideGraphMS: improved glycopeptide detection and identification by exploiting graph theoretical patterns in mass and retention time. Anal. Chem. 91, 7236–7244 (2019).
Klein, J. & Zaia, J. Relative retention time estimation improves N-glycopeptide identifications by LC–MS/MS. J. Proteome Res. 19, 2113–2121 (2020).
Khatri, K., Klein, J. A. & Zaia, J. Use of an informed search space maximizes confidence of site-specific assignment of glycoprotein glycosylation. Anal. Bioanal. Chem. 409, 607–618 (2017).
Liu, M. Q. et al. pGlyco 2.0 enables precision N-glycoproteomics with comprehensive quality control and one-step mass spectrometry for intact glycopeptide identification. Nat. Commun. 8, 438 (2017).
Lee, L. Y. et al. Toward automated N-glycopeptide identification in glycoproteomics. J. Proteome Res. 15, 3904–3915 (2016).
The, M., MacCoss, M. J., Noble, W. S. & Käll, L. Fast and accurate protein false discovery rates on large-scale proteomics data sets with Percolator 3.0. J. Am. Soc. Mass Spectrom. 27, 1719–1727 (2016).
Chalkley, R. J., Medzihradszky, K. F., Darula, Z., Pap, A. & Baker, P. R. The effectiveness of filtering glycopeptide peak list files for Y ions. Mol. Omics 16, 147–155 (2020).
Baker, P. R., Trinidad, J. C. & Chalkley, R. J. Modification site localization scoring integrated into a search engine. Mol. Cell. Proteomics https://doi.org/10.1074/mcp.M111.008078 (2011).
Park, G. W. et al. Classification of mucin-type O-glycopeptides using higher-energy collisional dissociation in mass spectrometry. Anal. Chem. 92, 9772–9781 (2020).
Xu, G., Goonatilleke, E., Wongkham, S. & Lebrilla, C. B. Deep structural analysis and quantitation of O-linked glycans on cell membrane reveal high abundances and distinct glycomic profiles associated with cell type and stages of differentiation. Anal. Chem. 92, 3758–3768 (2020).
Wenger, C. D. & Coon, J. J. A proteomics search algorithm specifically designed for high-resolution tandem mass spectra. J. Proteome Res. 12, 1377–1386 (2013).
Khan, A. & Mathelier, A. Intervene: a tool for intersection and visualization of multiple gene or genomic region sets. BMC Bioinformatics 18, 287, https://doi.org/10.1186/s12859-017-1708-7 (2017).
Lang, T. et al. Searching the evolutionary origin of epithelial mucus protein components—mucins and FCGBP. Mol. Biol. Evol. 33, 1921–1936 (2016).
Shin, J. et al. Use of composite protein database including search result sequences for mass spectrometric analysis of cell secretome. PLoS ONE 10, e0121692 (2015).
Uhlen, M. et al. Tissue-based map of the human proteome. Science 347, 1260419 (2015).
Bateman, A. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506–D515 (2019).
Park, J. H. et al. Proteomic analysis of host cell protein dynamics in the culture supernatants of antibody-producing CHO cells. Sci. Rep. 7, 44246 (2017).
Perez-Riverol, Y. et al. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 47, D442–D450 (2019).
Acknowledgements
We appreciate discussions with Z. Rolfs, R.J. Millikin and other Smith group members to enhance software analysis speed and address challenges in implementing ideas. This work was supported by National Institutes of Health (NIH) grant no. R35 GM126914 awarded to L.M.S. and grant no. R01 CA200423 awarded to C.R.B., as well as by the Howard Hughes Medical Institute. N.M.R. was funded through an NIH Predoctoral to Postdoctoral Transition Award (grant no. K00 CA212454-03).
Author information
Contributions
L.L. and N.M.R. contributed equally to this work. L.L. conceived the project and software design, wrote software, analyzed data and wrote the paper. N.M.R. conceived the project and software design, advised on software development, analyzed most of the data and wrote the paper. M.R.S. designed software and supervised the project. C.R.B. and L.M.S. supervised the project. All authors discussed results and edited the paper.
Ethics declarations
Competing interests
C.R.B. is a cofounder and Scientific Advisory Board member of Lycia Therapeutics, Palleon Pharmaceuticals, Enable Bioscience, Redwood Biosciences (a subsidiary of Catalent) and InterVenn Biosciences, and a member of the Board of Directors of Eli Lilly & Company.
Additional information
Editor recognition statement: Arunima Singh was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–17, Notes 1–4 and Note Fig. 1, and Tables 1–3.
Supplementary Data 1
O-Pair Search analysis of mucin standards.
Supplementary Data 2
O-Pair Search analysis of urinary O-glycopeptides.
Supplementary Data 3
Protein database and glycan database.
About this article
Cite this article
Lu, L., Riley, N.M., Shortreed, M.R. et al. O-Pair Search with MetaMorpheus for O-glycopeptide characterization. Nat Methods 17, 1133–1138 (2020). https://doi.org/10.1038/s41592-020-00985-5
DOI: https://doi.org/10.1038/s41592-020-00985-5