Introduction

Protein glycosylation refers to the covalent attachment of carbohydrates to polypeptides and represents a class of prevalent and structurally diverse co-translational and post-translational modifications (PTMs) that affect a vast range of biological processes1,2,3,4,5,6. Carbohydrate modifications include single monosaccharides and complex carbohydrate chains, both referred to as glycans. Protein glycosylation is a non-templated process mediated by glycosyltransferases, which initiate or elongate glycans, and oligosaccharyltransferases, which transfer preassembled carbohydrate chains en bloc. In cells, the complex interplay between glycosyltransferases, oligosaccharyltransferases, carbohydrate transporters and glycosidases (the enzymes that remove carbohydrates) fine-tunes the glycan structures observed on individual proteins and regulates glycoprotein function, with effects on biological processes that include cellular development7, cell–cell communication8, host–microorganism interactions9,10 and immunity5,11,12. For example, the recruitment of leukocytes to sites of inflammation is precisely controlled by specific glycan structures that mediate interactions with cell-surface lectins to enable selective and site-specific leukocyte homing5,7,11,12. Dysregulation of glycosylation is associated with numerous diseases, including cancer13,14,15,16, infection and inflammation17,18,19,20,21,22, schizophrenia23 and a wide range of congenital and neurological disorders24,25,26. Unravelling the role of glycosylation under both physiological and pathophysiological conditions is a long-standing goal of glycobiology and has driven the rapid development of methods to track glycosylation for diagnostic and therapeutic purposes27,28.

Glycosylation occurs across all domains of life, and many structurally distinct subclasses and glycan types are now recognized29,30,31,32,33,34 (Fig. 1a,b). Our knowledge of mammalian asparagine-linked (N-linked) and serine/threonine-linked (O-linked) glycans is the most developed, and these modifications are therefore the focus of this Primer. Characterizing the glycoproteome involves identifying glycoproteins and defining their macroheterogeneity (structural diversity arising from the presence or absence of glycans at specific glycosylation sites) and microheterogeneity (structural diversity of the glycans attached at an individual glycosylation site)35. Microheterogeneity can arise through differences in the number and type of monosaccharide residues within the glycan, the arrangement and branching patterns of these monosaccharides or the configuration of anomeric linkages (see Box 1 for a guide to the symbol nomenclature for glycans). Ultimately, identifying glycosylation sites and discrete glycan structures is crucial for understanding glycan-dependent functions in biological processes.
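To make the two definitions concrete, the Python sketch below summarizes per-site observations into a macroheterogeneity value (site occupancy) and a microheterogeneity profile (relative glycoform abundances). It is a hypothetical illustration, not an established quantification method: the input format, the site label "N61" and the intensities are invented, and it assumes that glycosylated and non-glycosylated forms of a peptide ionize equally well, which generally does not hold in practice.

```python
from collections import defaultdict

def site_heterogeneity(observations):
    """Summarize macro- and microheterogeneity for each glycosylation site.

    observations: iterable of (site, glycan, intensity) tuples, with glycan=None
    for the non-glycosylated form of the site-containing peptide.
    Assumption: all forms ionize equally, so intensities are directly comparable.
    """
    per_site = defaultdict(dict)
    for site, glycan, intensity in observations:
        per_site[site][glycan] = per_site[site].get(glycan, 0.0) + intensity

    summary = {}
    for site, forms in per_site.items():
        unoccupied = forms.get(None, 0.0)
        glycoforms = {g: v for g, v in forms.items() if g is not None}
        occupied = sum(glycoforms.values())
        total = occupied + unoccupied
        summary[site] = {
            # Macroheterogeneity: fraction of the site that carries any glycan
            "occupancy": occupied / total if total else 0.0,
            # Microheterogeneity: relative abundance of each glycoform at the site
            "glycoforms": {g: v / occupied for g, v in glycoforms.items()} if occupied else {},
        }
    return summary

# Hypothetical observations for a single hypothetical site "N61"
obs = [
    ("N61", None, 25.0),
    ("N61", "HexNAc4Hex5Fuc1", 50.0),
    ("N61", "HexNAc4Hex5NeuAc1", 25.0),
]
print(site_heterogeneity(obs)["N61"])
```

Here the site is 75% occupied (macroheterogeneity), and two-thirds of the occupied form carries the fucosylated glycan (microheterogeneity).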

Fig. 1: Protein glycosylation classes and common glycans observed across mammalian systems.

a | A range of glycosylation types exists, with most eukaryotic cells possessing multiple pathways for protein glycosylation. Glycosylation involves the installation of glycans on proteins: N-linked pathways target the amide nitrogen of asparagine residues, O-linked pathways target the hydroxyl oxygen of serine/threonine residues and C-linked pathways target the C2 atom of the indole ring of tryptophan residues. Many of these glycosylation events occur on secreted or cell-surface proteins, as denoted here, reflecting the roles of glycosylation in extracellular protein stability and membrane protein recognition. Intracellularly, O-GlcNAcylation has a crucial role in cellular signalling events. b | A range of common glycan classes is observed across mammalian N-linked and mucin-type O-linked glycosylation. N-linked glycans include paucimannose, oligomannose, and complex and hybrid structures. Paucimannose glycans carry one to three mannose (Man) residues on a chitobiose core with variable core fucosylation. Oligomannose glycans contain terminal branches composed only of mannose residues. Complex and hybrid glycans may contain galactose (Gal), N-acetylglucosamine (GlcNAc), N-acetylgalactosamine (GalNAc), fucose (Fuc), N-acetylneuraminic acid (NeuAc) and N-glycolylneuraminic acid (NeuGc) residues in their antennae, with hybrid glycans also containing unsubstituted terminal mannose residues. Eight core structures have been described for mucin-type O-linked glycosylation, which differ in the composition and linkage position of the branches attached to a protein-linked GalNAc. Non-canonical glycans introduced using metabolic oligosaccharide engineering approaches are also possible; for these glycans, the presence of monosaccharides bearing chemical handles such as alkyne or azide (N3) groups allows glycan-specific labelling and/or enrichment. GlcA, glucuronic acid; Xyl, xylose.

Glycoproteomics refers to the systems-level study of protein-linked glycans and is a rapidly evolving analytical field that aims to profile the glycosylation events within biological samples36,37. Characterizing intact glycopeptides is an attractive analytical strategy because only intact glycopeptides provide direct evidence of site-specific protein glycosylation. Bottom-up glycoproteomics using liquid chromatography–tandem mass spectrometry (LC–MS/MS)-based profiling of intact glycopeptides allows cell-wide, tissue-wide and organism-wide mapping of glycosylation events and provides a means to address their functional roles in biological processes38. This contrasts with commonly used techniques that study either detached glycans (a field known as glycomics39) or formerly N-glycosylated peptides (N-glycosylation site mapping40).

LC–MS/MS-driven glycoproteomic approaches have been refined considerably over the past decade, and these strategies are increasingly being used for quantitative mapping of glycosylation sites within complex mixtures (as previously reviewed36,38,41,42,43,44,45,46,47,48,49,50,51). Technological and computational advances now enable the characterization of thousands of intact N-glycopeptides and O-glycopeptides within a single glycoproteomics experiment52,53,54,55,56,57,58,59,60. Although analytical challenges remain61,62,63, this Primer aims to illustrate the technologies, tools and approaches available to address outstanding questions in glycobiology. By presenting developments across the entire glycoproteomics workflow, this Primer summarizes the field as it currently stands. We cover various biological models, chemical glycobiology approaches, glycopeptide enrichment techniques, quantification strategies, glycopeptide separation and ionization, tandem mass spectral analysis, computational tools for glycopeptide identification and options for data storage and dissemination. We hope this Primer serves as a springboard for anyone entering the field of glycoproteomics.

Experimentation

Many experimental pipelines have been developed for glycoproteomic studies, and they share several key steps: sample selection; sample preparation, including protein clean-up; enzymatic digestion to generate the glycopeptides of interest; separation of glycopeptides from non-glycosylated peptides; and MS analysis of glycopeptides. These steps form a modular framework, and depending on the glycoproteome studied, individual steps can be omitted or altered to enhance the identification of the glycopeptides of interest. Although a range of approaches and preparation pipelines exist to study glycoproteomes, the optimal approach is likely to differ for each biological question, and several preparation approaches may need to be trialled to achieve the desired outcome.

Choice of sample

State-of-the-art glycoproteomic workflows are capable of handling complex samples derived from cultured cells, tissues, organs and even whole organisms64,65,66,67,68. The choice of sample will affect the degree of sample processing needed (Table 1). For a given sample, the depth of analysis required is dependent on the total number of proteoforms present and the relative abundance and dynamic range of glycoproteins within the sample. For samples of low complexity, glycosylation analysis can be accomplished with low microgram levels of material, although milligram amounts may be needed for complex samples in which the glycoproteins of interest are present in low concentrations. In general, samples of low complexity with a high glycoprotein abundance will allow for better characterization of glycosites and glycoforms, which underpins the rationale for separating or enriching glycoproteins or glycopeptides before analysis (see below)69,70,71,72.

Table 1 Sample considerations

Biological relevance is important to consider when analysing recombinant glycoproteins from different sources. The glycosylation sites and glycan structures observed on proteins heterologously expressed in vitro, such as in genetically modified immortalized cell lines, may differ from those of in vivo sources because the repertoire of expressed glycosyltransferases and glycosidases varies between cell types32. This is evident for viral envelope glycoproteins such as the HIV-1 envelope protein (Env) and the SARS-CoV-2 spike glycoprotein, for which higher degrees of N-glycan processing are found on native virions than on individual viral proteins ectopically expressed in cell lines73. Furthermore, glycosite occupancy and glycan structure can differ notably between native oligomeric proteins and individually expressed subunits, likely because subunit assembly and quaternary structure alter the accessibility of glycosylation sites to glycosyltransferases69,70,74,75. Thus, care should be taken to ensure that the models used reflect the biological question being explored as closely as possible.

The redundant and overlapping specificities of glycosyltransferases have profound effects on glycosylation patterns: compensation and competition for substrates can make the observed relationships between glycosyltransferases and glycosylation events highly context dependent, even across similar cell types. This is best illustrated by mucin-type O-linked glycosylation, which is governed by the expression of multiple isoforms from a large family of GalNAc-transferases (GalNAc-Ts)6. A diverse array of biological specimens has been probed to study the breadth of the O-glycoproteome53,66,67,68,76,77,78. The competition for substrates between GalNAc-T isoforms is complex and poorly understood, and genetically engineered cell lines have been used to dissect the substrates of specific GalNAc-T isoforms79,80. Further, isogenic cell lines and transgenic animal models generated using gene editing have identified GalNAc-T isoform-specific substrates in the context of both simplified and natural glycan structures79,81,82,83. These findings highlight the benefits of genetic approaches for understanding glycosylation site specificity where complex interplays exist. Given this complexity, which applies to many glycosylation systems, it is advisable to include several biological replicates representing different clonal lineages of genetically engineered cell lines and to consider as relevant only those changes that are consistent across lineages83,84.
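The recommendation to keep only changes that are consistent across clonal lineages can be expressed as a simple filter. The Python sketch below is a hypothetical illustration under stated assumptions: the input is one engineered/parental abundance ratio per independently derived clone, the site names are invented, and the two-fold cut-off (`min_abs_log2=1.0`) is an arbitrary placeholder rather than a recommended threshold.

```python
import math

def consistent_changes(fold_changes, min_abs_log2=1.0):
    """Keep glycosites whose change is consistent across clonal lineages.

    fold_changes: {site: [ratio per clone]}, each ratio being engineered/parental
    abundance for one independently derived clone.
    A site is kept only if every clone changes in the same direction and every
    |log2 ratio| is at least min_abs_log2 (a hypothetical cut-off).
    Returns {site: mean log2 ratio} for the sites that pass.
    """
    kept = {}
    for site, ratios in fold_changes.items():
        if not ratios or any(r <= 0 for r in ratios):
            continue  # missing or non-positive ratios cannot be log-transformed
        log2s = [math.log2(r) for r in ratios]
        same_direction = all(v > 0 for v in log2s) or all(v < 0 for v in log2s)
        large_enough = all(abs(v) >= min_abs_log2 for v in log2s)
        if same_direction and large_enough:
            kept[site] = sum(log2s) / len(log2s)
    return kept

# Hypothetical ratios from three knockout clones
data = {
    "siteA": [0.2, 0.25, 0.1],  # decreased in all clones
    "siteB": [4.0, 0.9, 3.0],   # one clone disagrees, so the site is discarded
    "siteC": [2.5, 3.0, 2.0],   # increased in all clones
}
print(consistent_changes(data))
```

Requiring agreement in every clone is deliberately conservative; a real analysis would normally add a statistical test on top of such a filter.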

Sample preparation

Protein isolation and buffer considerations

Optimal protein isolation is key for efficient downstream sample processing in all proteomic experiments. Protein extraction from tissues can require pre-treatment with enzymes or ethylenediaminetetraacetic acid (EDTA) to release cells from the extracellular matrix before cell lysis. Once isolated, cells can be lysed by cryogenic homogenization, sonication or mechanical grinding in buffers that contain strong detergents such as sodium dodecyl sulfate (SDS) or chaotropic agents85,86,87,88. Complex tissue-derived and cell-derived samples are rarely solubilized completely and often require clearing of the lysates by centrifugation to remove insoluble material. Homogenization may also be necessary for viscous biological secretions such as sputum or intestinal mucus89,90. Several commonly used cationic, anionic or zwitterionic detergents can interfere with proteolytic digestion and may cause LC–MS analyte signal suppression unless they are removed by subsequent clean-up (see below)91,92. MS-compatible detergents such as RapiGest76,93,94, n-dodecyl β-d-maltoside95 or ProteaseMAX96 have been used in glycoproteomic studies to solubilize membrane proteins and can be combined with orthogonal isolation methods such as mechanical disruption to enhance protein isolation77,97. Notably, these MS-compatible detergents can be less effective solubilization agents than strong detergents such as SDS98. The isolation of membrane-bound glycoproteins requires vigorous disruption of the cell membrane followed by a solubilization step that uses detergents or chaotropic agents to prevent the precipitation of hydrophobic proteins99. For soluble secreted glycoproteins, the most important consideration when preparing the sample is to avoid contamination from exogenous protein sources commonly used to maintain cell lines, such as fetal bovine serum, which can be achieved by briefly culturing cells in serum-free medium100.

For many glycoproteomic studies, it may be essential to linearize glycoproteins completely during solubilization by reducing disulfide bonds with reducing agents such as dithiothreitol (DTT) or tris(2-carboxyethyl)phosphine (TCEP). Linearization can improve the ability of detergents to coat hydrophobic regions within glycoproteins; however, it also generates free cysteine thiols, which are highly reactive and readily undergo oxidation and other chemical transformations. Alkylation of reduced cysteines ‘caps’ these reactive residues, preventing the formation of undesirable cysteine products and the re-formation of disulfide bonds during sample preparation. Iodoacetamide is commonly used to alkylate cysteine residues during glycoproteomic sample preparation. Although alkylation improves the detection of cysteine-containing peptides, incomplete alkylation (underalkylation) or unintended alkylation of residues such as methionine (overalkylation) can cause glycan compositions to be misassigned: these events shift the glycopeptide mass so that it matches an isobaric alternative glycan composition, leading to incorrect glycopeptide assignment61. Both glycoproteomic61 and proteomic101 studies have highlighted that underalkylation and overalkylation are commonplace, so alkylation reagent concentrations and incubation times should be optimized for the given sample.
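A worked mass calculation shows why such shifts are hard to catch at the precursor level. The Python sketch below uses one elementally exact coincidence, chosen here for illustration and not necessarily the case reported in the cited studies: a carbamidomethyl group (C2H3NO, the iodoacetamide adduct) has exactly the same composition as the difference between a HexNAc residue (C8H13NO5) and a fucose residue (C6H10O4). The 1,500 Da peptide mass is hypothetical.

```python
# Monoisotopic element masses (Da)
ELEMENT = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

# Elemental compositions of monosaccharide residues and of the carbamidomethyl
# group that iodoacetamide adds to an alkylated residue
RESIDUES = {
    "HexNAc": {"C": 8, "H": 13, "N": 1, "O": 5},
    "Hex": {"C": 6, "H": 10, "O": 5},
    "Fuc": {"C": 6, "H": 10, "O": 4},
    "Carbamidomethyl": {"C": 2, "H": 3, "N": 1, "O": 1},
}

def formula_mass(formula):
    """Monoisotopic mass of an elemental composition {element: count}."""
    return sum(ELEMENT[el] * n for el, n in formula.items())

def glycan_mass(composition):
    """Summed residue mass of a glycan composition {residue: count}."""
    return sum(formula_mass(RESIDUES[r]) * n for r, n in composition.items())

cam = formula_mass(RESIDUES["Carbamidomethyl"])
peptide = 1500.0  # hypothetical neutral peptide mass with correct alkylation (Da)

# Overalkylated peptide (one extra carbamidomethyl) with a fucosylated glycan ...
overalkylated = peptide + cam + glycan_mass({"HexNAc": 4, "Hex": 5, "Fuc": 1})
# ... versus the correctly alkylated peptide with HexNAc in place of Fuc
alternative = peptide + glycan_mass({"HexNAc": 5, "Hex": 5})

print(f"{overalkylated:.5f} Da vs {alternative:.5f} Da")
```

By the same arithmetic, an underalkylated peptide (one carbamidomethyl missing) carrying HexNAc5Hex5 has the same mass as the fully alkylated peptide carrying HexNAc4Hex5Fuc1. No precursor mass tolerance, however tight, separates such pairs, so they can only be resolved at the MS/MS level.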

Glycoproteome clean-up approaches

To facilitate the analysis of chemically solubilized samples, recent advances in sample preparation offer attractive solutions for removing interfering agents such as salts and detergents before MS analysis. Three such approaches are filter-aided sample preparation (FASP)102, suspension traps (S-traps)103,104 and methods based on protein aggregation capture (PAC)105,106,107,108,109 (Fig. 2). These methods involve binding proteins to solid-phase supports such as filters (FASP), quartz mesh (S-traps) or magnetic particles (PAC) and washing with chaotropic agents or organic solvents to remove contaminants; digestion of the bound proteins then releases peptides for subsequent analysis. FASP-based sample preparation is well established and has been implemented in numerous N-glycoproteomic studies across species and tissues64,110, whereas S-traps and PAC-based approaches such as single-pot, solid-phase-enhanced sample preparation (SP3)111 are more recent additions to the glycoproteomics toolkit, although they have already been implemented in several glycoproteomic studies112,113,114. These approaches can be used for sample amounts ranging from a few micrograms to several milligrams of protein and give high peptide recovery rates102,103,104,111. PAC was recently shown to enable the removal of the chemical reagents and affinity tags typically used in click-based labelling105,106, making it particularly appealing for bioorthogonal glycoproteomic sample preparation.

Fig. 2: Sample preparation.

Glycoproteomic sample preparation can be summarized in six key steps. a | Proteins for glycoproteomic analysis are extracted and solubilized from samples of interest, for example from cell culture models using a cell disruptor to lyse the cells. b | Protein mixtures are processed to remove reagents that may interfere with downstream processing, commonly using filter-aided sample preparation (FASP), suspension trap (S-trap) or protein aggregation capture (PAC)-based approaches. c | The resulting protein preparations are then digested with proteases and/or glycoproteases to generate mixtures that contain the glycopeptides of interest for downstream analysis. Digestion of FASP-, S-trap- or PAC-prepared samples releases peptides from the captured proteins, enabling their collection for downstream liquid chromatography–mass spectrometry (LC–MS) analysis. At this stage, glycosidases can also be used to remove specific glycans of interest or to trim glycans, reducing microheterogeneity and thereby enhancing their downstream detection. d | The resulting peptide mixtures containing the glycopeptides of interest can be concentrated and purified, allowing the removal of non-digested proteins, enzymes or buffer components that may interfere with chemical labelling or enrichment approaches. Several solid-phase clean-up media can be used to achieve this, including C18, hydrophilic–lipophilic balance (HLB) or styrenedivinylbenzene–reverse phase sulfonate (SDB–RPS) resins, which can be implemented in solid-phase extraction (SPE) cartridge, plate or microcolumn (Zip/STAGE tips) formats. e | Further peptide-based chemical derivatization can be undertaken to enable enrichment or quantification, or to enhance the detection of glycopeptides during downstream LC–MS analysis. For example, the incorporation of positively charged imidazolium groups within biotin-based enrichment handles can be used to improve electron-driven dissociation (ExD)-based fragmentation.
f | Glycopeptides of interest can be enriched using affinity approaches before LC–MS analysis, such as streptavidin enrichment of biotin-labelled metabolic oligosaccharide engineering (MOE) samples, lectin weak affinity chromatography (LWAC), which exploits the binding of lectins to specific sugars, or hydrophilic interaction liquid chromatography (HILIC), which retains glycopeptides based on hydrophilic interactions.

Proteome digestion approaches

After clean-up, glycoproteins can be digested using proteases to produce individual peptides and glycopeptides (Fig. 2). Converting proteins into (glyco)peptides offers a range of analytical advantages in both downstream separation and mass spectral analysis. Reducing a chemically heterogeneous proteome to a mixture of soluble peptides enables much higher-resolution separation than is possible for intact proteins. Furthermore, smaller peptides fragment more efficiently and produce simpler spectra, aiding the characterization of modification sites. The workhorse protease for glycoproteomics is trypsin, which cleaves at the C terminus of arginine or lysine residues with high specificity, efficiency and robustness. The resulting peptides can be protonated at both the N-terminal amine and the C-terminal arginine/lysine residue, yielding rich MS/MS spectra when analysed in positive ion mode. Although trypsin is the protease of choice for most N-glycoproteomic and O-glycoproteomic analyses, O-glycosites are commonly found in dense clusters within regions that lack arginine and lysine residues and are therefore notoriously resistant to tryptic cleavage96, which limits the applicability of trypsin to these densely O-glycosylated domains. To address this issue, many groups have used alternative proteases with different cleavage specificities to increase proteome coverage, such as chymotrypsin, which cleaves C-terminally to phenylalanine, tryptophan and tyrosine; GluC, which cleaves C-terminally to glutamic acid and, to a lesser extent, aspartic acid; and AspN, which cleaves N-terminally to aspartic acid and, to some extent, glutamic acid72,115,116,117.
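The effect of protease choice on densely O-glycosylated regions can be explored with a simple in silico digest. The Python sketch below encodes simplified versions of the cleavage rules described above as zero-width regular expressions; the "not before proline" exception for trypsin is a common convention in in silico digests rather than something stated above, the weaker secondary specificities of GluC (Asp) and AspN (Glu) are ignored, and the repeat sequence is hypothetical.

```python
import re

# Simplified cleavage rules written as zero-width regular expressions.
# trypsin: after K or R, not before P (a common convention for in silico digests)
# chymotrypsin: after F, W or Y (high-specificity residues only)
# gluc: after E (the weaker cleavage after D is ignored)
# aspn: before D (the weaker cleavage before E is ignored)
RULES = {
    "trypsin": r"(?<=[KR])(?!P)",
    "chymotrypsin": r"(?<=[FWY])",
    "gluc": r"(?<=E)",
    "aspn": r"(?=D)",
}

def digest(sequence, enzyme):
    """Fully specific in silico digest with no missed cleavages."""
    pieces = re.split(RULES[enzyme], sequence)  # zero-width splits need Python 3.7+
    return [p for p in pieces if p]  # drop empty strings produced at the termini

# Hypothetical Ser/Thr/Pro-rich repeat with no K/R except before P and at the end
repeat = "PAPGSTAPPAHGVTSAPDTR" * 2
print(digest(repeat, "trypsin"))  # one long peptide
print(digest(repeat, "aspn"))     # three shorter peptides
```

Trypsin leaves the 40-residue Ser/Thr-rich stretch intact, whereas AspN cuts it into three pieces, which is the rationale for combining proteases with different specificities.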

Non-specific proteases such as Pronase and Proteinase K have also been used to analyse a range of glycosylated proteins. Pronase is a commercially available mixture of proteases isolated from Streptomyces griseus that exhibits both exoprotease and endoprotease activities and yields a crude mixture of heterogeneous peptide fragments118. Pronase is useful for the glycoproteomic analysis of samples of modest complexity119; however, the peptide heterogeneity generated by Pronase digestion is a major obstacle to quantitative site-specific glycan profiling. Like Pronase, Proteinase K has broad specificity: it is an endoprotease that cleaves at the C termini of aliphatic and aromatic residues and is often used in conjunction with trypsin digestion for glycosylation site localization in simple mixtures120. The drawback of both non-specific digestion techniques is that the resulting data must be searched against all theoretical peptides, producing an extremely large search space that increases search time and false discovery rates (FDRs; discussed below)121. Further, these proteases tend to generate relatively short glycopeptides, which limits their usefulness for complex samples because the identified glycopeptides can be difficult to map to specific proteins. Thus, the use of non-specific proteases is typically restricted to purified single proteins, for which this approach is most appropriately used to characterize regions such as mucin domains that cannot be accessed by other enzymes122. Despite these challenges, the high peptide heterogeneity produced by these enzymes can be advantageous for applications such as localizing glycosylation events to specific amino acids119,120,122.
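The size of the non-specific search space can be estimated with simple counting. The Python sketch below compares the number of candidate peptides from one protein with and without a cleavage rule; the 500-residue length, the 40 internal tryptic sites, the 5 to 30 residue length window and the allowance of two missed cleavages are all hypothetical choices for illustration. Adding glycan compositions to the search multiplies both counts by roughly the same factor, so the ratio between them persists.

```python
def nonspecific_count(length, min_len=5, max_len=30):
    """Candidate peptides from one protein when no cleavage rule is applied:
    every contiguous subsequence with min_len <= length <= max_len."""
    return sum(length - n + 1 for n in range(min_len, min(max_len, length) + 1))

def specific_count(n_sites, missed=2):
    """Candidate peptides for a fully specific enzyme with n_sites internal
    cleavage sites (n_sites + 1 fragments), allowing up to `missed` missed
    cleavages; peptide length limits are ignored for simplicity."""
    fragments = n_sites + 1
    return sum(max(fragments - k, 0) for k in range(missed + 1))

# Hypothetical 500-residue protein with 40 internal tryptic sites
print(specific_count(40), nonspecific_count(500))  # 120 12571
```

Even for a single protein, removing the cleavage rule inflates the candidate list roughly 100-fold; across a proteome-scale database the same inflation drives the longer search times and poorer FDR control described above.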

Glycoproteome-centric proteases (O-glycoproteases)

Glycoproteases are increasingly being used in O-linked glycoproteomic studies123. O-glycoproteases have modest peptide sequence specificities; they cleave the peptide backbone based on the presence of various O-linked glycans, allowing the digestion of glycosylated regions resistant to other proteases. OgpA, derived from Akkermansia muciniphila and sold commercially as OpeRATOR, was the first commercial O-glycoprotease. This enzyme cleaves at the N terminus of serine or threonine residues that bear truncated O-glycans such as a single GalNAc (Tn antigen) or Gal-GalNAc (the core 1 O-glycan; Fig. 1b). OgpA has been used for the digestion of isolated O-glycoproteins, cell lysates and tissues56,124. Its main drawback is that it cannot cleave glycopeptides decorated with sialic-acid-containing O-glycans; thus, samples must be sialidase-treated before proteolytic digestion. Additionally, OgpA can cleave inefficiently in densely glycosylated regions, leaving glycopeptides that carry multiple O-glycans and therefore require electron-based fragmentation for confident O-glycosite localization63.
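The OgpA rule described above, together with its sialic acid limitation, can be sketched as a simple cleavage predictor. In the Python example below, the set of permissive glycans ({"GalNAc", "Gal-GalNAc"}), the glycan naming strings and the sequence are hypothetical simplifications: any other glycan is treated as blocking, and the inefficiency in densely glycosylated regions noted above is not modelled.

```python
# Glycans assumed to be permissive for OgpA: truncated, non-sialylated
# O-glycans (Tn antigen and core 1). Anything else is treated as blocking.
OGPA_PERMISSIVE = {"GalNAc", "Gal-GalNAc"}

def ogpa_sites(sequence, glycans):
    """0-based positions N-terminal to which OgpA is predicted to cut:
    Ser/Thr residues carrying a permissive glycan.
    glycans maps 0-based residue positions to glycan names."""
    return [i for i, aa in enumerate(sequence)
            if aa in "ST" and glycans.get(i) in OGPA_PERMISSIVE]

def cut(sequence, sites):
    """Split sequence immediately before each 0-based position in sites."""
    bounds = [0] + sorted(set(sites)) + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]

# Hypothetical glycopeptide: T2 carries Tn, T5 carries a sialylated core 1 glycan
seq = "AATPSTAK"
native = {2: "GalNAc", 5: "NeuAc-Gal-GalNAc"}
print(cut(seq, ogpa_sites(seq, native)))        # sialylated T5 is not cleaved
desialylated = {2: "GalNAc", 5: "Gal-GalNAc"}   # after in silico sialidase
print(cut(seq, ogpa_sites(seq, desialylated)))  # both sites now cleaved
```

Comparing the two calls shows why sialidase pretreatment is needed: the sialylated site remains uncut until its sialic acid is removed.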

Several glycoproteases other than OgpA have been introduced to the field. Secreted protease of C1 esterase inhibitor (StcE), derived from enterohaemorrhagic Escherichia coli, is specific for a serine/threonine*-X-serine/threonine motif and cleaves before the second serine/threonine (the asterisk indicates that the first serine/threonine is invariably glycosylated). StcE has improved the analysis of densely O-glycosylated mucin-domain glycoproteins, increasing protein sequence coverage, the number of glycosites identified and the number of localized glycans in the proteins studied96. To further exploit the diversity of bacterial glycoproteases as glycoproteomic tools, the Bertozzi group compiled a toolkit of six additional enzymes, each with a distinct cleavage motif: Bacteroides thetaiotaomicron 4244 (BT4244); A. muciniphila 0627 (AM0627), 1514 (AM1514) and 0608 (AM0608); enteroaggregative E. coli protease involved in colonization (Pic); and Streptococcus pneumoniae zinc metalloprotease C (ZmpC)125. Similarly, other groups have demonstrated that enzymes such as the coagulation-targeting metalloendopeptidase (CpaA) of Acinetobacter baumannii126 and the immunomodulating metalloprotease (IMPa) from Pseudomonas aeruginosa also cleave glycosylated serine and threonine residues with unique specificities127.
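The StcE motif can likewise be turned into a small predictor. The Python sketch below applies only the serine/threonine*-X-serine/threonine rule stated above and ignores any additional sequence-context or structural requirements the enzyme may have; the sequence and glycosylation state are hypothetical.

```python
def stce_sites(sequence, glycosylated):
    """0-based positions before which StcE is predicted to cut, using only the
    S/T*-X-S/T motif: position i is a glycosylated Ser/Thr and position i + 2 is
    a Ser/Thr; the cut falls before i + 2. glycosylated is a set of 0-based
    positions carrying an O-glycan."""
    return [i + 2 for i in range(len(sequence) - 2)
            if sequence[i] in "ST" and i in glycosylated and sequence[i + 2] in "ST"]

def cut(sequence, sites):
    """Split sequence immediately before each 0-based position in sites."""
    bounds = [0] + sorted(set(sites)) + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]

seq = "PTASAK"  # hypothetical sequence
print(cut(seq, stce_sites(seq, {1})))    # T1 glycosylated: cut before S3
print(cut(seq, stce_sites(seq, set())))  # no glycan at T1: no predicted cut
```

Because the first Ser/Thr must carry a glycan, the same sequence is cleaved or left intact depending solely on its glycosylation state.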

Endoglycosidases and exoglycosidases

Endoglycosidases release oligosaccharides from the protein attachment site or cleave within the glycan chain, whereas exoglycosidases trim monosaccharides from the non-reducing termini of the glycan chain128. Removing glycans or reducing glycan heterogeneity concentrates the signal of glycosylated or previously glycosylated peptides into a limited number of chemical species, which can enhance the detection of glycosylation events. One of the most commonly used enzymes is PNGase F, which releases intact N-glycans from proteins and deamidates the previously modified asparagine residue to aspartic acid. By contrast, Endo F and Endo H cleave between the two GlcNAc residues of the chitobiose core, leaving a single GlcNAc on the modified asparagine residue129,130. A universal endo-O-glycosidase has not been characterized, although some enzymes can remove truncated O-glycan structures; for example, OglyZOR, a commercially available endoglycosidase derived from Streptococcus oralis, hydrolyses truncated core 1 O-glycans. Commercial glycosidases derived from S. pneumoniae and Enterococcus faecalis that release core 1 and (to a limited extent) core 3 O-glycans are also available. Many O-glycosidases have limited activity if the glycans are modified by sialic acid or GlcNAc and thus must be used in conjunction with other glycosidases that remove these modifications44.
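The mass signatures that these enzymes leave on formerly N-glycosylated peptides follow directly from their products. The Python sketch below derives them from elemental compositions: PNGase F converts the asparagine side-chain amide to a carboxylic acid (net +O −N −H), whereas Endo H and Endo F leave one GlcNAc residue (C8H13NO5). The dictionary keys used as enzyme names and the example peptide mass are hypothetical, and the sketch assumes exactly one GlcNAc remains after Endo H or Endo F treatment, as described above.

```python
# Monoisotopic element masses (Da)
ELEMENT = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

def formula_mass(formula):
    """Monoisotopic mass of an elemental composition {element: count}."""
    return sum(ELEMENT[el] * n for el, n in formula.items())

# PNGase F converts the glycosylated Asn to Asp: the side-chain NH2 becomes OH,
# a net change of +O -N -H (deamidation)
DEAMIDATION = formula_mass({"O": 1}) - formula_mass({"N": 1, "H": 1})
# Endo H and Endo F leave one GlcNAc (a HexNAc residue, C8H13NO5) on the Asn
HEXNAC = formula_mass({"C": 8, "H": 13, "N": 1, "O": 5})

SHIFT = {"PNGaseF": DEAMIDATION, "EndoH": HEXNAC, "EndoF": HEXNAC}  # hypothetical keys

def deglycosylated_mass(peptide_mass, n_sites, enzyme):
    """Expected neutral mass of a formerly N-glycosylated peptide (unmodified
    peptide mass given in Da) after release at n_sites occupied Asn residues."""
    return peptide_mass + n_sites * SHIFT[enzyme]

print(f"{DEAMIDATION:.5f} {HEXNAC:.5f}")  # 0.98402 203.07937
print(deglycosylated_mass(1000.0, 2, "PNGaseF"))
```

Spontaneous, non-enzymatic deamidation of asparagine produces the same +0.984 Da shift, so this shift alone does not prove that a site was previously glycosylated.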

Exoglycosidase treatment is commonly used to simplify glycoproteomic analyses. Sialidases are often used to remove sialic acids, reduce microheterogeneity and limit the number of detected glycoforms, which can improve the identification of glycopeptides131. Broad-acting sialidases such as neuraminidase A can remove α2,3-, α2,6- and α2,8-linked sialic acids, whereas some sialidases are specific for a particular linkage; for example, Clostridium perfringens neuraminidase is commonly used to cleave α2,3 linkages78. Other exoglycosidases used in O-glycoproteomics include β1,4-galactosidase from S. pneumoniae, which removes β1,4-linked galactose, and β-N-acetylhexosaminidase, also from S. pneumoniae, which removes terminal non-reducing HexNAc residues from oligosaccharides49. Owing to their innate specificity, exoglycosidases are useful for trimming glycans for the targeted characterization of glycan epitopes and for simplifying glycoproteomic analysis. However, removing monosaccharides inevitably reduces the structural information that can be gleaned from intact glycoproteomics.
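The way sialidase treatment collapses microheterogeneity can be illustrated at the level of glycan compositions. The Python sketch below assumes complete digestion by a broad-acting sialidase and works on compositions only, so linkage specificity (for example α2,3 versus α2,6) cannot be represented. Removing every residue of a type is only reasonable for residues that are always terminal, such as sialic acids; applying it to galactose or HexNAc would overstate what an exoglycosidase can remove. The glycoforms listed are hypothetical.

```python
def apply_exoglycosidase(compositions, removable=("NeuAc", "NeuGc")):
    """Return the distinct glycan compositions left after complete removal of
    the given residue types (default: all sialic acids, as for a broad-acting
    sialidase). Works at composition level only, so linkage specificity
    (for example alpha2,3 versus alpha2,6) is not modelled."""
    collapsed = set()
    for comp in compositions:
        trimmed = tuple(sorted((res, n) for res, n in comp.items()
                               if res not in removable and n > 0))
        collapsed.add(trimmed)
    return collapsed

# Hypothetical glycoforms observed at one N-glycosite
site = [
    {"HexNAc": 4, "Hex": 5, "Fuc": 1},
    {"HexNAc": 4, "Hex": 5, "Fuc": 1, "NeuAc": 1},
    {"HexNAc": 4, "Hex": 5, "Fuc": 1, "NeuAc": 2},
    {"HexNAc": 4, "Hex": 5, "NeuAc": 2},
    {"HexNAc": 4, "Hex": 5, "NeuAc": 1, "NeuGc": 1},
]
print(len(site), "->", len(apply_exoglycosidase(site)))  # 5 -> 2
```

Five glycoforms collapse to two, concentrating the signal into fewer species, at the cost of losing all information about sialylation at that site.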

Chemical and biological affinity-based glycopeptide enrichment

In-depth glycoproteomic analysis benefits from the selective enrichment of glycopeptides. Affinity-based enrichment approaches are used broadly across the field and can be classified as chemical or biological in nature. In this section we introduce common protocols for N-glycopeptide and O-glycopeptide enrichment; for a detailed discussion of the breadth of glycopeptide enrichment approaches used across the community, readers are referred to the extensive literature on this topic36,41,43,129,132.

Some of the first proteome-scale studies of glycosylation used chemical enrichment strategies such as the covalent tethering of glycoproteins or glycopeptides to hydrazide-based resins through cis-diols within the carbohydrate chains, which are first oxidized to aldehydes with periodate. These approaches form covalent linkages between the resin and the glycopeptides or glycoproteins of interest, allowing non-glycosylated peptides or proteins to be washed away with detergents or chaotropic agents before the enriched glycopeptides are eluted by enzymatic or chemical cleavage of the linked glycans133,134,135,136,137,138,139,140,141. Because these methods require the release of N-glycans with PNGase F or the acid hydrolysis of hydrazide-linked sialic acids, alternative chemical enrichment approaches that do not require the removal or alteration of glycan structures have been developed. For example, several boronic acid-based resins allow glycopeptide enrichment through reversible covalent tethering of glycopeptides142. Additionally, many approaches exploit charge-based interactions, including the capture of glycopeptides carrying terminal acidic sugars (such as NeuAc) using titanium dioxide143,144,145 and electrostatic repulsion–hydrophilic interaction chromatography (ERLIC)146. Because not all glycans are charged, several approaches that exploit the hydrophilic nature of glycans have also been developed for various classes of glycopeptides, such as hydrophilic interaction liquid chromatography (HILIC)147,148,149,150 (Fig. 2). Chemical enrichment approaches can typically be performed with commercial reagents and without genetic or metabolic manipulation of the model, making them applicable to a wider range of biological systems.

In contrast to chemical approaches, naturally occurring proteins that recognize carbohydrate epitopes can also be used for glycopeptide enrichment. Lectins are a widely used class of carbohydrate-recognizing proteins and can be used in lectin weak affinity chromatography (LWAC; Fig. 2) set-ups to enrich different subtypes of glycopeptide using a diverse array of commercially available lectins, such as wheat germ agglutinin (WGA) and jacalin, which recognize O-GlcNAc and core 1 O-glycans, respectively80,116,151,152,153,154. LWAC approaches use lectins immobilized on solid supports, such as agarose, which retain glycopeptides while non-glycosylated peptides are removed by washing with mild non-denaturing buffers155. WGA-based LWAC is a common O-GlcNAc enrichment technique, although recent work suggests that commercial anti-O-GlcNAc antibody mixtures are more selective for O-GlcNAcylated peptides114,156. An alternative for core 1 O-GalNAc glycoproteomics is peanut agglutinin (PNA) lectin53,66,68. Vicia villosa agglutinin (VVA) is well suited to the enrichment of glycopeptides that bear a single O-GalNAc (Tn, Fig. 1b); this lectin was implemented in the SimpleCell O-glycoproteomics approach, in which cultured cells are genetically engineered to express homogeneous O-GalNAc glycosylation76,77. In both LWAC and antibody-based enrichment, bound glycopeptides can be eluted with competitive free-carbohydrate solutions155 or by denaturing the affinity protein with acid114. In addition to its use in studying N-linked and mucin-type O-linked glycosylation, LWAC-based enrichment has been applied to O-Man glycosylation, using concanavalin A (ConA) lectin, which recognizes O-linked, but not C-linked, α-mannose sugars94,157,158. The broad and often poorly defined specificities of most lectins can complicate the interpretation of enrichment results, so care must be taken when interpreting the glycans enriched with a given lectin.

Metabolic engineering of oligosaccharides for glycopeptide enrichment

Metabolic oligosaccharide engineering (MOE; Fig. 2) has emerged as an important strategy to profile N-glycans and O-glycans58,93,159,160. In MOE, monosaccharides chemically modified with tags are incorporated into proteins by the endogenous glycosylation machinery. The tags are stable in the cellular environment but reactive in bioorthogonal click chemistry reactions, such as copper-catalysed azide–alkyne cycloaddition161. Clicking a functionalized biotin onto the tag allows tagged glycopeptides to be enriched using streptavidin-conjugated beads before MS analysis129,162. Metabolic incorporation of clickable alkyne-modified or azide-modified sugars has been demonstrated for mapping N-glycosites93 and O-GalNAc163,164,165 or O-GlcNAc proteomes166,167. One benefit of MOE is that the functionalized monosaccharides can be incorporated into glycan structures without a chain-terminating effect, allowing additional sugars to be added by endogenous glycosyltransferases. However, labelling efficiency in MOE is extremely low, and reagents are of limited specificity as they can be interconverted and incorporated into unintended glycan structures. A bump-and-hole strategy, in which engineered GalNAc-Ts accept chemically ‘bumped’ GalNAc donors168,169,170, can be used to label cellular glycans and delineate GalNAc-T specificities. This strategy has been further developed with a metabolic labelling probe (GalNAzMe) for specific labelling of O-glycans171, as well as clickable tags (ITag) that stably increase glycopeptide charge172.

Analysis of glycopeptides

Glycopeptides are typically characterized using LC–MS/MS, whereby glycopeptides eluted from an LC column are ionized by electrospray ionization (ESI) and sequenced using a suite of tandem MS (MS/MS) dissociation methods41,48,49. Parameters for LC and MS/MS stages are key decision points in glycoproteomic experiments and ultimately have consequences for data quality and interpretation. Matrix-assisted laser desorption/ionization (MALDI)–MS is also a popular high-throughput approach for glycopeptide analysis, although the ability to automate ESI and directly couple it to separation technologies allows a greater dynamic range for complex samples and has made ESI-based LC–MS/MS the mainstay of most glycoproteomic methods. ESI-based LC–MS/MS strategies are therefore the focus of this section.

Liquid chromatography-based separation of glycopeptides

Most glycoproteomic methods use low-pH (pH <2) reverse phase liquid chromatography (RP-LC) to separate glycopeptides before MS/MS, with a C18-based stationary phase and flow rates that range from tens to hundreds of nanolitres per minute (nanoflow). RP-LC is a versatile and robust method widely used in proteomics as it offers a combination of high peak capacity and simplicity173. The retention and thus separation of glycopeptides in the RP-LC column is mostly driven by the hydrophobicity of the peptide backbone, although the size, conformation and monosaccharide content of glycans also contribute to retention behaviour174,175,176. Retention times are useful for glycopeptide identification in combination with the accurate precursor mass and tandem MS spectra, especially when ambiguous MS/MS spectra generate several potential glycopeptide candidates. Prediction tools can help incorporate this orthogonal information from RP-LC177,178,179, although adoption of these data into informatic tools is not yet ubiquitous.
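Retention-time information can be folded into candidate ranking in many ways. The sketch below assumes the simplest one: a hard tolerance window around the observed retention time, applied before ranking by spectral score. The `Candidate` fields, the 2-minute tolerance and the predicted values are hypothetical; published tools typically use learned predictors and softer scoring.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str            # hypothetical glycopeptide label
    score: float         # spectral match score from a search engine
    predicted_rt: float  # minutes, assumed to come from an RT predictor


def filter_by_retention_time(candidates, observed_rt, tolerance_min=2.0):
    """Drop candidates whose predicted RT is far from the observed RT,
    then rank the survivors by spectral score (highest first)."""
    kept = [c for c in candidates if abs(c.predicted_rt - observed_rt) <= tolerance_min]
    return sorted(kept, key=lambda c: c.score, reverse=True)


# Two candidates for the same ambiguous MS/MS spectrum: B scores slightly
# higher, but its predicted retention time is inconsistent with the observation.
candidates = [
    Candidate("candidate A", score=41.0, predicted_rt=35.2),
    Candidate("candidate B", score=43.5, predicted_rt=52.8),
]
best = filter_by_retention_time(candidates, observed_rt=34.6)
print([c.name for c in best])  # ['candidate A']
```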

There is no universal separation technique that is ideal for all classes of glycoconjugates129, and although RP-LC is the dominant separation modality in LC–MS/MS glycoproteomics, it does have some drawbacks, such as the co-elution of isomeric glycoforms owing to their identical peptide sequences180,181,182. Although the use of elevated column temperatures in RP-LC can allow the separation of isomeric N-glycopeptides and O-glycopeptides183, this does not always provide adequate separation of all isomeric species. Alternatively, HILIC-LC, in which separation is largely influenced by the hydrophilicity imparted by glycan moieties, can be used in online glycopeptide separations and is effective at separating isomeric species that differ only in glycan linkage position and branching184,185,186. Several HILIC-LC resins exist187 and new HILIC resins provide novel separation characteristics that may be beneficial for specific glycopeptide classes181. Another RP-LC alternative uses porous graphitized carbon (PGC) as the stationary phase, which retains polar compounds with MS-compatible solvents188 and is highly advantageous for separating released glycans189. Its use for separating glycopeptides is somewhat complicated as both hydrophobicity and charge contribute to retention using this separation modality190,191,192; furthermore, highly sialylated glycopeptides and glycopeptides derived from commonly used proteases such as trypsin, GluC or chymotrypsin are difficult to elute from the resin, meaning non-specific proteases that generate shorter glycopeptides are typically required193,194,195,196,197. PGC-LC has been shown to separate isomeric N-glycopeptides and O-glycopeptides198, and separation of glycopeptides with α2,3-linked or α2,6-linked sialic acids can be modulated by column temperature199. However, challenges with the elution of large glycopeptides owing to the retention of hydrophobic species have limited the widespread use of PGC-LC in LC–MS/MS glycoproteomics. 
We compare separation techniques in Table 2. It is worth noting that although the above-mentioned LC-based approaches are traditionally performed using columns, they can also be successfully employed using chip-based fluidic devices180.

Table 2 Online separation options for glycopeptide analysis

Non-liquid chromatography-based separation of glycopeptides

Separation techniques other than LC are increasingly finding applications in the fine structural analysis of glycans and glycopeptides38. Online capillary electrophoresis (CE) is an emerging tool for glycoproteomics that can separate glycopeptide isomers and offer potential improvements in reproducibility and sensitivity200,201,202,203. Electrophoretic mobility in CE is governed by glycopeptide charge-to-size ratios, and, as a result, glycan composition (and especially sialic acid content) can affect migration, providing glycan-based separation of glycoforms of the same peptide backbone204,205,206. Gas-phase separations of glycopeptides following LC or CE can also be used to separate isomeric glycopeptides; these techniques include ion mobility spectrometry (IMS) approaches207,208,209,210 such as travelling-wave IMS211,212,213,214,215, differential/high-field asymmetrical waveform IMS216,217,218,219 and drift-tube IMS220,221,222,223. In addition to allowing isomeric separation, IMS has also been shown to enable separation of glycosylated species from non-modified peptides, providing access to glycopeptides incompatible with chromatographic enrichment224,225.

The benefits of individual separation approaches (summarized in Table 2) can be combined. Offline separation is typically used to fractionate complex mixtures of glycopeptides (usually enriched before fractionation) into multiple samples, each of which is then analysed by LC–MS/MS using an orthogonal separation modality. This fractionation can markedly increase sensitivity by reducing the complexity of the mixture in each online LC–MS/MS analysis; however, it substantially decreases throughput, as the analysis of a single sample is spread across multiple LC–MS/MS acquisitions. One prominent ‘2D’ glycoproteomic approach is offline high-pH RP-LC followed by online low-pH RP-LC57,142,226,227,228,229,230, although offline fractionation with HILIC-LC, PGC-LC and CE has also been used before online low-pH RP-LC44,119,231,232. Other combinations of separation techniques exploit differences in both the glycan and the peptide components182, such as offline RP-LC coupled with online CE203, offline HILIC-LC coupled with offline PGC-LC followed by MALDI–MS233 and offline RP-LC coupled with online HILIC-LC60. Two-dimensional separations can also be performed fully online by carrying out two orthogonal separations on an LC system coupled to the mass spectrometer (for example, online RP-PGC-MS/MS)121,122,234,235,236. As these methods often require specialized equipment, they are less widely used than offline fractionation followed by online orthogonal separation with LC–MS/MS.

Tandem MS fragmentation of glycopeptides

Several acquisition approaches are available on modern MS instruments237,238 and the choice of fragmentation method — also referred to as the dissociation method — needed to generate MS/MS spectra is determined by the key information required for glycopeptide identification239. Each fragmentation strategy generates specific fragment ion types that determine what information can be obtained for glycopeptide characterization35,240,241,242 and also dictates the instrument platforms suitable for a given experiment, appropriate data acquisition strategies and the informatic tools available for post-acquisition analysis.

The most ubiquitous fragmentation strategy is collision-induced dissociation, which can be accomplished using beam-type collision-induced dissociation (beamCID), referred to as higher-energy collisional dissociation (HCD) on some instrument platforms243, or resonance activation collision-induced dissociation (resonanceCID), which is commonly undertaken using ion traps. BeamCID and resonanceCID produce notably different glycopeptide spectra as a result of their different mechanisms and timescales of collisional energy deposition244. ResonanceCID spectra are typically dominated by fragments resulting from glycosidic cleavages, denoted as B/Y-type ions as per the nomenclature published by Domon and Costello242, whereas beamCID provides access to both glycosidic and amide peptide bond fragmentation events245, with amide peptide bond fragments given as a-, b- and y-type ions according to the nomenclature published by Biemann240. Further, ions with a low mass-to-charge ratio (m/z) are typically lost during resonanceCID, whereas these ions are detectable using beamCID244.

BeamCID has become the preferred collision-induced dissociation approach for glycoproteomics owing to its ability to access both glycan and peptide fragments and both high-m/z and low-m/z ions. Additionally, beamCID enables rapid MS/MS acquisition, with modern mass spectrometers capable of acquiring more than 20 scans per second. BeamCID collision energies can be adjusted by modulating direct current offsets applied to collision cell devices within mass spectrometers, making collision energy a user-adjustable parameter when designing methods. Lower relative collision energies favour glycan fragments (typically B/Y-type ions and some cross-ring fragments), and higher relative collision energies favour peptide fragments (typically b-type and y-type ions with and without glycan loss)246,247,248,249,250,251. Oxonium ions (relatively low-mass ions derived from monosaccharide and disaccharide fragmentation) are also a dominant feature of beamCID spectra. For N-glycopeptides more so than for O-glycopeptides, beamCID can generate b/y-type ions that retain the initiating HexNAc moiety, which can aid glycosite localization. The generation of b/y-type ions that retain intact glycan species is rare in beamCID regardless of collision energy, although these ions are more likely to be observed for glycopeptides with low proton mobility252,253. The lack of such ions complicates spectral interpretation and glycosite localization when multiple potential glycosites are present in a given glycopeptide, a challenge most often encountered with O-glycopeptides36,42,63,254. An emerging trend is the use of stepped-collision-energy beamCID (SCE-beamCID), in which product ions generated at multiple collision energies from the same glycopeptide precursor are collected in a single MS/MS spectrum54,55,247,248,255,256.
SCE-beamCID methods often provide multiple types of informative fragment that can aid identification and structural analyses, although this does not ameliorate the weaknesses of beamCID for O-glycosite localization252.
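The oxonium ions mentioned above have m/z values that follow directly from monosaccharide residue masses. A minimal sketch, assuming singly protonated ions and standard monoisotopic residue masses (monosaccharide minus water); the selection of four diagnostic ions is illustrative rather than exhaustive:

```python
# Monoisotopic residue masses (Da) of common monosaccharides (i.e. minus water)
PROTON = 1.007276
RESIDUE = {
    "Hex": 162.052824,
    "HexNAc": 203.079373,
    "Fuc": 146.057909,
    "NeuAc": 291.095417,
}


def oxonium_mz(*residues):
    """Singly protonated oxonium ion m/z for a mono- or oligosaccharide fragment."""
    return sum(RESIDUE[r] for r in residues) + PROTON


DIAGNOSTIC = {
    "HexNAc": oxonium_mz("HexNAc"),
    "Hex": oxonium_mz("Hex"),
    "NeuAc": oxonium_mz("NeuAc"),
    "HexHexNAc": oxonium_mz("Hex", "HexNAc"),
}
for name, mz in DIAGNOSTIC.items():
    print(f"{name}: {mz:.4f}")
```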

Alternative methods to collision-induced dissociation include those that use electrons or photons as the means of fragmentation257. Electron-driven dissociation (ExD) methods such as electron capture dissociation (ECD) and electron transfer dissociation (ETD) generate c/z-type ions for peptide backbone sequencing (as defined by the Biemann peptide fragmentation nomenclature)240, with little to no fragmentation of glycan moieties. These methods are therefore complementary to beamCID and particularly useful for site-specific characterization of O-glycopeptides and other glycopeptides with multiple potential sites of modification43,77,258,259,260,261. ExD is also valuable for highly charged species, although the generation of sequence-informative fragment ions decreases at low precursor cation charge densities262. This can be problematic for glycopeptide analysis, in which neutral or negatively charged glycans add mass without a concomitant addition of positive charge. Additionally, glycan size and attachment site can affect ExD dissociation owing to secondary gas-phase structure effects263. Hybrid fragmentation methods that combine ExD with collisions (for example, electron transfer/higher-energy collision dissociation, or EThcD) or photons (activated-ion ETD) can address these issues57,241,264,265. Beyond improving fragment ion generation from ExD itself, these hybrid methods also generate fragment ion types from each dissociation mode — for example, in the EThcD regime, c/z-type peptide fragment ions are generated from ETD, and b/y-type peptide fragment ions and B/Y-type glycan fragment ions are generated from beamCID59,241,264,266. Photon-based dissociation methods, particularly ultraviolet photodissociation (UVPD), have also shown promise for generating information-rich spectra with multiple fragment ion types for glycopeptides267,268,269,270, but have yet to be explored for large-scale glycoproteomics.
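The charge-density problem can be made concrete with simple arithmetic. The sketch assumes a hypothetical 1,500 Da peptide carrying a disialylated biantennary N-glycan composition (HexNAc4Hex5NeuAc2) and, for illustration, the same 3+ charge state with and without the glycan, because the glycan contributes no basic sites:

```python
PROTON = 1.007276  # Da
RESIDUE = {"HexNAc": 203.079373, "Hex": 162.052824, "NeuAc": 291.095417}


def precursor_mz(neutral_mass, charge):
    """m/z of an [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


peptide_mass = 1500.0  # hypothetical neutral peptide mass (Da)
# Disialylated biantennary N-glycan composition, HexNAc4Hex5NeuAc2
glycan_mass = 4 * RESIDUE["HexNAc"] + 5 * RESIDUE["Hex"] + 2 * RESIDUE["NeuAc"]

for label, mass in [("peptide", peptide_mass), ("glycopeptide", peptide_mass + glycan_mass)]:
    print(f"{label}: {mass:.2f} Da at 3+ -> m/z {precursor_mz(mass, 3):.2f}")
```

At constant charge, the glycan more than doubles the precursor m/z, which is the drop in charge density that reduces ExD efficiency; acidic sialic acids can lower the achievable positive charge further.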

Although ExD and related hybrid methods can generate high-quality spectra for both N-glycopeptides and O-glycopeptides, these methods often have reaction times of tens to hundreds of milliseconds per spectrum262. BeamCID, by comparison, provides near instantaneous fragmentation. BeamCID or SCE-beamCID methods are therefore more suited for large-scale N-glycopeptide analyses, where b/y-type ions — some of which retain an initiating HexNAc — and B/Y-type ions are mostly sufficient for identification271. Conversely, ExD-centric methods are favourable for O-glycopeptide characterization despite high time costs, as c/z-type ions that retain intact glycan modifications are often necessary for O-glycosite localization59,63,252,258,259,266,272. Experiments that require ExD often combine beamCID and ExD in a product-dependent fashion273,274,275. In product-dependent acquisition schemes, more expedient beamCID methods are used to sequentially fragment precursor ions to look for potential glycopeptides. Once a specific product ion is observed, for example, abundant oxonium ions from a given precursor, the instrument then triggers an ExD spectrum for that same ion, creating complementary pairs of beamCID and ExD spectra for the same precursor ions and relegating ExD spectral acquisition to only those ions that are likely to be glycopeptides.
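The triggering logic of a product-dependent scheme can be sketched as a simple rule: look for diagnostic oxonium ions in the beamCID spectrum and request an ExD scan only when enough of them are present. The ion list, the 10 ppm tolerance, the 5% relative-intensity cut-off and the two-ion requirement are illustrative assumptions, not instrument defaults:

```python
# Diagnostic oxonium ion m/z values (singly protonated); the list is an assumption
OXONIUM_MZ = {"HexNAc": 204.0867, "NeuAc": 292.1027, "HexHexNAc": 366.1395}


def ppm_error(observed, theoretical):
    return (observed - theoretical) / theoretical * 1e6


def should_trigger_exd(peaks, tol_ppm=10.0, min_rel_intensity=0.05, min_hits=2):
    """Return True if a beamCID spectrum (list of (m/z, intensity) pairs) contains
    at least `min_hits` distinct oxonium ions above `min_rel_intensity` of the
    base peak, i.e. the precursor is likely to be a glycopeptide."""
    if not peaks:
        return False
    base = max(intensity for _, intensity in peaks)
    hits = {
        name
        for mz, intensity in peaks
        if intensity >= min_rel_intensity * base
        for name, ref in OXONIUM_MZ.items()
        if abs(ppm_error(mz, ref)) <= tol_ppm
    }
    return len(hits) >= min_hits


glyco_like = [(204.0866, 5.0e5), (366.1393, 2.0e5), (512.30, 1.0e6)]
plain = [(175.1190, 3.0e5), (512.30, 1.0e6)]
print(should_trigger_exd(glyco_like), should_trigger_exd(plain))  # True False
```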

Glycopeptide data acquisition approaches

Glycoproteomic methods rely heavily on data-dependent acquisition (DDA)38: here, a survey (MS1) scan measures intact glycopeptide ions across a wide m/z range (for example, m/z 400–1,800) as they elute from the LC column and are ionized by ESI. Selected precursor ions are then isolated using narrow (~1–3 m/z) windows and fragmented using one of the dissociation strategies discussed above, and the resulting fragment ions are measured in an MS/MS spectrum, under the assumption that they derive largely from a single precursor ion. DDA typically prioritizes ions by abundance and selects analytes for MS/MS analysis sequentially, starting with the most abundant ions and/or those with the desired charge states.
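The precursor-selection step of DDA can be sketched as follows. This is a simplified model assuming a fixed top-N, an allowed set of charge states (singly charged ions excluded), a dynamic exclusion list of recently fragmented m/z values and a 2 m/z isolation width; instrument firmware implements these rules in its own way, and all values here are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Precursor:
    mz: float
    charge: int
    intensity: float


def select_top_n(ms1_peaks, n=10, allowed_charges=(2, 3, 4, 5, 6),
                 excluded_mz=(), exclusion_tol=0.01):
    """Top-N DDA precursor selection: most intense ions with an allowed charge,
    skipping m/z values currently on a dynamic exclusion list."""
    eligible = [
        p for p in ms1_peaks
        if p.charge in allowed_charges
        and not any(abs(p.mz - x) <= exclusion_tol for x in excluded_mz)
    ]
    eligible.sort(key=lambda p: p.intensity, reverse=True)
    return eligible[:n]


def isolation_window(precursor, width=2.0):
    """(low, high) m/z bounds of an isolation window centred on the precursor."""
    half = width / 2
    return (precursor.mz - half, precursor.mz + half)


survey = [
    Precursor(850.37, 3, 2.0e6),
    Precursor(612.30, 1, 5.0e6),   # singly charged: skipped
    Precursor(1023.45, 4, 8.0e5),
    Precursor(733.02, 2, 1.5e6),   # recently fragmented: excluded
]
selected = select_top_n(survey, n=2, excluded_mz=(733.02,))
for p in selected:
    print(p.mz, p.charge, isolation_window(p))
```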

As an alternative to DDA, data-independent acquisition (DIA) isolates large overlapping windows of ions that are designed to cover a user-defined mass range276,277. Each window of ions may contain multiple peptide and glycopeptide species that co-isolate and are thus co-fragmented, and as a result MS/MS spectra contain fragments from multiple precursor ions50. DIA methods iterate over the same windows in a repeating fashion with a defined duty cycle regardless of the signal in MS1 scans, which can aid in sampling of low-abundance ions and improve reproducibility across multiple acquisitions. The complex MS/MS spectra resulting from DIA are challenging to interpret, especially for inherently complex analytes such as glycopeptides277. A particular challenge that remains unresolved is that related glycopeptide forms tend to generate near-indistinguishable fragment patterns, making it difficult to determine which precursor a given fragment arises from when several related forms are captured in the same window. Several DIA methods for glycoproteomics have emerged in recent years278,279,280,281,282,283,284,285,286, and the momentum of DIA in traditional proteomics will likely propel a growth in DIA for glycoproteomics in the future if this challenge can be overcome50. DIA could be especially beneficial for structure-focused glycoproteomics, as partially resolved, co-eluting glycoforms can be distinguished based on unique chromatogram profiles of fragment ions, enabling quantification of isobaric glycoforms38.

In DDA, the ability to combine several dissociation methods or acquisition styles (for example, product-dependent methods) allows the use of dynamic acquisition schemes that can leverage the strengths of multiple dissociation approaches252. Conversely, DIA requires rapid MS/MS acquisition to enable iterative sampling of all m/z windows across the mass range, which limits the range of dissociation methods that can be implemented efficiently and the ability to dynamically switch between dissociation methods. This limits DIA largely to beamCID-based strategies as ExD spectra simply require too much time to acquire, meaning most glycoproteomic methods that employ DIA to date have focused on simple mixtures of N-glycopeptides278,279,280,281,282,283,284,285. Although O-glycoproteomic studies using DIA have been described, they currently rely on additional DDA-based ExD methods for O-glycosite localization286. Instrumentation that reduces acquisition times for ExD spectra could have the potential to enable ExD-based DIA methods for large-scale glycoproteomics287,288.
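The time constraint on DIA can be illustrated with a window scheme and a back-of-the-envelope cycle time. The sketch assumes fixed 50 m/z windows with 1 m/z overlap across m/z 400–1,800, about 0.05 s per beamCID scan (consistent with more than 20 scans per second), an illustrative 0.15 s per ExD scan and a hypothetical 0.1 s MS1 scan:

```python
def dia_windows(start_mz, end_mz, width, overlap):
    """Fixed-width isolation windows covering [start_mz, end_mz], each
    overlapping its predecessor by `overlap` m/z units."""
    if not 0 <= overlap < width:
        raise ValueError("overlap must satisfy 0 <= overlap < width")
    windows, low = [], start_mz
    while True:
        high = min(low + width, end_mz)
        windows.append((low, high))
        if high >= end_mz:
            return windows
        low = high - overlap  # always advances by width - overlap > 0


def cycle_time_s(n_windows, ms2_scan_s, ms1_scan_s):
    """One MS1 survey scan plus one MS/MS scan per window."""
    return ms1_scan_s + n_windows * ms2_scan_s


windows = dia_windows(400, 1800, width=50, overlap=1)
# Assumed per-scan times: ~0.05 s for beamCID versus an illustrative 0.15 s for ExD
for label, ms2_s in [("beamCID", 0.05), ("ExD", 0.15)]:
    total = cycle_time_s(len(windows), ms2_s, ms1_scan_s=0.1)
    print(label, len(windows), "windows,", round(total, 2), "s per cycle")
```

Under these assumptions the same 29-window scheme takes roughly three times as long per cycle with ExD, so fewer points are sampled across each chromatographic peak.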

Quantification approaches and multiplexing

Several strategies exist for the relative quantification of glycosylation across different samples, including strategies that act at the level of live cells, proteins or peptides. These methods vary in their multiplexing capacity, quantification accuracy, time requirements and cost.

The most common type of quantification is label-free quantification (LFQ), in which signal intensities or spectral counts are used to determine relative abundance. Each LC–MS analysis corresponds to a single sample, so there is no sample multiplexing. LFQ analysis has been used to study a range of glycoproteomes, including O-GalNAc286 and N-linked glycosylation events289. Although extremely accessible and cost effective, LFQ methods can be less accurate than other methods290.
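A minimal label-free comparison between two runs might look like the sketch below. Median normalization is an assumption (it presumes that most glycopeptides do not change between samples), and the glycopeptide names and intensities are hypothetical:

```python
import math
from statistics import median


def median_normalize(intensities):
    """Scale one run so that its median glycopeptide intensity equals 1.0."""
    m = median(intensities.values())
    return {k: v / m for k, v in intensities.items()}


def log2_fold_changes(run_a, run_b):
    """log2(B/A) for glycopeptides quantified in both runs, after normalization."""
    a, b = median_normalize(run_a), median_normalize(run_b)
    return {k: math.log2(b[k] / a[k]) for k in sorted(a.keys() & b.keys())}


run_a = {"gp1": 1.0e6, "gp2": 2.0e6, "gp3": 4.0e6}
run_b = {"gp1": 2.0e6, "gp2": 4.0e6, "gp3": 3.2e7}   # run B loaded ~2x more
print(log2_fold_changes(run_a, run_b))  # {'gp1': 0.0, 'gp2': 0.0, 'gp3': 2.0}
```

Normalization removes the twofold loading difference, leaving only gp3 as changed (fourfold, log2 = 2).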

Stable isotope labelling by amino acids in cell culture (SILAC) is a highly accurate yet costly method to identify and quantify relative changes in complex protein samples291. In this technique, cells are grown in the presence of ‘heavy’ 13C-labelled or 15N-labelled amino acid isotopologues, which are incorporated into proteins and produce a mass shift of labelled peptides in the MS1 spectrum. By mixing labelled and unlabelled samples, the relative abundance of peptides or glycopeptides can be determined by comparing the ratio of the light and heavy forms at the MS1 level52,291,292. SILAC typically enables the multiplexing of up to three samples and has been used in glycoproteomic studies to understand insulin resistance within adipocytes52, track N-glycan processing and monitor temporal and stress-induced changes in O-GlcNAcylation events156,293. Other stable isotope-based labelling strategies for quantification at the MS1 level include dimethyl294,295 or diethyl296 labelling of peptides, which offer an inexpensive alternative for large-scale experiments and allow multiplexing of up to three samples296,297. These approaches have been applied in differential glycoproteomic analyses of O-GalNAc and O-Man glycoproteomes, allowing the study of the substrate specificities of GalNAc-Ts79,81, the mannosyltransferases POMT1 and POMT2 (ref. 157) and TMTC1–TMTC4 (ref. 158).
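The MS1 mass shift that SILAC relies on follows from the isotope composition of the heavy amino acids. The sketch assumes the commonly used 13C6,15N2-lysine and 13C6,15N4-arginine labels and a hypothetical tryptic N-glycopeptide backbone; glycan residues are not labelled by these amino acids, so only the peptide contributes to the shift:

```python
C13_DELTA = 1.0033548  # 13C minus 12C (Da)
N15_DELTA = 0.9970349  # 15N minus 14N (Da)

# Commonly used heavy labels: 13C6,15N2-lysine and 13C6,15N4-arginine
HEAVY_SHIFT = {
    "K": 6 * C13_DELTA + 2 * N15_DELTA,  # ~ +8.014 Da
    "R": 6 * C13_DELTA + 4 * N15_DELTA,  # ~ +10.008 Da
}


def silac_shift(sequence):
    """Heavy-minus-light mass difference for a peptide backbone."""
    return sum(HEAVY_SHIFT.get(aa, 0.0) for aa in sequence)


def heavy_to_light(light_intensity, heavy_intensity):
    """Relative abundance of the heavy-labelled condition versus the light one."""
    return heavy_intensity / light_intensity


sequence = "LGNWSAMPSCK"  # hypothetical tryptic N-glycopeptide backbone
shift = silac_shift(sequence)
print(f"+{shift:.4f} Da; +{shift / 3:.4f} m/z at 3+; H/L = {heavy_to_light(2.0e6, 3.0e6):.2f}")
```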

A further strategy to increase multiplexing is the use of isobaric labels that contain different distributions of stable isotopes298,299,300, such as isobaric tags for relative and absolute quantification (iTRAQ)301 and tandem mass tags (TMT)298. Upon fragmentation, reporter ions of different masses are generated, and their intensities are used for quantification at the MS/MS or MS/MS/MS (MS3) level302,303, with multiplexed analyses of up to 18 samples possible304. An additional advantage of isobaric labelling for glycoproteomics is a notable increase in the observed charge states of glycopeptides, which enhances electron-driven fragmentation305. Despite these advantages, the high price of isobaric labels and the fact that standard commercial kits can label only submilligram sample quantities306 are potential drawbacks. TMT-based labelling has been applied to studying O-GalNAc84,307, O-GlcNAc308,309 and N-glycoproteomes310,311.
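Reporter-ion extraction can be sketched as below. The reporter m/z values are those commonly tabulated for a 6-plex TMT kit and should be checked against the vendor documentation; the 0.003 m/z tolerance is an assumption, and no isotopic impurity correction is applied:

```python
# Reporter ion m/z values for a 6-plex TMT kit as commonly tabulated;
# verify against the vendor documentation before relying on them.
TMT6_REPORTERS = {
    "126": 126.127726, "127": 127.124761, "128": 128.134436,
    "129": 129.131471, "130": 130.141145, "131": 131.138180,
}


def extract_reporters(peaks, reporters=TMT6_REPORTERS, tol_mz=0.003):
    """Summed intensity of MS/MS peaks within tol_mz of each reporter m/z."""
    return {
        channel: sum(i for mz, i in peaks if abs(mz - ref) <= tol_mz)
        for channel, ref in reporters.items()
    }


def channel_fractions(reporter_intensities):
    """Each channel's share of the total reporter signal (no impurity correction)."""
    total = sum(reporter_intensities.values())
    return {k: v / total for k, v in reporter_intensities.items()} if total else {}


spectrum = [(126.1278, 1000), (127.1247, 1000), (128.1345, 2000),
            (129.1314, 2000), (130.1411, 2000), (131.1382, 2000),
            (204.0867, 50000)]  # the oxonium ion is ignored by the extraction
print(channel_fractions(extract_reporters(spectrum)))
```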

For sensitive applications in the clinical setting, absolute quantification of select glycopeptides is possible using internal standards such as stable isotope-labelled counterparts, which allow normalization across samples and direct comparison of analyte concentrations between different patients312,313. This approach enables reliable quantification of glycopeptides of interest in large patient cohorts, although it is limited by the time-consuming and high-cost synthesis of relevant glycopeptide standards.
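The arithmetic behind internal-standard quantification is a peak-area ratio scaled by the known spike amount. The sketch assumes equal response for the light analyte and its labelled counterpart and signals within the linear range; the peak areas, spike amount and sample volume are hypothetical:

```python
def absolute_amount(light_area, heavy_area, spiked_heavy_fmol):
    """Endogenous glycopeptide amount from the light/heavy peak-area ratio and a
    known spike of the stable isotope-labelled standard."""
    if heavy_area <= 0:
        raise ValueError("labelled standard not detected")
    return light_area / heavy_area * spiked_heavy_fmol


def concentration(amount_fmol, sample_volume_ul):
    """Concentration in fmol per microlitre of the original sample."""
    return amount_fmol / sample_volume_ul


amount = absolute_amount(light_area=3.0e6, heavy_area=1.5e6, spiked_heavy_fmol=50.0)
print(amount, concentration(amount, sample_volume_ul=10.0))  # 100.0 10.0
```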

Results

Comprehensive characterization of glycopeptides from MS data involves determining the peptide sequence, the site (or sites) of glycosylation and the identity of the attached glycans. A growing number of software solutions enable the identification of glycosylation events (Table 3), and computational approaches associated with glycopeptide identification are rapidly developing. Below, we highlight the features of data generated by different fragmentation methods and discuss the existing tools and emerging bioinformatic methods. We also highlight the conceptual frameworks that underpin glycopeptide assignment, glycosylation site localization and glycan definition.

Table 3 Software tools for glycopeptide annotation of MS data

Glycopeptide sequence determination

Decades of developments in proteomics have provided various robust methods for identifying peptide sequences from MS data by comparing in silico spectra derived from a reference protein sequence database with the observed spectra314,315. Such methods include Mascot316, SEQUEST317, Andromeda318 and MS Amanda319. Accounting for attached glycans of varying complexity poses great challenges for existing proteomic workflows; below, we discuss two major approaches that address these challenges, distinguished by whether peptide fragment ions are searched with or without attached glycans.

Searching peptide ions with the attached glycan: ‘variable modification’ searches

When treating attached glycans as variable modifications on peptides (Fig. 3a), possible glycan masses are specified on allowed sites, and theoretical glycopeptides containing these glycan masses are generated from the peptide sequences provided in a proteome database. The precursor mass for a given MS/MS spectrum is used to select candidate glycopeptides, which are then scored by comparing the observed MS/MS spectrum with the theoretical fragment ions of the glycopeptide candidates. Sequences supported by sufficient peptide fragment ion evidence result in a peptide spectral match (PSM). Glycopeptides present two major challenges for this approach: first, the heterogeneity of possible glycan structures can result in a huge number of candidate glycopeptides to consider when multiple possible glycosylation sites are available in a peptide sequence. Second, glycan fragments are often lost from glycopeptide ions in collisional or hybrid activation methods; as glycan modifications are specified as an integral part of the peptide in this approach, they are expected to be present in both MS1 and MS/MS spectra, and the loss of a glycan or parts thereof in the MS/MS spectrum will prevent matching theoretical ions containing the glycan (Fig. 3). For this reason, traditional proteomics tools have severely limited sensitivity for the sequencing of glycopeptides using collision-activation-based fragmentation.
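The precursor-matching step of a variable-modification search can be sketched as below, together with the theoretical b ions it implies. The glycan list, peptide names and masses are hypothetical, each peptide is assumed to carry one glycan, and real search engines also generate y ions, handle multiple sites and score the matches:

```python
PROTON = 1.007276
HEXNAC, HEX, NEUAC = 203.079373, 162.052824, 291.095417

# Hypothetical glycan list supplied to the search as variable modification masses
GLYCANS = {
    "HexNAc1": HEXNAC,
    "HexNAc2Hex5": 2 * HEXNAC + 5 * HEX,
    "HexNAc4Hex5NeuAc2": 4 * HEXNAC + 5 * HEX + 2 * NEUAC,
}


def candidates_for_precursor(peptides, precursor_mz, charge, tol_ppm=10.0):
    """peptides: sequence -> neutral unmodified mass (one glycosite assumed).
    Returns (sequence, glycan, ppm error) for every peptide + glycan
    combination matching the observed precursor mass."""
    observed = precursor_mz * charge - charge * PROTON
    hits = []
    for seq, pep_mass in peptides.items():
        for glycan, g_mass in GLYCANS.items():
            theoretical = pep_mass + g_mass
            ppm = (observed - theoretical) / theoretical * 1e6
            if abs(ppm) <= tol_ppm:
                hits.append((seq, glycan, ppm))
    return hits


def b_ions_with_glycan(residue_masses, site, glycan_mass):
    """Singly charged b ions assuming the intact glycan stays on residue `site`
    (0-based); glycan loss during collisional activation breaks this assumption."""
    ions, running = [], 0.0
    for i, m in enumerate(residue_masses[:-1]):
        running += m + (glycan_mass if i == site else 0.0)
        ions.append(running + PROTON)
    return ions


peptides = {"PEPTIDEA": 1000.0, "PEPTIDEB": 1500.0}  # hypothetical masses
mz = (1000.0 + GLYCANS["HexNAc2Hex5"] + 2 * PROTON) / 2
print(candidates_for_precursor(peptides, mz, charge=2))
```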

Fig. 3: Glycopeptide sequence identification methods.

a | Glycans can be searched as a variable modification of peptides, similar to how other post-translational modifications (PTMs) are identified in common proteomics searches. The in silico prediction of the search tool assumes that the fragment ions observed in the tandem mass spectrometry (MS/MS) events will preserve the glycan at the site of attachment in the peptide. b | For glycopeptides fragmented by collisional activation, offset-style searches can look for peptide ions that have lost the glycan directly within MS/MS scans. c | The glycan-first method of separating the precursor mass into peptide and glycan components uses a series of Y-type ions resulting from a known core structure to determine the glycan mass. Subtracting the glycan mass from the precursor mass yields the peptide mass, which is then used to determine candidate peptide sequences that are compared with the peptide fragment ions observed. d | The alternative peptide-first method uses an offset-style search to identify the peptide sequence from peptide fragment ions that have lost the glycans. The resulting peptide mass is subtracted from the precursor mass to yield the glycan mass, which can be matched to a specific composition or structure using the observed Y-type ions. m/z, mass to charge ratio.

Glycoproteomics-focused sequencing approaches can address the above challenges. One approach is to adapt an existing search engine to filter spectra for the presence of oxonium ions and add glycan masses to observed peptide ions112,229,320,321. A variation of this method179,322 first groups glycopeptide spectra using clustering methods before searching, allowing glycopeptide annotations to be transferred from one identified spectrum to the entire cluster. Other tools, including Byonic323,324,325,326,327, perform their own variable modification-style search with the inclusion of peptide fragment ions with various glycan additions or losses, using various scoring methods to evaluate glycopeptides (note that although this method is extremely sensitive, concerns have been raised about the accuracy of this approach328). Alternatively, tools such as Protein Prospector329 use a multi-step search, whereby an initial open search determines common glycan masses to be included in a second, more specific search330,331. Overall, variable modification searches are straightforward to implement for the localization of glycans — particularly those on glycopeptides fragmented by electron-based activation methods — although the inclusion of additional fragment types can reduce search speed, and some methods have reduced sensitivity in collision-activation data owing to glycan losses.

Searching peptide ions missing fragmented glycans: ‘offset’ searches

In offset searches, peptide sequence ions are searched directly without glycans (Fig. 3b). This offers greatly improved sensitivity over variable modification approaches for glycopeptides fragmented by collisional activation, as peptide fragments that have lost glycans (Fig. 3b) can be matched and contribute to the peptide score. The most common implementation of this method is a ‘glycan-first’ search, in which a series of Y-type ions corresponding to a common glycan core structure is used to determine the mass of the glycan and, by extension, the glycan-free peptide mass, which is then used to search for peptide fragment ions without the glycan (Fig. 3c). This approach has proved popular60,332,333,334 owing to its computational efficiency and ability to infer glycan composition information from the Y-type ions, particularly for N-glycopeptides.
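A simplified glycan-first step might look like the sketch below: every observed ion is tried as each N-glycan core Y ion, and the bare peptide mass that explains the most core offsets is kept. The offsets (Y0, Y1, Y2 and fucosylated Y1), the 0.02 Da tolerance and the spectrum are illustrative assumptions; published tools also weigh intensities and charge states:

```python
HEXNAC, FUC = 203.079373, 146.057909

# Offsets of common N-glycan core Y ions above the bare peptide (Y0), in Da
CORE_Y_OFFSETS = {
    "Y0": 0.0,
    "Y1 (HexNAc)": HEXNAC,
    "Y2 (HexNAc2)": 2 * HEXNAC,
    "Y1F (HexNAc1Fuc1)": HEXNAC + FUC,
}


def infer_peptide_mass(y_masses, tol=0.02):
    """Return (peptide mass, number of core Y ions supporting it), or None.

    y_masses: neutral (charge-deconvolved) masses of candidate Y ions.
    Every observed ion is tried as every core Y ion; the peptide mass that
    explains the most core offsets wins.
    """
    best = None
    for observed in y_masses:
        for offset in CORE_Y_OFFSETS.values():
            y0 = observed - offset
            support = sum(
                any(abs(m - (y0 + off)) <= tol for m in y_masses)
                for off in CORE_Y_OFFSETS.values()
            )
            if best is None or support > best[1]:
                best = (y0, support)
    return best


# Hypothetical spectrum: Y1, Y2 and Y1F of a 1,500 Da peptide, plus one noise peak
y_masses = [1500.0 + HEXNAC, 1500.0 + 2 * HEXNAC, 1500.0 + HEXNAC + FUC, 987.6]
peptide_mass, support = infer_peptide_mass(y_masses)
precursor_mass = 3392.3  # hypothetical deconvolved precursor mass
print(f"peptide {peptide_mass:.3f} Da ({support} core ions); "
      f"glycan {precursor_mass - peptide_mass:.3f} Da")
```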

In an alternative ‘peptide-first’ strategy, peptide fragment ions without glycans can be searched directly in the MS/MS spectra using an open or mass-offset search (Fig. 3d). These searches335,336 use computational advances to allow peptide fragment ions to be matched in MS/MS spectra even if the peptide sequence mass does not match the observed molecular mass. This approach eliminates the need to match a Y-type ion series, providing a sensitivity boost for glycopeptides that carry labile glycans or do not produce prominent Y-type ions.
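Once a mass-offset search has assigned the peptide, the remaining mass offset has to be explained by a glycan composition. A brute-force sketch, assuming four monosaccharide types, arbitrary maximum counts and a 10 ppm tolerance evaluated on the glycan mass; real tools typically match against curated glycan databases and use the Y-type ions to discriminate between candidates:

```python
from itertools import product

RESIDUE = {"HexNAc": 203.079373, "Hex": 162.052824, "Fuc": 146.057909, "NeuAc": 291.095417}
MAX_COUNTS = {"HexNAc": 7, "Hex": 10, "Fuc": 4, "NeuAc": 4}  # assumed search limits


def compositions_for_delta(precursor_mass, peptide_mass, tol_ppm=10.0):
    """Brute-force glycan compositions matching the precursor - peptide mass offset."""
    delta = precursor_mass - peptide_mass
    names = list(MAX_COUNTS)
    matches = []
    for counts in product(*(range(MAX_COUNTS[n] + 1) for n in names)):
        mass = sum(c * RESIDUE[n] for c, n in zip(counts, names))
        if mass > 0 and abs(delta - mass) / mass * 1e6 <= tol_ppm:
            matches.append({n: c for n, c in zip(names, counts) if c})
    return matches


# Hypothetical: peptide identified at 1,500 Da by a mass-offset search,
# precursor carrying a fucosylated, disialylated biantennary composition.
glycan = 4 * RESIDUE["HexNAc"] + 5 * RESIDUE["Hex"] + RESIDUE["Fuc"] + 2 * RESIDUE["NeuAc"]
print(compositions_for_delta(1500.0 + glycan, 1500.0))
```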

Finally, spectral library methods, such as those used for DIA-based analysis, circumvent the need for glycan-first or peptide-first searching by matching observed MS/MS signals to annotated glycopeptide fragmentation spectra286,337,338,339. This technique gives sensitive quantification at the cost of requiring a separate analysis to build the spectral library and limiting identifications to glycopeptides present in the library.
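Library matching is often scored with a normalized dot product (cosine similarity). A minimal sketch, assuming 0.02 m/z bins, square-root intensity scaling and a 0.7 score cut-off, all of which are illustrative choices; the library entries are hypothetical:

```python
import math


def binned(peaks, bin_width=0.02):
    """Sparse vector of square-root intensities keyed by m/z bin index."""
    vec = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        vec[key] = vec.get(key, 0.0) + math.sqrt(intensity)
    return vec


def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_library_match(query_peaks, library, min_score=0.7):
    """Return (entry name, cosine score) of the best match above min_score, or None."""
    q = binned(query_peaks)
    scored = [(name, cosine(q, binned(peaks))) for name, peaks in library.items()]
    return max((s for s in scored if s[1] >= min_score), key=lambda s: s[1], default=None)


library = {  # hypothetical annotated glycopeptide spectra
    "glycopeptide A": [(204.0867, 100), (366.1395, 50), (1000.5, 80)],
    "glycopeptide B": [(300.1, 100), (500.2, 100)],
}
query = [(204.0868, 90), (366.1393, 55), (1000.505, 70)]
print(best_library_match(query, library))
```

Restricting identifications to library entries is visible here: a query with no counterpart in the library simply returns `None`.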

Glycosylation site localization

Methods for locating the site or sites of glycan attachment in a peptide vary depending on the type of glycosylation being considered. For example, most tryptic N-glycopeptides have only a single possible glycosylation site, corresponding to the consensus sequon asparagine-X-serine/threonine (where X can be any amino acid except proline). The predictable nature of N-linked glycosylation allows the glycan location to be inferred, often without the need for additional spectral evidence. In peptides with multiple sequons or combinations of glycosylation types, N-glycans can be localized directly, either by using ExD or hybrid-type activation and searching for intact glycans with variable modification-style methods340,341 or by using collisional activation and searching for peptide fragment ions that retain a glycan remnant252,335.
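Enumerating candidate N-glycosites from a sequence is a pattern match on the sequon. A minimal sketch using a regular-expression lookahead so that overlapping sequons are all reported; a sequon marks a potential site, not an occupied one:

```python
import re

# Asn followed by any residue except Pro, then Ser or Thr. The lookahead makes
# the match zero-width, so overlapping sequons (e.g. in "NNST") are all found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_sequon_sites(sequence):
    """0-based positions of Asn residues in N-X-S/T sequons (X != Pro)."""
    return [m.start() for m in SEQUON.finditer(sequence)]


print(n_sequon_sites("ANNSTNPT"))  # [1, 2]
```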

Experimental localization of O-glycosylation sites on peptides is an important yet challenging task owing to the lack of a universal deglycosylation enzyme effective for all O-glycan core structures. This prevents the application of ‘de-glycoproteomic’ approaches common to N-glycosite localization, in which N-glycans are removed by PNGase F, allowing glycosylation sites to be determined by identifying deamidated residues within an N-glycosylation sequon. Localization of O-glycans is further complicated by the lack of a consensus sequon that would reduce the number of possible glycosylation sites in a peptide, by their facile dissociation from the peptide carrier upon collisional activation and by the high density of occupied O-glycosylation sites on peptides from mucin and mucin-domain glycoproteins. Therefore, O-glycosite localization requires the analysis of intact glycopeptides using electron-based or hybrid-type activation methods, which produce peptide fragment ions that preserve glycan conjugation252. In favourable cases, such as highly charged glycopeptides, variable modification-style searches can provide high-confidence O-glycosite localization from electron-based activation329. Peptides with multiple possible glycosylation sites can have a huge number of potential glycan configurations, and, as a result, most variable modification-style searches are restricted to only the most commonly occurring glycans. To address this combinatorial limitation, open or mass-offset search methods first identify the peptide sequence and total glycan mass, reducing the search space to allow the localization of individual glycans. Protein Prospector performs such a multi-step search for electron-driven fragmentation329.
The O-Pair search introduced in MetaMorpheus336 and a similar method implemented in pGlyco3 (ref.342) use paired collisional and electron-based ion activation scans, performing a mass-offset search of the collisional scan to identify the glycopeptide sequence and total glycan mass, followed by dynamic programming to decompose the total glycan mass into multiple individual glycans and localize each within the peptide (Table 3). This highly promising approach takes advantage of the sensitivity of offset searches and collisional activation to identify glycopeptides and the ability of electron-based activation to localize glycosites.
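The first step of such a mass-offset strategy, decomposing the total glycan mass into individual glycans, can be sketched as below. This is a brute-force enumeration under stated assumptions, not the dynamic programming used by O-Pair or pGlyco3: the glycan list, the 0.01 Da tolerance and the function name `decompose` are illustrative, and the subsequent localization of each glycan from electron-based fragment ions is not shown.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses (Da) of a few common mucin-type O-glycan compositions.
O_GLYCANS = {
    "HexNAc(1)": 203.0794,
    "HexNAc(1)Hex(1)": 365.1322,
    "HexNAc(1)Hex(1)NeuAc(1)": 656.2276,
    "HexNAc(1)Hex(1)NeuAc(2)": 947.3230,
}


def decompose(total_mass, glycans, max_glycans, tol=0.01):
    """Every multiset of up to max_glycans glycans whose summed mass matches total_mass."""
    names = sorted(glycans)  # fixed order so results are reproducible
    hits = []
    for k in range(1, max_glycans + 1):  # max_glycans is bounded by the number of candidate sites
        for combo in combinations_with_replacement(names, k):
            if abs(sum(glycans[name] for name in combo) - total_mass) <= tol:
                hits.append(combo)
    return hits


print(decompose(1021.3598, O_GLYCANS, max_glycans=2))
```

For a total offset of 1,021.3598 Da and at most two glycans, the only decomposition from this list is HexNAc(1)Hex(1) plus HexNAc(1)Hex(1)NeuAc(1); larger glycan lists or more sites quickly yield several alternatives, each of which must then be tested during localization.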

Glycan identification

Paired glycomic and proteomic analyses of PNGase F-treated samples can provide detailed characterization of glycans and deglycosylated glycosites310,343. Glycomics provides useful (and still unmatched) structural insight into the protein-linked glycans in a protein mixture; however, undertaking parallel proteomics and glycomics workflows is time-consuming and reduces overall sensitivity. This has prompted the development of methods that can characterize some glycan structural features directly from intact glycopeptides. The determination of the monosaccharide composition of glycans is complicated by the multiple isomeric and isobaric compositions and structures possible for an observed molecular mass62,343, an analytical challenge exacerbated by the existence of common peptide modifications such as oxidation, deamidation and carbamidomethylation that mimic the mass difference between different glycan compositions61. Compositions can in most cases be discriminated using glycan fragment ions, similarly to how a peptide is sequenced using peptide fragment ions. Collisional fragmentation energies that generate glycan fragment ions are often lower than those optimal for peptide backbone fragmentation, creating a trade-off between optimizing glycan and peptide fragmentation in the LC–MS/MS experiment. SCE-beamCID and paired low-energy and high-energy beamCID experiments have shown great promise in this area55,344.
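The mass coincidences mentioned above can be checked with simple arithmetic. The sketch below compares the mass difference between two glycan compositions with a few common peptide modification masses; the monosaccharide residue and modification masses are standard monoisotopic values, while the tolerance and the function names are illustrative assumptions.

```python
# Monoisotopic monosaccharide residue masses (Da).
MONO = {"Hex": 162.0528, "HexNAc": 203.0794, "Fuc": 146.0579,
        "NeuAc": 291.0954, "NeuGc": 307.0903}
# Monoisotopic mass shifts (Da) of common peptide modifications.
MODS = {"Oxidation": 15.9949, "Deamidation": 0.9840, "Carbamidomethyl": 57.0215}


def composition_mass(comp):
    """Summed residue mass of a composition such as {"Fuc": 2}."""
    return sum(MONO[name] * count for name, count in comp.items())


def mimics(comp_a, comp_b, tol):
    """Modifications whose mass matches the difference between two compositions."""
    delta = composition_mass(comp_a) - composition_mass(comp_b)
    return [mod for mod, shift in MODS.items() if abs(delta - shift) <= tol]


print(mimics({"NeuGc": 1}, {"NeuAc": 1}, tol=0.01))  # ['Oxidation']
print(mimics({"Fuc": 2}, {"NeuAc": 1}, tol=0.05))  # ['Deamidation']
```

NeuGc differs from NeuAc by exactly one oxygen atom, so NeuAc plus a methionine oxidation cannot be separated from NeuGc by mass at all. Two fucoses versus one NeuAc differ by 1.0204 Da, about 0.036 Da away from deamidation; on a 3,000 Da glycopeptide this is roughly 12 ppm, so the two can fall inside a wide precursor tolerance and fragment ions are needed to tell them apart.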

Many published studies of intact glycopeptides report only the mass of a glycan, or a putative composition or structure, assuming that there is only a single composition or structure for the detected glycan mass. This approach, used by many tools229,327,332,336,338,345 (software tools are listed in Table 3), greatly simplifies data handling; however, it does not consider isomeric glycans, which may have biological implications. As the existence of multiple isomeric glycans is often not known in advance, assuming a single form can result in incorrect assignments.

Several methods to assign glycan compositions and/or structures directly from glycopeptide fragmentation data have been developed recently. Glycan-first offset searches are a natural fit for these approaches given their reliance on the identification of Y-type ions, with several programs implementing glycan assignments with various scoring methods60,333,334,337,346,347. The peptide-first glyco search in MSFragger can perform a combined Y-type ion and oxonium-ion composition assignment method as a post-processing step348. Compared with the glycan-first approach, in which all glycans are scored against each spectrum, the peptide-first approach greatly simplifies glycan assignment as the glycan mass is known before assignment, making it easier to distinguish between glycans with similar or identical masses. Finally, variable modification searches have been demonstrated using Y-type ions to distinguish glycan compositions using a database of possible glycans or de novo from a range of possible glycans320,323,324,325,326,331.
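A minimal sketch of Y-type ion-based composition scoring is shown below. It assumes the peptide mass is already known (as in a peptide-first search), treats every sub-composition of a candidate glycan as a possible Y ion (a simplification, because real glycan topologies allow fewer), considers only singly charged ions and scores a candidate by the number of observed peaks it explains. The names, tolerance and scoring rule are hypothetical and do not reproduce any particular tool.

```python
from itertools import product

PROTON = 1.007276  # Da
# Monoisotopic monosaccharide residue masses (Da).
MONO = {"HexNAc": 203.0794, "Hex": 162.0528, "Fuc": 146.0579, "NeuAc": 291.0954}


def y_ion_mzs(peptide_mass, composition):
    """Singly charged m/z of peptide + every sub-composition of the glycan (Y0 included)."""
    names = list(composition)
    mzs = set()
    for counts in product(*(range(composition[n] + 1) for n in names)):
        glycan = sum(MONO[n] * c for n, c in zip(names, counts))
        mzs.add(round(peptide_mass + glycan + PROTON, 4))
    return sorted(mzs)


def y_score(peaks, peptide_mass, composition, tol=0.02):
    """Number of observed peaks explained by the candidate's Y ions (hypothetical score)."""
    expected = y_ion_mzs(peptide_mass, composition)
    return sum(any(abs(p - e) <= tol for e in expected) for p in peaks)


# Usage: rank two candidate compositions for a peptide of neutral mass 1,000 Da.
peaks = [1001.0073, 1204.0867, 1407.1661]  # Y0, Y0 + HexNAc, Y0 + HexNAc(2)
candidates = [{"HexNAc": 2, "Hex": 3}, {"Hex": 3}]
best = max(candidates, key=lambda comp: y_score(peaks, 1000.0, comp))
print(best)
```

Because the peptide mass is fixed first, only compositions matching the observed glycan mass need to be scored, which is the simplification the peptide-first approach provides.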

Stereochemical and positional glycan information can be extracted from glycopeptide MS/MS data38,349 (Table 4). For example, ratios of specific oxonium ions can be used to discriminate between glycopeptides bearing isomeric O-GlcNAc and O-GalNAc moieties350 and are also useful for crude classification of N-glycopeptides versus O-glycopeptides252,351. Ratios of oxonium and B-type ions can also distinguish between α2,3-sialyl and α2,6-sialyl linkages352, between core (α1,6-) and antenna (α1,2/3/4-) fucosylation248,255,344, and between some classes of mucin-type O-glycosylation266,272,353,354. Y-type ions generated at low beamCID energies can also be used to determine core fucosylation255,355,356, to identify bisecting GlcNAc-containing glycopeptides248,357 and to characterize various antennary structures272,337,353,358,359. Furthermore, oxonium ions specific to chemical groups introduced by glycan labelling can be useful for structural characterization171,172,360,361,362,363. Although many of these diagnostic ions arise from beamCID fragmentation, they can also be observed in hybrid ExD spectra such as those collected by EThcD. Despite the promise of diagnostic ions for structural characterization, a major challenge is the co-elution of glycoforms (see below). Without chromatographic, electrophoretic or mobility separation of related glycoforms, diagnostic ions characteristic of multiple structures may be present in a single MS/MS spectrum.
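Oxonium-ion ratios of this kind are straightforward to compute from a peak list. The sketch below sums the intensities of HexNAc-derived oxonium fragments at m/z 126.055, 138.055, 144.066 and 168.066 and reports the ratio (138 + 168)/(126 + 144). We assume this particular ratio for illustration; the ion set and the cut-off used to call O-GlcNAc versus O-GalNAc should be taken from the cited work or calibrated with standards, and the code deliberately makes no call itself.

```python
# m/z of singly charged HexNAc-derived oxonium fragment ions.
OXONIUM = {"126": 126.0550, "138": 138.0550, "144": 144.0655, "168": 168.0655}


def intensity_at(peaks, mz, tol=0.01):
    """Summed intensity of (m/z, intensity) peaks within tol of the target m/z."""
    return sum(inten for m, inten in peaks if abs(m - mz) <= tol)


def hexnac_ratio(peaks, tol=0.01):
    """(I138 + I168) / (I126 + I144); None if neither 126 nor 144 is detected."""
    num = intensity_at(peaks, OXONIUM["138"], tol) + intensity_at(peaks, OXONIUM["168"], tol)
    den = intensity_at(peaks, OXONIUM["126"], tol) + intensity_at(peaks, OXONIUM["144"], tol)
    return num / den if den else None


peaks = [(126.0550, 30.0), (138.0550, 100.0), (144.0655, 45.0),
         (168.0655, 50.0), (204.0867, 500.0)]
ratio = hexnac_ratio(peaks)
print(ratio)  # 2.0
```

In a chimeric spectrum containing both isomers, such a ratio reports a mixture rather than either glycoform, which is the co-elution problem noted above.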

Table 4 Diagnostic glycan fragment ions

Statistical control of assignments

Controlling the FDR for peptide sequence assignment has received considerable attention. FDR methods for peptide sequence assignment involve generating decoy peptides by reversing or scrambling target peptide amino acid sequences and using the ratio of decoy-to-target peptide matches to estimate the score threshold required to achieve a given FDR364. Within glycoproteomic studies, FDR methods based solely on peptide sequence determination have been suggested to provide only partial FDR control, and multiple groups have reported higher-than-anticipated FDRs in glycopeptide data sets62,273. Attempts to overcome inadequate FDR control include additional score cut-offs to limit potentially erroneous assignments57,62,252 and manual inspection of glycopeptides76,77. Further, computational approaches have been proposed to control glycopeptide FDRs at both the glycan and peptide levels54,348.
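The decoy-to-target logic can be sketched as follows for a list of scored spectrum matches. This is a generic target-decoy sketch with hypothetical names: it estimates the FDR at each score threshold as decoys divided by targets and converts these estimates into q-values, the lowest FDR at which each match would be accepted. Score ties are ignored for simplicity.

```python
def q_values(matches):
    """matches: (score, is_decoy) pairs. Returns (score, is_decoy, q_value), best score first."""
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    fdrs = []
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))  # estimated FDR if the threshold is this score
    # q-value: the minimum estimated FDR over this and every lower score threshold.
    for i in range(len(fdrs) - 2, -1, -1):
        fdrs[i] = min(fdrs[i], fdrs[i + 1])
    return [(s, d, q) for (s, d), q in zip(ranked, fdrs)]


matches = [(10, False), (9, False), (8, True), (7, False), (6, False), (5, True)]
accepted = [s for s, d, q in q_values(matches) if not d and q <= 0.01]
print(accepted)  # [10, 9]
```

The same machinery applies to glycopeptides; the difficulty discussed below is constructing decoys that behave like false glycan assignments rather than false peptide assignments.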

In contrast to the statistical controls for the peptide sequences assigned to glycopeptides, which are generally considered robust, the determination of glycan composition or structure is acknowledged to be a key limitation of intact glycopeptide analysis365. The software tools for the determination of glycan composition described above use a fragment-ion-based method for assigning glycans, and the accuracy of such assignments has largely been evaluated manually or with empirically determined score filters62. Manual expert-based curation of output data is time-consuming and often prohibitive for large-scale analysis of glycopeptides, prompting the development of glycan-specific FDR methods to enable automated control of false assignments. The linear sequence of amino acid residues can be reversed or shuffled to make a decoy peptide with the same amino acid composition as the target; however, non-linear glycans comprising multiple different building blocks of identical masses require a different method for decoy generation. GlycoPepEvaluator366 and IQ-GPA323 generate decoys by substituting monosaccharides and reversing or altering the glycopeptide sequence to obtain a decoy glycopeptide that is an isobar of the target glycopeptide and that contains a nonsensical glycan (Table 3). An alternative ‘spectrum-based’ FDR method implemented in GlycoPAT324 and pGlyco346 generates decoy glycans by applying random mass shifts to the fragment ions of a target glycan, preserving the fragmentation characteristics of the target glycan and assessing the likelihood of random matches to ions in the mass spectrum. This approach has been adopted by GPSeeker60, GlycReSoft177,325, MSFragger glyco335 and StrucGP347 (Table 3). Care must be exercised when using these techniques, as the reported FDRs may not hold for unexpected glycans absent from the provided database, or when oxonium ions arise from co-fragmentation of co-eluting isobaric glycopeptides.
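The ‘spectrum-based’ idea can be illustrated with a sketch in which each fragment ion of a target glycan is shifted by a random mass, so that the decoy keeps the number of fragment ions of the target but should match the spectrum only by chance. The 1–30 Da shift range, the seeding and the helper names are assumptions for illustration, not the parameters of GlycoPAT, pGlyco or any other tool.

```python
import random


def decoy_fragments(target_fragments, low=1.0, high=30.0, seed=None):
    """Shift every target fragment m/z by an independent random amount in [low, high] Da."""
    rng = random.Random(seed)
    return [mz + rng.uniform(low, high) for mz in target_fragments]


def count_matches(peaks, fragments, tol=0.02):
    """Number of predicted fragments found among the observed peaks."""
    return sum(any(abs(p - f) <= tol for p in peaks) for f in fragments)


# Hypothetical Y-ion ladder of a target glycan (consecutive ions differ by a HexNAc or Hex).
target = [1001.0073, 1204.0867, 1407.1661, 1569.2189]
spectrum = list(target)  # an idealized spectrum containing exactly the target ions
decoy = decoy_fragments(target, seed=7)
target_score = count_matches(spectrum, target)  # 4
decoy_score = count_matches(spectrum, decoy)  # 0
```

Scoring many such decoys against the same spectrum gives an empirical distribution of random-match scores against which the target score can be compared.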

Once glycopeptides are identified, statistical tests are applied to detect quantitative changes in their abundance; Student’s t tests are commonly used for comparisons between two conditions52, and multiple-group approaches such as ANOVA are widely implemented when more than two groups are compared310,311. For these comparisons, at least a twofold change in abundance (a log2 fold change of 1) is typically required for a glycopeptide to be considered changed, and the P value threshold should be tailored to the experiment using multiple hypothesis correction to increase confidence in the observed changes52,310,311. Threshold-based approaches are typically favoured for studies investigating the substrates of specific glycosyltransferases, in which glycopeptides showing 10-fold79,81 or 100-fold158 changes in the absence of the glycosyltransferase in question are considered potential substrates. Changes observed at the glycopeptide level can be driven both by changes in glycosylation occupancy and by changes in total protein level, so normalization against proteomics data can be advantageous79,81,158,293.
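Multiple hypothesis correction and a fold-change cut-off can be combined as sketched below. The P values are assumed to come from a test such as Welch's t test (for example, `scipy.stats.ttest_ind` with `equal_var=False`); the Benjamini–Hochberg procedure is one common correction, and the 0.05 and |log2 fold change| ≥ 1 thresholds are illustrative defaults rather than recommendations. Function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest P value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)  # keeps adjusted values monotonic
        adjusted[i] = running_min
    return adjusted


def significant(log2_fc, pvalues, alpha=0.05, min_abs_log2_fc=1.0):
    """Indices passing both the adjusted-P and the fold-change thresholds."""
    q = benjamini_hochberg(pvalues)
    return [i for i, (fc, qv) in enumerate(zip(log2_fc, q))
            if qv <= alpha and abs(fc) >= min_abs_log2_fc]


pvals = [0.01, 0.04, 0.03, 0.20]
log2_fc = [1.5, -2.0, 0.3, 3.0]
print(significant(log2_fc, pvals))  # [0]
```

Only the first glycopeptide passes: the second has a large fold change but its adjusted P value (about 0.053) exceeds 0.05, illustrating how correction changes the result of a naive per-test threshold.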

Applications

Glycoproteomics has a range of applications in the clinical sciences. The study of complex biological samples from clinically relevant specimens such as tissue biopsy samples, blood, urine and cerebrospinal fluid (CSF) has provided an opportunity to understand the fundamental roles of glycosylation in pathophysiology. Furthermore, glycoproteomics has aided the search for diagnostic and prognostic biomarkers that can stratify patients for specific interventions and follow disease progression. Most of these biomarker studies have aimed to identify and quantify glycopeptides and glycoproteins or determine the occupancy of specific glycosites to identify differential changes in protein glycosylation patterns in health and disease (Fig. 4). Additionally, glycoproteomic data are increasingly being combined with data from other omic methods such as transcriptomics, proteomics, glycomics, phosphoproteomics and metabolomics52,83,367 to better understand the connection between site-specific glycosylation and the various biological processes that take place in complex systems. So far, most glycoproteomic studies that incorporate multi-omics have focused on N-glycoproteomics, although there are also a few examples of O-glycoproteomics, as discussed below.

Fig. 4: A hypothetical biomarker discovery workflow.

a | Bottom-up glycoproteomic studies using clinical samples from healthy controls and patients (in this hypothetical case, cerebrospinal fluid (CSF) from healthy controls and patients with Alzheimer disease (AD)) can identify prognostic and diagnostic biomarkers through finding glycopeptides that are differentially regulated between the two populations. Top: examples of glycopeptides not differentially regulated by disease conditions. Middle: differentially regulated glycopeptides. Bottom: loss of glycosylation in disease conditions. Dashed boxes indicate selected biomarker candidates. Volcano plots such as that displayed can show significant differences in abundance of glycopeptides from control and patient samples. Volcano plot generated using the VolcaNoseR online resource. b | Following the discovery of candidate biomarkers, larger patient cohorts can be used to validate selected glycopeptides by targeted parallel reaction monitoring (PRM) and liquid chromatography–mass spectrometry (LC–MS) to monitor specific glycopeptides of interest across control and patient-derived samples. These studies, which can focus on the identification of specific glycoforms (biomarker 1) or the absence or presence of glycosylation events (biomarker 2), aim to confirm that the markers of interest enable the separation of groups, such as a control group (CTRL) from an AD cohort at a population level. c | Standardized assays can be developed for validated candidates to aid in diagnosis. For example, specific changes in the predominant glycosylation of an isolated protein or peptide can be detected using a lectin-based enzyme-linked immunosorbent assay (ELISA) (left panel). Alternatively, loss of glycosylation can be pursued through targeted PRM analysis, in which spiking known amounts of a stable isotope-labelled peptide counterpart allows direct comparison of analyte amounts in different clinical specimens (right panel). 
Combining such biomarkers can lead to improved diagnostic and prognostic characteristics. AUC, area under the curve; m/z, mass to charge ratio; ROC, receiver operating characteristic; RT, retention time.
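The area under the ROC curve used to summarize such biomarkers can be computed without drawing the curve, as the probability that a randomly chosen patient sample scores higher than a randomly chosen control (the Mann–Whitney formulation, with ties counted as one half). The sketch below assumes one numeric score per sample, such as a glycopeptide abundance or a combined panel score; the function name is hypothetical.

```python
def roc_auc(control_scores, patient_scores):
    """AUC = P(patient score > control score), counting ties as 0.5."""
    wins = 0.0
    for p in patient_scores:
        for c in control_scores:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_scores) * len(control_scores))


print(roc_auc([1, 2, 3], [4, 5, 6]))  # 1.0, perfect separation
print(roc_auc([1, 3], [2, 4]))  # 0.75
```

The pairwise loop is quadratic in cohort size, which is fine for validation cohorts of a few hundred samples; a rank-based implementation would scale better.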

Mapping N-glycosylation for diagnostics

Many studies have mapped N-glycosites in patient-derived biofluids or cellular material to identify informative biomarkers for diagnostic and prognostic applications368,369. N-glycoproteomics has been extensively used to analyse various neural sources, including stem cell-derived neural cells, mouse brains and patient-derived CSF, in an attempt to identify biomarkers for neural diseases88,370,371. Recently, comparative in-depth N-glycoproteomic analysis of CSF samples from healthy controls and patients with Alzheimer disease demonstrated differential N-glycosylation patterns between cohorts368. Similarly, comparisons of postmortem human Alzheimer disease and control brain tissue have shown quantitative changes in N-glycosite occupancy in clinically relevant proteins372.

N-glycoproteomics has also been explored as a tool for the early detection of cancer. Cancer models studied so far include ovarian cancer cell lines with differential resistance to the chemotherapeutic agent doxorubicin373, as well as patient serum samples369, and native and xenografted tissues from ovarian serous carcinoma369,374,375. These studies have demonstrated that the detection of select glycopeptide signatures may be useful in diagnostic applications, for the stratification of patients or to follow disease progression. Studies in other cancers have also shown differential abundance of select N-glycopeptides between tissues, serum and bodily fluids from healthy donors and patients with cancer, further suggesting that alterations in specific N-linked glycosylation events may correlate with cancer progression13,376 and that the integration of N-glycoproteomic profiles can improve diagnostic sensitivity compared with proteomics alone377,378,379,380,381.

Mapping O-glycosylation

The application of O-glycoproteomics to a range of biological questions has resulted in a massive expansion of the mammalian O-glycoproteome53,76,77,124, leading to unexpected discoveries such as O-glycosylated neuropeptides and peptide hormones67,382, O-glycans in LDLR-related protein linker sequences80 and extensive O-glycosylation of viral envelope proteins66,78.

The discovery of O-glycoproteases and their inactive mutants has led to the development of O-glycoprotein and mucin-domain glycoprotein enrichment methods. A notable example of using catalytically active O-glycoproteases for O-glycosite enrichment is the site-specific extraction of O-linked glycopeptides (ExoO) approach, which has been used to identify O-glycosites on more than 1,000 proteins across human kidney tissue, T cells and serum samples124. Inactive O-glycoproteases have also been shown to be robust affinity tools for enabling the differentiation of cancer-associated changes in mucin-domain-containing glycoproteins96,125. A recent preprint publication showed that inactive StcE-based enrichment was capable of isolating hundreds of O-glycopeptides from patient-derived ascites fluid, including many from MUC16 — the classic, gold-standard biomarker for ovarian cancer383.

Genetic knockouts of specific GalNAc-Ts have identified isoform-specific substrates in various cell lines and tissues68,79,81,82,83 that could give information on the pathophysiological mechanisms that drive congenital disorders of glycosylation384. Further genetic engineering-driven glycoproteomic strategies — first using zinc finger nucleases76,385 and more recently CRISPR-based approaches83,158 — have led to the discovery of novel glycosylation pathways such as an O-mannosylation system responsible for glycosylation of cadherins158. These discovery-driven applications of glycoproteomics have expanded our understanding of carbohydrate-binding proteins386,387,388, providing insights into how glycan recognition may have an important role in cancer development.

Glycoproteomics in multi-omic studies

Multi-omic approaches that combine transcriptomic and glycoproteomic analyses can provide context for the global consequences of N-glycoproteomic or O-glycoproteomic changes in cell systems, disease models and clinical specimens79,115,116,389,390. For example, in a clinical setting, combining N-glycoproteomic-based classification of tumours with transcriptomic changes led to biomarker discovery and prospective therapeutic targets based on the pathways identified391. Further, public genomic, transcriptomic or proteomic repositories of patient cohort data can be excellent sources of data for correlation with glycoproteomic data381, an approach that has been used to help understand global regulatory networks in cell differentiation programmes392.

Another successful multi-omic approach is to combine glycoproteomic data with data from phosphoproteomic analyses393,394,395,396,397. The integration of phosphoproteomics, proteomics, transcriptomics and glycoproteomics can provide comprehensive insights into disease mechanisms or tissue development, as recently shown for both N-linked and O-linked glycans83,398. In such multi-omic studies, transcript expression can be correlated with protein expression, and cross-referencing PTMs with protein abundance and signalling networks narrows the selection of relevant targets for downstream study83.
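Correlating transcript levels with protein or glycopeptide abundance across matched samples is one simple way to perform such cross-referencing. The sketch below computes a Pearson correlation per gene and flags genes whose glycopeptide abundance does not track the transcript. The 0.3 cut-off, the data layout (gene mapped to abundances listed in the same sample order) and the idea of using discordance as a filter are illustrative assumptions, not a method from the cited studies.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def discordant_genes(transcripts, glyco, max_r=0.3):
    """Genes present in both data sets whose glycopeptide level does not track the transcript."""
    shared = transcripts.keys() & glyco.keys()
    return sorted(g for g in shared if pearson(transcripts[g], glyco[g]) < max_r)


transcripts = {"A": [1.0, 2.0, 3.0], "B": [1.0, 2.0, 3.0]}
glyco = {"A": [2.0, 4.0, 6.0], "B": [3.0, 2.0, 1.0]}
print(discordant_genes(transcripts, glyco))  # ['B']
```

A gene whose glycopeptide signal diverges from its transcript is a candidate for regulation at the level of glycosylation rather than expression, although such a filter only prioritizes targets and does not establish mechanism.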

The modelling of glycans at specific sites can be useful for understanding the functional impacts of changes in glycosylation. Multiple platforms provide tools for predicting 3D structures of carbohydrates attached to glycoproteins399, and it has been argued that new tools such as AlphaFold2 should be modifiable to incorporate PTMs such as glycosylation, which will enable far more realistic structural predictions400. Integrative bioinformatics tools such as GlycoDomainViewer401, which allow glycosylation sites to be assessed within the context of the protein sequence, domain architecture and other known PTM events, are also beginning to emerge.

Although MS-based glycoproteomic applications are becoming more mainstream, several challenges remain. Comprehensive characterization of glycosite microheterogeneity and reliable quantification of glycopeptides harbouring different glycans is still challenging in complex clinical samples. These challenges are exacerbated when the amount of sample is limited and when multi-omic analysis from an identical sample is required. Methods that preserve the natural context and provide reliable quantification should be prioritized given the limitations of cell culture-based systems. One of the next milestones for the community will be applying glycoproteomics at the level of individual cell types, or even at the single-cell level, which could provide insight into the spatiotemporal regulation of glycosylation in different tissues. Recent progress in MOE labelling has now shown that cell line-specific glycoprotein tagging can be achieved within in vivo models (as shown in a recent preprint article), opening new opportunities to explore cell lineage glycoproteomes in native contexts402. As the field develops, translating the findings of glycosite mapping studies into a deeper understanding of the molecular mechanisms regulated by glycosylation will become the central goal of glycoproteomics.

Reproducibility and data deposition

Glycoproteomics is still a maturing field and, unlike proteomics and other omic disciplines, has yet to experience consolidation and harmonization of its experimental methodologies and informatics approaches. As the glycoproteomics community grows, it will be important to establish conventions and move towards the use of standardized approaches that reflect best practice for the collection, management and sharing of data. Below, we discuss factors that lead to known reproducibility issues.

Variations in data collection

A key factor that contributes to the lack of reproducibility in glycopeptide data sets across laboratories is the inconsistent and often incomplete description of sample handling, sample processing and data acquisition parameters such as those relating to LC–MS/MS experiments. Experimental variations in peptide generation, chemical derivatization or labelling steps and glycopeptide enrichment can greatly affect the resulting glycopeptide data and are often not fully explained. These differences can be compounded during LC–MS/MS acquisition, for example by differences in MS ionization and fragmentation behaviour. For these reasons, it is crucial to fully describe these parameters in published research. It should be noted that MS instrument cleanliness and chromatography performance are also vitally important for data integrity403.

A diverse set of experimental methods is available for glycoproteomics data generation, as demonstrated by several glycopeptide-focused multi-laboratory studies conducted through the Human Proteome Organization’s Human Disease Glycomics/Proteome Initiative404,405, the Association of Biomolecular Resource Facilities406 and the National Institute of Standards and Technology (NIST)407. Although analytical diversity could be considered a strength of the field, several of these experimental methods, some using highly customized and non-commercial reagents, are employed by few groups worldwide, which makes their data difficult to reproduce. Standardization of methods across laboratories could reduce some of these observed variations, although we acknowledge it is unlikely that a one-size-fits-all approach to methodologies would be advantageous for many biological questions.

Variations in data analysis

Analysis of glycopeptide data is challenging and a source of variation in glycoproteomic experiments. A recent multi-institutional study performed by the Human Proteome Organization (HUPO) Human Glycoproteomics Initiative evaluating software tools for serum N-glycopeptide and O-glycopeptide analysis using glycopeptide data sets provided from various glycoproteomic laboratories found that the identified glycopeptides varied dramatically between laboratories even when the same informatic tools were employed, confirming that variables such as pre-processing and post-processing methods substantially affect glycopeptide assignments even on identical data sets365. Although this comparison identified several high-performance search strategies, the large variability in the performance of software tools and search parameters highlights that ongoing benchmarking to track and compare the performance of glycoproteomic informatics used across the community is crucial.
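Between-laboratory agreement of this kind can be summarized with a simple overlap metric. The sketch below computes pairwise Jaccard indices between sets of glycopeptide identifications; representing each identification as a "peptide|composition" string is a hypothetical convention, and a real benchmark would first need to harmonize identifiers across tools.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared identifications divided by all identifications (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(results):
    """Jaccard index for every pair of laboratories, keyed by (lab_x, lab_y)."""
    return {(x, y): jaccard(results[x], results[y])
            for x, y in combinations(sorted(results), 2)}


results = {
    "lab1": {"NVTR|HexNAc(2)Hex(5)", "NGSK|HexNAc(4)Hex(5)", "TAPK|HexNAc(1)Hex(1)"},
    "lab2": {"NGSK|HexNAc(4)Hex(5)", "TAPK|HexNAc(1)Hex(1)", "NVTR|HexNAc(2)Hex(6)"},
    "lab3": {"NVTR|HexNAc(2)Hex(5)"},
}
overlap = pairwise_overlap(results)
```

Note that lab1 and lab2 agree on the peptide NVTR but not on its glycan composition, so composition-level comparison reports a disagreement that a peptide-level comparison would miss.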

Data deposition and sharing

Data repositories will be essential for glycoproteomics data to comply with the FAIR (findable, accessible, interoperable and reusable) data principles408. The MIRAGE initiative has taken the lead in proposing reporting guidelines for glycomics409, and these are currently undergoing refinement to provide guidelines for glycoproteomic data. The MIRAGE guidelines have been adopted by several journals to ensure that consistent information is reported for glycomic experiments, with the goal that the finalized glycoproteomics guidelines will provide a clear framework for the glycoproteomics community. To facilitate the sharing of data, glycoproteomic-centric repositories have been launched, for example, GlycoPOST410, which assigns unique identifiers to raw MS data for individual projects and provides input forms and spreadsheets to give users a template for providing metadata required by MIRAGE guidelines. The database UniCarb-DR411 is complementary to GlycoPOST and allows users to visualize glycan structures annotated in the raw MS data. At the time of writing, UniCarb-DR and GlycoPOST are both available from the GlyCosmos Glycoscience Portal412. ProteomeXchange413,414 is also available for (glyco)proteomic LC–MS data deposition. As avenues for data sharing are now established, all published glycoproteomic data should be made publicly available. Many journals are already beginning to implement this requirement, and it is important to note that ensuring the public availability of data will be a community effort.

Limitations and optimizations

Several assumptions and experimental trade-offs shape the conclusions that can be drawn from glycoproteomic studies. Although workflows used to undertake glycoproteomics are continuously improving, a clear understanding of potential limitations and the underpinning assumptions associated with these workflows is needed to best interpret glycoproteomic data.

One MS/MS event, multiple glycoforms

A common assumption for glycopeptide MS/MS events is that each of the resulting spectra contains a single glycoform; however, multiple isobaric glycans57 or isomeric glycosylation states415 may be observed within a single MS/MS spectrum. Isobaric glycans and isomeric glycopeptides can have similar elution profiles when separated using chromatography approaches such as RP-LC, resulting in mixtures of glycoforms being subjected to MS/MS analysis (Fig. 5a). This leads to chimeric spectra that complicate the assignment of glycosylation sites and glycan arrangements (Fig. 5). Chimeric spectra have been observed in N-linked glycoproteomic studies54, and O-linked glycopeptides are known to display multiple isomeric species63. Careful analysis of chromatography-separated isomers270 or use of additional separation techniques such as IMS216 can help to resolve co-eluting isomeric glycosylation sites.
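The chimeric-spectrum signature, the same modification appearing on two different residues, can be checked for the simple case of one glycan and two candidate sites. The sketch below computes singly charged c ions for each localization and reports the c-ion indices between the two sites that are observed both with and without the glycan. The peptide, the residue table (limited to the amino acids in the example), the tolerance and the function names are hypothetical.

```python
PROTON = 1.007276  # Da
NH3 = 17.026549  # Da; a c ion is the corresponding b ion plus NH3
# Monoisotopic residue masses (Da) for the residues used in the example only.
RESIDUE = {"A": 71.03711, "K": 128.09496, "P": 97.05276, "S": 87.03203, "T": 101.04768}


def c_ion_mz(peptide, i, glycan_site, glycan_mass):
    """Singly charged c_i; includes the glycan if its 1-based site lies in residues 1..i."""
    mz = sum(RESIDUE[aa] for aa in peptide[:i]) + NH3 + PROTON
    return mz + (glycan_mass if glycan_site <= i else 0.0)


def mixed_c_ions(peptide, site_a, site_b, glycan_mass, peaks, tol=0.02):
    """c-ion indices between two candidate sites seen both with and without the glycan."""
    first, last = sorted((site_a, site_b))

    def seen(mz):
        return any(abs(p - mz) <= tol for p in peaks)

    hits = []
    for i in range(first, last):  # only these c ions differ between the two localizations
        glycan_at_first = c_ion_mz(peptide, i, first, glycan_mass)
        glycan_at_last = c_ion_mz(peptide, i, last, glycan_mass)
        if seen(glycan_at_first) and seen(glycan_at_last):
            hits.append(i)
    return hits


HEXNAC = 203.0794
pep = "ATASPK"  # hypothetical peptide with candidate sites T2 and S4
only_t2 = [c_ion_mz(pep, i, 2, HEXNAC) for i in (2, 3)]
mixture = only_t2 + [c_ion_mz(pep, i, 4, HEXNAC) for i in (2, 3)]
print(mixed_c_ions(pep, 2, 4, HEXNAC, mixture))  # [2, 3]
```

An empty result is consistent with a single localization, whereas site-determining ions observed in both forms indicate co-isolated isomers rather than one glycoform.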

Fig. 5: Glycopeptide co-fragmentation and chimeric spectra.

Glycopeptide co-elution and co-isolation of isomeric species can lead to the generation of chimeric spectra containing fragments from two or more precursor ions. a | Glycopeptide isomers can possess unique elution properties when separated with reverse phase separation, although some isomers may have closely related elution profiles. b | The presence of multiple glycopeptide isomers in samples can result in the observation of multiple overlapping Gaussian features in the chromatogram. c,d | Examining tandem mass spectrometry (MS/MS) spectra corresponding to different retention times results in distinct MS/MS spectra containing different mixtures of isomeric glycopeptide species. These chimeric spectra are identifiable by the presence of fragment ions corresponding to the modification attached to two residues, such as the c12 and z4 ions highlighted in blue. Chimeric spectra arising from mixtures of isomeric glycopeptides therefore contain fragment ions supporting mutually exclusive glycosylation events. ETD, electron transfer dissociation; m/z, mass to charge ratio.

MS-based glycan class assignments

MS data provide limited insights into monosaccharide identity or linkage information (see above). This lack of information limits the ability to assign glycan classes on the basis of mass alone. Although the conservation of glycosylation pathways in eukaryotic glycosylation systems does constrain many glycan compositions, which allows glycan classes to be predicted and/or assigned with reasonable confidence416,417, it is important to note that these should still be treated as unconfirmed assignments. Orthogonal methodologies can be used to further support the presence of specific glycans or linkage configurations such as the use of exoglycosidases418; the release of glycans and confirmation of specific glycans using isomeric resolving approaches such as PGC310,367; or the analysis of oxonium fragmentation patterns to support monosaccharide assignments114,350. In situations where glycans are ambiguous, restraint in the assignments of glycan classes is best practice. Alternatively, an increasingly accessible way to corroborate glycopeptide assignments is the use of synthetic glycopeptide standards, which allow subtle changes in retention time or fragmentation properties to be detected to support glycan identities419.

Ambiguous localizations

The community’s ability to assign glycosylation sites has improved dramatically over the past decade with multiple innovations in instrumentation and data acquisition, such as increased accessibility of ExD methods on multiple instrument platforms and improved data collection approaches57,341. These innovations do not guarantee that localization information will be obtained for a given glycopeptide, and a large proportion of glycopeptides cannot be localized within most data sets. A growing question within the field is whether site localization is needed for all glycosylation experiments, especially if localization comes at the cost of speed and subsequent glycoproteomic depth252. Glycopeptide-focused DIA analysis286,339, which is undertaken using beamCID, highlights this change in thinking and the growing acceptance of site ambiguity. Many in the field advocate that sites should be assigned either as localized or non-localized on the basis of the available fragmentation information76,77,286 (Box 2). Further, a formal system to stratify glycosylation site ambiguity on the basis of site localization probability was recently proposed by Lu, Riley et al.336 to provide a means to categorize assignment quality. In reality, not all biological questions need complete unambiguous glycosylation site assignments; for example, studies in which the focus is the identification of glycans367,420 or the quantification of glycopeptide abundances52,310 will not be affected by site ambiguity. By contrast, site localization can be crucial for confirming atypical glycosylation events such as tyrosine O-glycosylation76,421 or when attempting to fully characterize the site-specific glycosylation of a protein of interest, especially when both N-glycans and O-glycans are present. It should be noted that at least partial localization of glycans may be required for peptides with multiple glycosylations to avoid misassignment of glycan compositions63,335.

Outlook

Glycosylation shapes nearly all biological processes across all areas of life, and there has been a rapid growth in glycobiology-focused efforts over the past two decades to define and understand the role of the complex and dynamic glycoproteome. The development of chemical biology tools for tagging glycoproteins93,162,169,170, enrichment techniques to isolate glycopeptides114,129 and new glycoproteome-specific reagents such as O-glycoproteases96,125 have greatly improved our ability to site-specifically map glycosylation across biological systems. Over the coming years, improved access to glycoproteomics toolkits promises to stimulate further activity in the field and promote an increasing number of studies exploring fundamental and applied questions in glycobiology.

Glycoproteomics has shown potential to differentiate disease subtypes, stratify patients and predict clinical outcomes in complex human diseases such as cancer398,422, inflammation423,424 and microbial infections425,426, and there is great potential for glycoproteomic analysis to improve diagnostic sensitivity and precision376. Community-based development of robust methods and software that implement best practice for data interpretation, standardization and sharing will be essential for clinical translation; this has begun with the establishment of glycoproteomic focused sharing platforms such as GlycoPOST410. Although these developments are promising, ease of use and implementation is still the major hurdle currently limiting the translation of glycoproteomics to the clinic.

Glycopeptide-focused software needs to be developed in parallel with new experimental techniques. Future tools should be customizable to facilitate the analysis of diverse glycoproteomes beyond the mammalian realm, including those of plants, invertebrates and microorganisms31,33,34. The crucial challenges for future software will be identifying and localizing multiply glycosylated peptides, controlling the statistical confidence of glycopeptide identifications and distinguishing glycan structural isomers.

Marked improvements in proteomic sample multiplexing, chromatography and MS acquisition speed are likely to increase throughput in glycoproteomics. Peptide-based sample multiplexing using tandem mass tags currently allows 18 samples to be analysed within a single proteomic experiment304. Multiplexing can also provide structural insights by allowing the inclusion of samples treated with specific glycosylation inhibitors112 or of genetic knockouts of specific glycosyltransferases or glycoside hydrolases371,427, yielding information on glycan class or isoform that might otherwise be missed.

Improvements in glycoproteomic depth are likely to come from new tools. The recent demonstration of a large range of bacterial glycan-targeting hydrolytic enzymes125 shows that the current repertoire of glycoproteases represents only a small subset of possible enzymatic activities and specificities. As our understanding of glycan-modifying enzymes improves428,429, so too will our ability to rationally modify and tailor these enzymes to target or enrich specific glycosylation sites and their glycans of interest. Modified enzymes and affinity tools generated against specific glycans430 will be particularly valuable to advance less-mature areas of glycoproteomics such as C-glycosylation431. Additional methods for unbiased, untargeted quantitative profiling of multiple glycosylation classes in a single experiment will also be crucial.

Applications such as single-cell analysis and top-down glycoproteomics still present significant technical barriers for the field. Although isobaric labelling approaches are increasingly used for single-cell proteomic analysis432,433 and could enable single-cell glycoproteomics, it remains to be seen how applicable these approaches will be. Charge detection MS434,435,436 has the potential to radically improve top-down glycoform characterization, although integrating this approach into glycoproteomics will require further development. Several groups have used non-MS-based DNA-sequencing methods with oligonucleotide-labelled lectins to explore glycosylation changes at the single-cell level113,437. Further, a recent study demonstrated that non-glycosylated and glycosylated forms of peptides can be resolved using nanopore sequencing438, suggesting that this technique may enable single-molecule analysis of glycopeptides and glycoproteins. Although these technologies are still in their infancy, they have considerable potential to provide information orthogonal to that from MS-based glycoproteomics.

Great strides have been made in the glycoproteomics-based identification of glycosylation events and the discovery of new or unusual types of protein glycosylation33,367,439. Over the coming years, glycoproteomics will increasingly provide mechanistic insight into the formation and roles of protein-linked glycans in biological processes. Glycoproteomics has already revealed mechanisms such as the requirement of N-linked fucosylation for ricin toxicity371 and the role of specific O-GlcNAcylation sites in metabolic regulation440. Further, multi-omic integration has enabled a more holistic understanding of biological systems, and integrating glycoproteomics with other omic techniques in the analysis of large cohorts is likely to extend our knowledge to the population level. For example, identifying common genetic variants associated with differences in glycosylation through genome-wide association studies may deepen mechanistic insights and reveal potential disease predispositions424,441.

As methods and technologies continue to evolve, one of the most exciting opportunities for the field will be further integration and improvement in the bioinformatic space. Across the life sciences, the growing application of machine learning is providing new ways to model, analyse and handle large data sets of increasing complexity and information content442,443. Machine learning and artificial intelligence are not yet used routinely by the glycoproteomics community, but their increasing use in proteomics444 suggests that these approaches will become commonplace in glycoproteomics workflows. Collectively, these transformative tools are likely to make glycosylation analysis accessible to a wider range of life scientists, ultimately improving our understanding of organismal development, disease adaptation and evolution.