Fluorescent glycan fingerprinting of SARS2 spike proteins

Glycosylation is the most common post-translational modification and has a myriad of biological functions. However, glycan analysis has always been a challenge. Here, we present new techniques for glycan fingerprinting based on enzymatic fluorescent labeling and gel electrophoresis. The method is illustrated on SARS2 spike (S) glycoproteins. SARS2, a novel coronavirus and the causative agent of the COVID-19 pandemic, has had significant social and economic impacts since the end of 2019. To obtain the N-glycan fingerprint of an S protein, glycans released from the protein are first labeled through enzymatic incorporation of fluorophore-conjugated sialic acid or fucose, then separated by SDS-PAGE, and finally visualized with a fluorescent imager. To identify the labeled glycans of a fingerprint, glycan standards and glycan ladders are enzymatically generated and run alongside the samples as references. By comparing the mobility of a labeled glycan to that of a glycan standard, the identity of the glycan may be determined. O-glycans can also be fingerprinted. Because no enzyme is available for broad O-glycan release, O-glycans on the S protein are labeled with fluorescent sialic acid and the protein is then digested with trypsin to obtain labeled glycopeptides that are separated by gel electrophoresis. Glycan fingerprinting could serve as a quick method for globally assessing the glycosylation of a specific glycoprotein.

Previously, we reported the separation of fluorophore-tagged N-glycans released from fetal bovine fetuin and some pharmaceutical antibodies by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) 23. Building on that work, we present a novel glycan fingerprinting technique for studying glycosylation that is based on labeling at the non-reducing ends (Fig. 1), which distinguishes it from capillary electrophoresis of fluorescent glycans labeled at the reducing ends 24,25. In our method, N-glycans are first released by peptide N-glycosidase F (PNGase F) treatment, with or without additional sialidase and fucosidase. This is followed by labeling with fluorophore-conjugated sialic acid or fucose using a sialyltransferase or a fucosyltransferase, respectively 23,26. For O-glycans, the glycans are labeled first, then the glycoprotein is digested by trypsin, and finally the glycopeptides are separated by SDS-PAGE. These methods allow for quick assessment of the glycosylation pattern of a glycoprotein. As an example, we applied these techniques to study the glycosylation of several SARS2 S protein constructs expressed in different host cells.

Results
Electrophoretic mobility of Cy5-labeled glycans on SDS-PAGE. To correlate the glycan structures with their mobility, we established a series of labeled glycans based on a biantennary antibody glycan, A2 (Fig. 2). The short names for glycans in this study are based on Oxford Glycan Notation (Table 1). A2 was first labeled by FUT8 with Cy5-conjugated fucose to become FA2. A series of glycans was then generated enzymatically from FA2 (Fig. 2A,B). A2 was also extended by B4GalT1, with or without prior modification by FUT8 and MGAT3, and finally labeled by ST6Gal1 with Cy5-Neu5Ac to generate FA2G2S(6)1, A2BG2S(6)1 and A2G2S(6)1 (Fig. 2C). The following observations were made regarding the mobility change caused by the addition of different monosaccharides. First, addition of a neutral monosaccharide such as Gal, GlcNAc or Fuc noticeably slows the mobility of a glycan. Second, addition of a bisecting GlcNAc slows the mobility of a glycan by roughly half as much as addition of a β1,6-linked GlcNAc. Third, addition of a sialic acid residue significantly increases the mobility of a glycan, with α2,6-linked sialic acid producing greater mobility than α2,3-linked sialic acid. In addition, when a monosaccharide is added at multiple positions on a glycan, intermediate glycosylation products exhibit intermediate mobilities. For example, FA2G1 moves faster than FA2G2, and FA2G2S1 moves slower than FA2G2S2 (Supplemental Fig. 1). Intermediate products were only observed within a short time window and were converted to final products after prolonged incubation. The contributions of various monosaccharides to gel mobility are summarized in Table 2.
Figure 2. Relative mobilities of various Cy5-labeled glycans and their enzymatic synthesis. All glycans were enzymatically synthesized starting from the glycan A2 (known as G0 in IgG glycan notation). The glycan ladders were composed by combining some of the labeled glycans. With the exception of the enzymatic reactions of FUT8, MGAT3 and MGAT5, all other enzymatic reactions resulted in one or two intermediates, as there are two or more branches in each glycan that allow enzymatic modification (see Supplemental Fig. 1 for intermediates). The labeled glycans were separated on a 17% gel and visualized with a fluorescent imager. (A) Relative mobilities of glycans that were labeled at the core-fucose. The enzymes used for generating these glycans are listed at the top of the image. (B) Relative mobility change on FA2 by addition of a bisecting GlcNAc introduced by MGAT3 (MT3) versus a β1-6 GlcNAc introduced by MGAT5 (MT5). MGAT3 and MGAT5 were introduced in different orders. The second enzyme was introduced 30 min after the first enzyme. FA3B, containing both a bisecting and a β1-6 GlcNAc, was only observed when MGAT5 was introduced first. (C) Relative mobility change on A2G2S(6)1 by addition of a bisecting GlcNAc versus a core-fucose. The glycans were generated and labeled starting from A2 with the enzymes indicated at the top of the gel in the specified order. MT3, MGAT3; FT8, FUT8; B41, B4GalT1; ST61, ST6Gal1. (D) Schemes for enzymatic generation of the labeled glycans in (A) and (B). Labeling was through FUT8 and GDP-Cy5-Fuc (GDP-f′). (E) Schemes for enzymatic generation of the labeled glycans in (C). Labeling was through ST6Gal1 and CMP-Cy5-Neu5Ac (CMP-S′). Two Cy5-Neu5Ac residues can be introduced to each of the glycans, but only glycans with one Cy5-Neu5Ac were displayed in (B), (C) and (E).
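The relative mobility scale behind Table 2 sets FA2 to 0 and FA2G2S(6)2 to 1, and takes the contribution of a monosaccharide as the difference between two glycans that differ only by that residue. The following is a minimal sketch of that arithmetic, not the authors' analysis code. The function names and the migration distances are hypothetical and chosen only for illustration, and dividing the FA2G2/FA2 difference by two assumes that both Gal residues contribute equally.

```python
def relative_mobility(distance, d_fa2, d_fa2g2s2):
    """Rescale a migration distance so that FA2 maps to 0 and FA2G2S(6)2 maps to 1."""
    span = d_fa2g2s2 - d_fa2
    if span == 0:
        raise ValueError("the two reference bands must migrate different distances")
    return (distance - d_fa2) / span


def contribution(rm_with, rm_without):
    """Mobility contribution of a residue: rm of the glycan with it minus rm without it."""
    return rm_with - rm_without


# Hypothetical migration distances in mm (illustrative values, not measured data).
distances_mm = {"FA2": 30.0, "FA2G2": 24.0, "FA2G2S(6)2": 50.0}

rm = {
    name: relative_mobility(d, distances_mm["FA2"], distances_mm["FA2G2S(6)2"])
    for name, d in distances_mm.items()
}

# FA2G2 carries two more Gal residues than FA2, so halve the difference
# (assumption: both residues contribute equally).
gal_per_residue = contribution(rm["FA2G2"], rm["FA2"]) / 2

print(rm)
print(gal_per_residue)
```

On this scale, a negative contribution means the residue slows the glycan and a positive contribution means it speeds the glycan up, matching the sign convention of Table 2.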
Selection of labeling enzyme and optimization of substrate concentrations. Before fingerprinting glycans released from various SARS2 spike proteins, we screened the labeling enzymes and optimized the substrate concentration for the labeling reaction using glycans released from the RBD protein expressed in CHO cells. The glycans were first probed by various sialyltransferases, including ST6Gal1, which generates α2,6-sialylated N-glycans 27, and ST3Gal3, ST3Gal4 and ST3Gal6, which generate α2,3-sialylated N-glycans 28,29. Among these enzymes, ST6Gal1 and ST3Gal6 gave stronger signals (Supplemental Fig. 2A) and were chosen for the following glycan fingerprinting study. Stronger signal intensities were also observed when the substrate input was around 2 µg (Supplemental Fig. 2B) and the donor CMP-Cy5-Neu5Ac input was around 0.4 nmol (Supplemental Fig. 2C). These conditions were therefore chosen for the following fingerprinting study.
N-glycan fingerprinting study of SARS2 spike proteins with ST6Gal1. N-glycans released from the following SARS2 spike protein constructs, with or without prior desialylation, were labeled with ST6Gal1/CMP-Cy5-Neu5Ac: RBD expressed in Sf21 cells (RS), RBD expressed in CHO cells (RC), RBD expressed in HEK293 cells (RH), full-length spike protein expressed in CHO cells (SC), full-length spike protein expressed in HEK293 cells (SH), and S1 protein expressed in HEK293 cells (S1H). As the presence of oligomannose glycans on S proteins was reported previously 15,21, FUT8/GDP-Alexa Fluor 555-Fuc together with MGAT1/UDP-GlcNAc, which allow the labeling of M3 and M5 (commonly known as Man3 and Man5) 23, were also added to the final labeling reactions to reveal these glycans. ST6Gal1 labeling revealed a series of bands with large variations from all constructs except RS (Fig. 3A). In general, desialylation resulted in elimination of some fast-moving bands and increased labeling on some slow-moving bands, suggesting the existence of both sialylated and asialylated glycans on these proteins. Several common bands (bands 1-6) were observed (Fig. 3). Bands 1 and 2 were mainly found in neuraminidase-treated SC, SH and S1H at positions around FA3 and FA2 (Fig. 3B). Since labeling through ST6Gal1 also contributes a sialic acid and therefore makes a labeled glycan move much faster, bands 1 and 2 could be due to labeling on highly branched complex glycans, such as tetra- and tri-antennary complex glycans. Band 1 was mainly observed in desialylated SC, SH and S1H, suggesting that the glycan was initially sialylated. Band 2 was observed in SC, SH and S1H samples before and after desialylation, but displayed a significant signal increase upon desialylation, suggesting that the glycan was initially largely sialylated. Band 3 was prominent in all SH and S1H samples and in desialylated samples of RC and SC and had the same mobility as FA2G2S(6)1 (Fig. 3B).
Since band 3 and the reference glycan FA2G2S(6)1 had the same label (Cy5-Neu5Ac) and the same mobility, band 3 is likely to be FA2G2S(6)1, the labeling product of FA2G2 (Fig. 2E). The fact that band 3 was much weaker in RC and SC than in desialylated RC and SC samples suggests that the glycan was initially sialylated in these samples. In contrast to band 3, band 4, around the position of FA2G2S(3)2, had a strong presence in RC and SC but not in the desialylated RC and SC samples, suggesting that band 4 was due to the labeling of a partially sialylated glycan that was converted to band 3 when desialylation occurred before labeling. Band 5 was likely due to the labeling of oligomannose M5, as the band had the same mobility as the reference glycan FA1M5 (Fig. 3A). Band 6 had almost equal intensity in all samples and did not respond to C. perfringens neuraminidase treatment. The fast mobility of band 6 suggests that it is highly sialylated, but the fact that it was unresponsive to neuraminidase treatment suggests the opposite. The nature of band 6 remains to be investigated.
Table 2. Contributions of various monosaccharides to relative mobility (rm). The rm of each individual glycan was determined based on the gel images in Fig. 2, where the relative mobility of FA2 was arbitrarily set to 0 and that of FA2G2S(6)2 was set to 1. The contribution of a monosaccharide to the mobility was calculated based on the difference in the mobility of a pair of glycans that differed only by the monosaccharide. A negative number indicates that the monosaccharide slows down the mobility and a positive number indicates that the monosaccharide makes the mobility faster. b-GlcNAc, bisecting GlcNAc.
Most of the common bands displayed great variation among the samples. For example, band 4 was the most abundant in RC, but at an almost negligible level in RH (blue arrows in Fig. 3); band 5 was the most abundant in SH, but almost completely lacking in RH.
Surprisingly, some bands were found in only one sample, such as bands a, b, c and d (Fig. 3A). Bands a and b in RC had slow mobility and responded to neuraminidase treatment, suggesting that they had highly complex structures and were initially sialylated. Band c in RH was just above the position of FA2G2S(6)1, suggesting that it might be FA2BG2S(6)1, which contains a bisecting GlcNAc (Fig. 2B,C). Band d, labeled by FUT8, was found only in RS and had faster mobility than FA1M5, suggesting that it might be FA1 (the labeled product of M3), consistent with the notion that M3 is a main glycan expressed in insect cells 30. Additional enzymatic conversion of band d with B4GalT1 and ST6Gal1 further confirmed the identity of band d (Supplemental Fig. 4).
N-glycan fingerprinting study of SARS2 spike proteins with ST3Gal6. Both ST6Gal1 and ST3Gal6 are known to sialylate the Galβ1,4GlcNAc structure on glycoproteins 29. When the same set of SARS2 spike protein samples was probed with ST3Gal6, similar but distinctive glycan fingerprints were observed (Fig. 4). The entire banding pattern revealed by ST3Gal6 appeared to be shifted up from that of ST6Gal1. For example, bands 1′, 2′, 3′, and 6′ in the ST3Gal6-labeled SH sample corresponded well with bands 1, 2, 3, and 6 in the ST6Gal1-labeled SH sample; similar to band 6, band 6′ was found across all lanes; and, similar to the relative positioning of band b and band 3 revealed by ST6Gal1, band b′ labeled by ST3Gal6 was shifted slightly up from band 3′. The shift observed in the bands revealed by ST3Gal6 compared to those in the ST6Gal1-labeled sample is likely because glycans with an α2,3-linked sialic acid had slower mobility than the corresponding glycans with an α2,6-linked sialic acid (Fig. 2A, Table 2). While there was similarity between the fingerprints revealed by the two enzymes, ST3Gal6 labeling also revealed some unique bands. For example, bands marked with asterisks in SH revealed by ST3Gal6 had no corresponding bands in SH revealed by ST6Gal1. This difference in the banding pattern suggests that ST3Gal6 and ST6Gal1 have overlapping but distinct substrate preferences.

Fingerprinting O-glycans of SARS2 spike proteins with ST3Gal1. O-glycosylation of the RBD domain of the SARS-CoV-2 spike protein has been reported previously 21,31. Here, we investigated whether the fingerprinting technique could be applied to study O-glycans as well. One challenge for fingerprinting O-glycans is that there is no enzyme (corresponding to PNGase F for the removal of N-glycans) that allows for the wide removal of O-glycans. Currently, E. faecalis O-glycosidase (Endo EF) is used to remove Core-1, Core-2 and Core-3 type O-glycans 32. Another challenge is that O-glycans are typically smaller than N-glycans and rather difficult to separate from the donor substrates of various glycosyltransferases by SDS-PAGE.
To overcome the above challenges, we first directly probed O-glycans on the RBD proteins using O-glycan-specific ST3Gal1 as previously described 26 (Fig. 5A). Indeed, O-glycans were detected on all RBD proteins investigated, but with different levels of sialylation and sensitivity to Endo EF treatment (Fig. 5B). The labeling on the RBD proteins expressed in CHO cells, Sf21 cells and Tn cells was removed by Endo EF treatment, further suggesting that the labeled O-glycans were Core-1, Core-2, or Core-3 types. Since the Core-3 O-glycan lacks a terminal Gal residue for ST3Gal1 labeling, this possibility was excluded. To further identify the O-glycans on the RBD proteins, the RBD protein expressed in Sf21 cells was first pretreated with GCNT1 33, a GlcNAc transferase that converts a Core-1 O-glycan to a Core-2 O-glycan, alone or together with B4GalT1 34, which can extend the GlcNAc residue on the Core-2 O-glycan, and then labeled with Cy5-sialic acid by ST3Gal1, digested with trypsin, and separated by SDS-PAGE (Fig. 5C). Trypsin digestion resulted in two labeled glycopeptides in all samples, but with different mobilities. The mobility of the two peptides was shifted by GCNT1 and B4GalT1 sequentially and finally matched the mobility of the peptides from the RBD protein expressed in HEK293 cells. These results suggest that O-glycans on the RBD protein expressed in HEK293 cells are extended Core-2 type O-glycans, whereas the O-glycan on the RBD protein expressed in Sf21 cells is a Core-1 type.

Discussion
In this report, we have established a novel method for N-glycan fingerprinting based on enzymatic fluorescent glycan labeling and electrophoresis. As a fingerprinting method, it focuses on the overall glycan pattern rather than on individual glycan species and their structures. This method can serve as a quick and inexpensive way to determine whether different batches of glycoproteins are consistently glycosylated and to identify samples with abnormal glycosylation. Herein, this strategy is demonstrated on several SARS2 spike proteins, in which N-glycans are released, enzymatically labeled with fluorophore-conjugated sialic acid and fucose, and separated by SDS-PAGE. Although the strategy does not allow site-specific and detailed structural glycan analysis, it does offer some major advantages over similar traditional methods of glycan analysis such as fluorophore-assisted carbohydrate electrophoresis (FACE) 35. First, it is simpler and more convenient because of direct enzymatic labeling at the non-reducing end. Second, the data acquired can be more informative because different glycans can be labeled simultaneously with different fluorophores. Third, multiple samples can be processed simultaneously, which makes it highly efficient. Fourth, the signal intensity is directly related to the abundance of a glycan species and is therefore more quantitative 23. Fifth, released glycans and the deglycosylated protein can be viewed in the same gel, allowing strict correlation of a protein and its glycans. While this method only reveals the substrate glycans of the labeling enzyme, and glycans that are not recognized by the labeling enzyme remain undetected, this could be advantageous when only specific glycans are being examined.
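The batch-consistency use case can be sketched as a comparison of normalized band intensities between two lanes. The example below is only an illustration under stated assumptions: it assumes band intensities have already been quantified from the gel image, the band names and intensity values are hypothetical, and the maximum per-band difference is one simple comparison metric chosen here, not a procedure described in this report.

```python
def normalize(lane):
    """Convert raw band intensities into fractions of the total lane signal."""
    total = sum(lane.values())
    if total <= 0:
        raise ValueError("lane has no signal to normalize")
    return {band: value / total for band, value in lane.items()}


def max_band_difference(lane_a, lane_b):
    """Largest absolute difference in fractional intensity across all bands."""
    frac_a, frac_b = normalize(lane_a), normalize(lane_b)
    bands = set(frac_a) | set(frac_b)
    # A band missing from one lane counts as zero signal in that lane.
    return max(abs(frac_a.get(b, 0.0) - frac_b.get(b, 0.0)) for b in bands)


# Hypothetical quantified intensities (arbitrary units) for two batches.
batch_1 = {"band 3": 500.0, "band 4": 300.0, "band 5": 200.0}
batch_2 = {"band 3": 450.0, "band 4": 350.0, "band 5": 200.0}

diff = max_band_difference(batch_1, batch_2)
print(f"largest per-band difference: {diff:.3f}")
```

Because only substrate glycans of the labeling enzyme are detected, the fractions describe the labeled subset of glycans rather than the full glycome.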
Although fingerprinting is focused on general banding patterns, it is always beneficial if the identities of individual bands on a gel can be determined. For this purpose, glycan standards and glycan ladders can be run along with samples to serve as references. By comparing the mobility of labeled bands to the mobility of the reference glycans, the identity of a labeled band may be inferred. We also found that the addition of a linkage-specific monosaccharide changes the mobility of a glycan by a relatively constant amount (Fig. 2; Table 2). Knowledge of the mobility shift caused by the addition of certain monosaccharides also allows us to deduce the identities of labeled bands. More specifically, the addition of a neutral monosaccharide slows down a glycan and the addition of a negatively charged sialic acid increases the mobility of a glycan. In particular, a bisecting GlcNAc causes about half of the mobility shift generated by a β1,6-linked GlcNAc, and an α2,6-linked sialic acid causes a slightly larger mobility shift than an α2,3-linked sialic acid.
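Matching a band to a reference glycan by mobility can be sketched as a nearest-neighbor lookup with a tolerance. The sketch below is illustrative only: the function name, the tolerance, and the ladder values for FA2G2 and FA2G2S(6)1 are hypothetical. Only the anchoring of FA2 at 0 and FA2G2S(6)2 at 1 follows the relative mobility scale of Table 2.

```python
def nearest_reference(band_rm, ladder, tolerance=0.05):
    """Return the ladder glycan closest in relative mobility to the band.

    Returns None when no reference lies within the tolerance, so that a
    band is left unassigned rather than forced onto a poor match.
    """
    name, ref_rm = min(ladder.items(), key=lambda item: abs(item[1] - band_rm))
    return name if abs(ref_rm - band_rm) <= tolerance else None


# Relative mobilities of reference glycans run on the same gel.
# FA2 = 0 and FA2G2S(6)2 = 1 by definition; the other values are hypothetical.
ladder = {
    "FA2": 0.0,
    "FA2G2": -0.30,
    "FA2G2S(6)1": 0.35,
    "FA2G2S(6)2": 1.0,
}

print(nearest_reference(0.33, ladder))  # close to the FA2G2S(6)1 reference
print(nearest_reference(0.60, ladder))  # no reference within tolerance
```

A match of this kind supports an assignment only when the band and the reference carry the same label, since the label itself contributes to mobility.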
Our data suggest that the RBD of the SARS2 spike protein expressed in HEK293 cells mainly contains complex glycans, which is consistent with the reports of Watanabe et al. 6. Our data also suggest that bisecting GlcNAc may exist on the RBD portion, and that oligomannose glycans may mainly exist on the S protein, but not the RBD portion, when expressed in HEK293 cells. Our data also indicate that the glycans of the S proteins expressed in insect cells and HEK293 cells are completely different. Using a similar technique, we found that the RBD protein expressed in insect cells contains a Core-1 O-glycan, while the RBD protein expressed in HEK293 cells may contain an extended Core-2 O-glycan. Altogether, our study provides evidence to support the claim that the host cell determines glycosylation, implying that the glycosylation of SARS2 from COVID-19 patients could be different from that of proteins expressed in other hosts, and further suggesting that mRNA vaccines 36, which involve S antigens produced by the recipients themselves rather than by other hosts, could be more effective in combating COVID-19.
Releasing and labeling N-glycans of the spike proteins. To release N-glycans, 5 μg of a spike protein was mixed with 0.2 μg PNGase F and supplemented with labeling buffer (25 mM Tris pH 7.5, 10 mM MnCl2) to 20 μL and then incubated at 37 °C for 30 min. For desialylation, an additional 0.2 μg C. perfringens neuraminidase was added into the reaction mixture together with PNGase F. The above mixture was then heated at 95 °C for 2 min to inactivate the enzymes. The labeling mixture contained 0.5 μg of a sialyltransferase together with 0.4 nmol of CMP-Cy5-Neu5Ac supplemented with labeling buffer to 10 μL. For labeling oligomannose, an additional 0.5 μg of FUT8 together with 0.4 nmol of GDP-Alexa Fluor 555-conjugated Fuc and 0.5 μg of MGAT1 together with 10 nmol of UDP-GlcNAc were also added into the labeling mixture. The labeling mixture was then added into the reaction mixture and incubated at 37 °C for 2 h or overnight at room temperature.
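The donor amounts above are given in nmol, so their final concentrations depend on the combined reaction volume. The short calculation below is a sketch that assumes the final volume is simply the 20 μL release reaction plus the 10 μL labeling mixture, ignoring evaporation during the heating step and the volume of any extra enzymes. The function name is hypothetical.

```python
def micromolar(amount_nmol, volume_ul):
    """Convert an amount in nmol dissolved in a volume in µL to µM.

    1 nmol per µL equals 1 mM, that is 1000 µM.
    """
    if volume_ul <= 0:
        raise ValueError("volume must be positive")
    return amount_nmol / volume_ul * 1000.0


# Assumed final volume: 20 µL release reaction + 10 µL labeling mixture.
final_volume_ul = 20.0 + 10.0

donor_um = micromolar(0.4, final_volume_ul)        # CMP-Cy5-Neu5Ac
udp_glcnac_um = micromolar(10.0, final_volume_ul)  # UDP-GlcNAc for MGAT1

print(f"CMP-Cy5-Neu5Ac: {donor_um:.1f} µM")
print(f"UDP-GlcNAc: {udp_glcnac_um:.1f} µM")
```

Under this assumption the fluorescent donor is present at roughly 13 µM and UDP-GlcNAc at roughly 333 µM.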
Labeling O-glycans of the spike proteins and trypsin digestion. The RBD proteins were labeled with or without pretreatment with C. perfringens neuraminidase, Endo EF, GCNT1 and B4GalT1. For pretreatment, 5 μg of a spike protein was mixed with 0.5 μg of the above enzymes individually or in combination (together with 1 mM of UDP-GlcNAc and UDP-Gal in the cases of GCNT1 and B4GalT1, respectively) in 20 μL labeling buffer, and then incubated for 20 min at 37 °C. The samples were then heated at 95 °C for 2 min to inactivate the enzymes. The samples were then labeled by the addition of 0.5 μg ST3Gal1 together with 0.4 nmol of CMP-Cy5-Neu5Ac and incubated at 37 °C for 30 min. The samples were mixed with 2 μg trypsin and incubated at 37 °C for 10 min for trypsin digestion.
Labeling glycan standards and building glycan ladder. For labeling a glycan with Cy5-Neu5Ac, 1 μg of the standard was mixed with 1 μg of ST6Gal1 and 1 nmol of CMP-Cy5-Neu5Ac together with 0.5 μg B4GalT1 and 10 nmol of UDP-Gal, supplemented with labeling buffer to 20 μL, and the mixture was then incubated at 37 °C for 2 h or overnight at room temperature. For labeling a glycan standard with Cy5-Fucose, 2 μg of the standard was mixed with 1 μg of FUT8 and 2 nmol of GDP-Cy5-Fuc, supplemented with labeling buffer to 20 μL, and the mixture was then incubated at 37 °C for 2 h or overnight at room temperature. For building a glycan ladder based on a Cy5-Fucose-labeled glycan standard, 200 ng of the above labeled glycan was extended with one or more of the glycosyltransferases MGAT3, MGAT5, B4GalT1, FUT9, ST3Gal6 and ST6Gal1 (0.5 μg each), together with their donor substrates, at 37 °C for 2 h, overnight at room temperature, or until the reactions were complete. The reactions were then stopped by heating at 95 °C for 2 min. The glycan ladder was built by mixing equal amounts of the above extended labeled glycans.
Glycan electrophoresis and imaging. All labeled samples, including glycan standards, were separated on 15% or 17% sodium dodecyl sulfate-polyacrylamide gels at 20 V/cm. After separation, all gels were imaged using a FluorChem M imager (ProteinSimple, Bio-techne). For imaging protein contents, the gel was also imaged with traditional methods such as silver staining or trichloroethanol (TCE) staining.

Data availability
All data generated or analyzed during this study are included in this published article (and its Supplementary Information files).