Improvement of the glycoproteomic toolbox with the discovery of a unique C-terminal cleavage specificity of flavastacin for N-glycosylated asparagine

To determine all potential N-glycosylation sites of a glycoprotein, one central aspect of every bottom-up N-glycoproteomic strategy is to generate suitable N-glycopeptides that can be detected and analyzed by mass spectrometry. Specific proteases, such as trypsin, can generate N-glycopeptides that either carry more than one N-glycosylation site or are too long to be readily analyzed by mass spectrometry, in both cases because tryptic cleavage sites are lacking near the N-glycosylation site. Here, we present a newly identified cleavage specificity of flavastacin, a protease from Flavobacterium meningosepticum, which until now was only reported to cleave peptide bonds N-terminal to aspartic acid residues. In contrast to the literature, we could not confirm this N-terminal specificity of flavastacin for aspartic acid. However, for the first time, we show a unique cleavage specificity of flavastacin towards the C-terminus of N-glycosylated asparagine residues. Implemented in an N-glycoproteomic workflow, flavastacin can thus not only make data analysis much easier but also significantly increase the confidence of MS-based N-glycoproteomic analyses. We demonstrate this newly discovered specificity of flavastacin by in-depth LC-MS(/MS) analysis of peptides and N-glycopeptides generated by trypsin and flavastacin digestion of complex-type glycosylated human lactotransferrin and of bovine serum albumin. Following this work, further elucidation of the efficiency, specificity and mode of action of flavastacin is needed, but we believe that our discovery has great potential to facilitate and improve the characterization of N-glycoproteomes.

(LC-MS). In addition, a glycopeptide enrichment can be performed prior to analysis, e.g. by hydrophilic interaction chromatography 6. With regard to MS, different fragmentation strategies tackle various challenges in the field of proteomics and glycoproteomics. Lower-energy collision-induced dissociation (CID) is able to generate B- and Y-ion series of the glycan moiety of glycopeptides 7. It allows the annotation of the glycan composition and the calculation of the peptide mass, but often lacks the b- and y-ion series of the peptide moiety 8 that are needed for sequence verification. Higher-energy collisional dissociation (HCD) allows the adjustment of the normalized collision energy and generates peptide-specific b- and y-ions. However, due to the neutral loss of the glycan moiety, B- and Y-ion series are underrepresented 9,10.
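To make the b- and y-ion nomenclature used above concrete, the following minimal Python sketch computes singly charged b- and y-ion m/z values of an unmodified peptide from standard monoisotopic residue masses. The function name `by_ions` and the example peptide are illustrative assumptions and are not taken from this study; modifications and glycan-bearing fragments are deliberately not modelled.

```python
from itertools import accumulate

PROTON = 1.007276   # mass of a proton in Da
WATER = 18.010565   # monoisotopic mass of H2O in Da

# Standard monoisotopic residue masses (Da) of the 20 unmodified amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def by_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i] is b(i+1) and y_ions[i] is y(i+1); the full-length
    fragments are omitted.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    prefix_sums = list(accumulate(masses))[:-1]            # N-terminal fragments
    suffix_sums = list(accumulate(reversed(masses)))[:-1]  # C-terminal fragments
    b_ions = [m + PROTON for m in prefix_sums]
    y_ions = [m + WATER + PROTON for m in suffix_sums]
    return b_ions, y_ions

if __name__ == "__main__":
    b, y = by_ions("GAK")  # hypothetical tripeptide
    print([round(x, 4) for x in b], [round(x, 4) for x in y])
```

For a glycopeptide, fragments that contain the glycosylated asparagine would additionally carry (or lose) the glycan mass, which this sketch ignores.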
In addition to tryptic digestion of glycoproteins, further sequential treatment with other proteases is used to overcome low charge density and sequence constraints of glycopeptides with a large peptide moiety. The endoproteinase AspN is a zinc metalloendoproteinase produced in Pseudomonas fragi (Boehringer Ingelheim, Uniprot: Q9R4J4), which selectively cleaves peptide bonds N-terminal to aspartic acid 11. AspN is also known for N-terminal cleavage at cysteine and glutamic acid 12,13. Its primary use in glycoproteomic experiments so far has involved the cleavage of deamidated asparagine after N-glycan release by peptide N-glycosidase F (PNGaseF) to assess N-glycan presence and location 14. Flavastacin (New England Biolabs, Uniprot: Q47899), which is produced in Flavobacterium meningosepticum, has been described to behave similarly to the AspN from Pseudomonas fragi 15,16. Therefore, it is also called AspN, despite its quite different amino acid sequence and thus protein identity. However, both proteins belong to the family of metalloendoproteases. A BLAST search comparing these two proteins shows no overlapping sequences and only two short segments with quite low similarity (see Fig. 1). Thus, the two sequences cannot be aligned, which renders common protein functions unlikely.
This work describes a new and unique protease specificity of flavastacin for the C-terminus of N-glycosylated asparagine. In contrast to literature 15,16 , no specificity for the N-terminus of aspartic acid was observed. Subsequent analysis of this unexpected phenomenon via de-novo sequencing of the resulting N-glycopeptide fragment ion spectra led to the discovery of this previously unknown cleavage specificity of flavastacin. Our findings provide investigators with a new tool for targeted N-glycoprotein digestion to overcome common problems in N-glycoproteomics, like large N-glycopeptides with too many amino acids for proper LC-MS measurements, or multiple N-glycosylation sites within one N-glycopeptide.

Materials and Methods
Chemicals. The proteins bovine serum albumin (BSA; A3912-100G) and lactotransferrin from human milk (hLTF; L4894-5MG) were purchased from Sigma-Aldrich. Enzymes used for digestion were trypsin (Trypsin Sequencing Grade Modified; V5111) from Promega and endoproteinase AspN (AspN; P8104S) from New England Biolabs (Flavastacin, purified host cell protein, Uniprot Q47899; see the corresponding SDS-gel for purity of AspN (Flavastacin) in Supplementary Figure 1). All solvents for LC were MS grade. All buffers and solutions were prepared with deionized and purified water (dH2O) using a Milli-Q water purification system (18.2 MΩ·cm at 25 °C, total organic carbon of 3 ppb) from Merck Millipore. For LC-MS solvents, water was further purified using the LC-Pak Polisher from Merck Millipore.
of each protein were applied to a filter unit (Nanosep® Omega™ with polyethersulfone membrane, molecular weight cut-off 10 kDa; PALL Life Sciences). Samples were treated with urea buffer (Tris-HCl) (8 M urea in 0.1 M Tris-HCl (aq) pH 8.5; AppliChem), followed by reduction with DL-dithiothreitol (40 mM DTT, Sigma-Aldrich) and alkylation with iodoacetamide (55 mM IAA, Sigma-Aldrich), each dissolved in 50 mM ammonium bicarbonate (aq) (ABC buffer (aq), Sigma-Aldrich). Each filter unit was washed three times with urea buffer (Tris-HCl) and three times with ABC buffer (aq). Proteins were digested proteolytically with trypsin using an enzyme/protein ratio of 1:30 (w/w). Samples were incubated overnight at 37 °C and 350 rpm using a temperature-controlled incubator (Titramax 1000 + Inkubator 1000, Heidolph). Digests were collected by centrifugation. Filter units were washed twice, first using 50 µL ABC buffer (aq) with 5% (v/v) acetonitrile (ACN), then using 50 µL dH2O; in between, samples were centrifuged. The flow-through was kept along with the digest and dried by vacuum centrifugation.
After tryptic digestion, approximately 20 µg peptides were reconstituted in 20 µL 1x AspN reaction buffer (New England Biolabs; 50 mM Tris-HCl, 2.5 mM zinc sulfate, pH 8.0). Afterwards, AspN (Flavastacin) was added to the peptide solution (enzyme/protein ratio 1:20), and the mixture was incubated overnight at 37 °C as recommended by the supplier. The enzyme reaction was stopped via centrifugation through a filter unit (same as described above). The flow-through (tryptic digests of BSA and hLTF, as well as the sequential digests of BSA and hLTF with trypsin and AspN (Flavastacin)) was dried by vacuum centrifugation and reconstituted in 0.1% (v/v) trifluoroacetic acid (aq) (TFA; Thermo Fisher Scientific) prior to LC-MS(/MS) measurements.
The eluting peptides were measured on an LTQ Orbitrap Elite mass spectrometer from Thermo Fisher Scientific using a Nanospray Flex™ source in positive ionization mode with a capillary voltage of −2.7 kV. Peptides and glycopeptides were fragmented using HCD with a normalized collision energy of 35 and an activation time of 0.1 ms. The five most intense precursor ions with a charge state >1 were chosen for fragmentation. The recorded mass range was 350-2000 m/z for MS and 150-2000 m/z for MS/MS.
Data Analysis. The complete MS(/MS) data for hLTF (Trypsin + AspN (Flavastacin)) were analyzed manually using Xcalibur (Version 2.2, Qual Browser, Thermo Fisher Scientific). The first step was the recognition of glycopeptide-related fragment ion spectra by means of specific B-ions (oxonium ions). Afterwards, the peptide mass was presumed by a specific fragmentation pattern: [peptide -NH 3  furthermore in-silico fragmented using MS-Product (free online tool: prospector.ucsf.edu/prospector/cgi-bin/msform.cgi?form=msproduct). The b- and y-ions were compared with the corresponding ions in the MS/MS fragment ion spectrum with a maximum mass tolerance of 0.02 Da to validate the presumed peptide sequence. The mass difference between the peptide mass and the precursor mass was used to predict the N-glycan composition using ExPASy GlycoMod (free online tool: web.expasy.org/glycomod). The MS(/MS) data from BSA and hLTF were imported into Proteome Discoverer (Version 1.4, Thermo Fisher Scientific) and searched against the UniProtKB/Swiss-Prot database (542258 sequences; downloaded January 2014) using MASCOT (Version 2.5, Matrix Science). The MS(/MS) data were screened against an unspecific in-silico digestion of the mammalian taxonomy database with carbamidomethylation of cysteine as fixed modification, and deamidation of asparagine and oxidation of methionine as variable modifications. The precursor ion mass tolerance was set to 5 ppm and the fragment ion mass tolerance to 0.02 Da.
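The manual steps described above (screening for oxonium ions, deriving the glycan mass from the precursor and peptide masses, and checking mass tolerances) can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure implemented in Xcalibur, GlycoMod or MASCOT: the oxonium ion m/z values are commonly used theoretical values that are not given in this text, and all function names are hypothetical.

```python
PROTON = 1.007276  # mass of a proton in Da

# Commonly used theoretical m/z values of singly charged oxonium ions
# (assumed here; they are not listed in the source text).
OXONIUM_IONS = {
    "Hex": 163.0601,
    "HexNAc": 204.0867,
    "NeuAc-H2O": 274.0921,
    "NeuAc": 292.1027,
    "HexHexNAc": 366.1395,
}

def find_oxonium_ions(peak_mzs, tol_da=0.02):
    """Return the names of oxonium ions matched by any peak within tol_da."""
    return sorted(
        name for name, mz in OXONIUM_IONS.items()
        if any(abs(peak - mz) <= tol_da for peak in peak_mzs)
    )

def neutral_mass(mz, charge):
    """Convert the m/z of an [M + zH]z+ ion to the neutral monoisotopic mass."""
    return (mz - PROTON) * charge

def ppm_error(observed, theoretical):
    """Relative mass deviation in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def glycan_mass(precursor_mz, charge, peptide_mass):
    """Mass difference between the glycopeptide precursor and the bare peptide."""
    return neutral_mass(precursor_mz, charge) - peptide_mass

if __name__ == "__main__":
    peaks = [204.0870, 366.1400, 500.0000]  # hypothetical MS/MS peak list
    print(find_oxonium_ions(peaks))
```

The default tolerance of 0.02 Da mirrors the fragment ion tolerance stated above; the 5 ppm precursor tolerance would be applied by comparing `abs(ppm_error(...))` with 5.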
The protein relevance threshold was set to 20 and the peptide cut-off score to 10. The target false discovery rate for peptide hits was set to 0.01 (strict setting) and to 0.05 (relaxed setting).
Data availability statement. The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Results and Discussion
Every N-glycoproteomic analysis workflow consists of numerous parameters to be optimally adjusted. In particular, the design of proteolytic digestion using sequential digestion steps with a selection of specific enzymes is an important step to overcome common problems such as too large N-glycopeptides (with low charge density and/or sequence constraints) and N-glycopeptides with multiple glycosylation sites. Here, we present a new approach for the proteolytic digest of glycoproteins by using flavastacin, a protease that we found to cleave specifically at the C-terminus of N-glycosylated asparagine. The glycoprotein hLTF and the non-glycosylated protein BSA were used as model proteins to demonstrate this newly identified cleavage specificity. According to the manufacturer's recommendation, flavastacin works only on peptides smaller than 50 amino acids. Therefore, hLTF and BSA were first treated with trypsin before flavastacin was added (see Materials and Methods).
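The effect of such a sequential digest on peptide length can be illustrated with a small in-silico sketch. It rests on simplifying assumptions that are not claims of this study: trypsin is modelled as cleaving C-terminal to K/R except before P, every asparagine in an N-X-S/T sequon (X not P) is treated as glycosylated, and flavastacin is modelled only by its cleavage C-terminal to those asparagines (the unspecific N-terminal cleavages are ignored). The sequence and all function names are hypothetical.

```python
import re

def tryptic_sites(seq):
    """Positions after which trypsin cleaves: C-terminal to K/R, not before P."""
    return [i + 1 for i, aa in enumerate(seq[:-1])
            if aa in "KR" and seq[i + 1] != "P"]

def sequon_asn_sites(seq):
    """Positions directly after Asn residues in N-X-S/T sequons (X != P)."""
    # A lookahead is used so that overlapping sequons are all reported.
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", seq)]

def digest(seq, sites):
    """Split seq at the given cleavage positions and return the peptides."""
    bounds = [0] + sorted(set(sites)) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]

if __name__ == "__main__":
    seq = "MKAANGTPEKRPLNVSAK"  # hypothetical sequence with two sequons
    print(digest(seq, tryptic_sites(seq)))
    print(digest(seq, tryptic_sites(seq) + sequon_asn_sites(seq)))
```

In this toy example the peptides that end in asparagine after the combined digest correspond to the shortened N-glycopeptides with the glycosylation site at the C-terminus.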
The hLTF is a well-characterized glycoprotein present in human milk, saliva, tears, nasal secretions and other body fluids 18 that contains three potential N-glycosylation sites (N156, N497, and N642). The sites N156 and  Table 1 are marked using asterisks. N-glycoproteomic analysis of flavastacin-generated hLTF N-glycopeptides revealed that all detected N-glycopeptides feature the N-glycosylated asparagine at the C-terminus, while the N-terminus was either a tryptic or an unspecific cleavage site. In addition, the generated N-glycopeptides were shorter compared to a solely tryptic digest. The identified N-glycopeptides for the N-glycosylation sites N156 and N497 are 4-16 and 4-13 amino acids long, respectively (see Table 1). In agreement with literature, no N-glycopeptide was detected for the N-glycosylation site N642 19. We did not perform any further enrichment of glycopeptides prior to MS analysis since the digestion strategy together with a long separation gradient and a high-resolution mass spectrometry measurement resulted in a comprehensive coverage of glycopeptides as shown in Fig. 2 (for solely tryptic digest see Supplementary Figure 2 Fig. 3A). The example for the fragment ion spectrum of site N497 shows a neutral loss of 105 Da, related to a carbamidomethyl-methionine residue (see Fig. 3A). This neutral loss from a carbamidomethylated methionine has been described only rarely in literature 21. Without the awareness of a carbamidomethylation of methionine, only a neutral loss of 48 Da would be recognized, which can mistakenly also be interpreted as a side chain loss of methionine sulfoxide. However, due to the specific digestion strategy involving both trypsin and flavastacin, in combination with high-resolution LC-MS, the identification of such unlikely modifications is also possible.  (Fig. 3). MS/MS spectra of additional N-terminal unspecifically cleaved N-glycopeptides are shown in Supplementary Figures 3 and 4).
Analysis of flavastacin-generated BSA peptides revealed primarily tryptic cleavage at the C-terminus (34 of 37 peptides), and tryptic and unspecific cleavage at the N-terminus (13 tryptic, 24 unspecific cleavages). Twelve peptides had tryptic cleavages at both termini (see Table 2). For the non-glycosylated BSA, no flavastacin-generated peptides with an asparagine at the C-terminus were detected, which is consistent with the observations we made for non-glycosylated asparagines of the N-glycosylated hLTF (see Supplementary Table 1). To check for possible unspecific cleavages in the tryptic digest and their influence on the flavastacin digest, identified peptides of solely tryptically digested BSA (see Supplementary Table 2) and hLTF (see Supplementary Table 3) were examined. Here, almost exclusively specific cleavages were identified in the tryptic digests of BSA and hLTF. This strongly suggests that the observed N-glyco-specific cleavage of hLTF, as well as the unspecific cleavage of BSA and hLTF in the combined digest (trypsin and flavastacin), can only be linked to the activity of flavastacin.
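The kind of bookkeeping behind such a tally of tryptic and unspecific termini can be sketched as follows. This is a minimal illustration, assuming the same simplified trypsin rule as is commonly used (cleavage C-terminal to K/R, not before P) and assuming that each peptide occurs only once in the protein; the function name and the example sequence are hypothetical and do not reproduce the data in Table 2.

```python
from collections import Counter

def classify_termini(protein, peptide):
    """Label the N- and C-terminal cleavage of a peptide as tryptic or unspecific."""
    start = protein.find(peptide)  # first occurrence only (assumption)
    if start < 0:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide)
    # N-terminus: protein start, or preceded by K/R and not starting with P.
    n_tryptic = start == 0 or (protein[start - 1] in "KR" and peptide[0] != "P")
    # C-terminus: protein end, or ending in K/R and not followed by P.
    c_tryptic = end == len(protein) or (peptide[-1] in "KR" and protein[end] != "P")
    return ("tryptic" if n_tryptic else "unspecific",
            "tryptic" if c_tryptic else "unspecific")

if __name__ == "__main__":
    protein = "MKAANGTPEKRPLNVSAK"            # hypothetical sequence
    peptides = ["AANGTPEK", "NGTPEK", "AAN"]  # hypothetical identifications
    tally = Counter(classify_termini(protein, p) for p in peptides)
    for (n_term, c_term), count in tally.items():
        print(n_term, c_term, count)
```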
Whilst it has been described that flavastacin has specificities towards the N-terminus of aspartic acid, glutamic acid and cysteine, we found a unique cleavage specificity of flavastacin for the C-terminus of N-glycosylated asparagine, which was not explored up to now. All manually annotated hLTF N-glycopeptide related peptide sequences are listed in Table 1 (other identified non-glycosylated peptides are listed in Supplementary Table 1). Every single hLTF N-glycopeptide sequence has been cleaved at the N-glycosylated asparagine at the C-terminus, independent of the N-glycoform attached to the respective N-glycosylation site. In addition, we observed that the N-terminus is a tryptic or an unspecific cleavage site. Based on the manually annotated peptide sequences and the database-assisted MASCOT search, no strict N-terminal cleavage of aspartic acid could be observed, neither for BSA nor for hLTF (Tables 1 and 2, Supplementary Table 1).
Figure caption: Specificity of flavastacin for the C-terminus of N-glycosylated asparagine. The amino acid sequence of the tryptic N-glycopeptide of hLTF with the N-glycan Hex5HexNAc4dHex1NeuAc1 linked to N-glycosylation site N156 is shown. The scissors symbolize trypsin; the shield stands for cleavage inhibition of trypsin due to proline (P). Flavastacin is symbolized by a pick, and its specific C-terminal cleavage of N-glycosylated asparagine, as well as its unspecific N-terminal cleavages, are shown. The N-glycan structure is illustrated according to CFG nomenclature 22.
Scientific REPORTs | 7: 11419 | DOI:10.1038/s41598-017-11668-1
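As a worked example of how a glycan composition translates into the mass added to a peptide, the following sketch computes the monoisotopic mass of the composition Hex5HexNAc4dHex1NeuAc1 mentioned above. The residue masses are standard monoisotopic values for glycosidically linked monosaccharides and are supplied here as an assumption, since they are not given in the text; the parser and its name are hypothetical.

```python
import re

# Standard monoisotopic masses (Da) of monosaccharide residues,
# i.e. the free sugar minus one water (assumed values, not from the source).
MONOSACCHARIDE_RESIDUE_MASS = {
    "Hex": 162.05282,
    "HexNAc": 203.07937,
    "dHex": 146.05791,
    "NeuAc": 291.09542,
}

# "HexNAc" must be tried before "Hex" so the longer name wins.
_COMPOSITION_PATTERN = re.compile(r"(HexNAc|Hex|dHex|NeuAc)(\d+)")

def parse_composition(text):
    """Parse a string such as 'Hex5HexNAc4dHex1NeuAc1' into a dict of counts."""
    compact = text.replace(" ", "")
    return {name: int(count) for name, count in _COMPOSITION_PATTERN.findall(compact)}

def glycan_residue_mass(composition):
    """Sum of residue masses, i.e. the mass the glycan adds to the peptide."""
    return sum(MONOSACCHARIDE_RESIDUE_MASS[name] * count
               for name, count in composition.items())

if __name__ == "__main__":
    composition = parse_composition("Hex5HexNAc4dHex1NeuAc1")
    print(composition, round(glycan_residue_mass(composition), 4))
```

Comparing this value with the difference between the precursor mass and the presumed peptide mass is, in essence, the check that a composition tool performs when it proposes an N-glycan composition.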

Conclusion and Outlook
Flavastacin shows a clear specificity for the C-terminus of N-glycosylated asparagine N156 and N497 in hLTF (illustrated in Fig. 4). Due to the presence of multiple N-glycosylation sites and the well-described complex-type N-glycan structures, hLTF is a very suitable glycoprotein to demonstrate this newly found specificity of flavastacin.
In contrast to previous work 15,16, we could not verify the claimed specificity of flavastacin for the N-terminus of aspartic acid, neither for hLTF nor for BSA. However, we could demonstrate that the sequential combination of trypsin and flavastacin for protein digestion successfully cleaves the N-glycoprotein hLTF into readily annotatable N-glycopeptide sequences. Interestingly, in contrast to unspecific digestion strategies using proteinase K or pronase, flavastacin works as an "N-glyco-specific" proteolytic enzyme (specific for N-glycosylated asparagine at the C-terminus). This property improves data quality as well as data analysis and therefore facilitates N-glycoproteomics significantly. However, the unspecific cleavage by flavastacin at the N-terminus distributes the signal of each N-glycopeptide over redundant species with peptide moieties of different length.
Overall, this finding improves the glycoproteomic toolbox and helps to overcome common problems in N-glycoproteomics, i.e. the presence of too large N-glycopeptides with too many amino acids and/or too many N-glycosylation sites for proper LC-MS analysis. Despite the fact that this specificity of flavastacin and its cleaving mechanism need to be examined also for other glycoproteins (as well as for more complex (glyco-) protein mixtures), and in particular for other types of glycosylation (like high-mannose-type, hybrid-type and O-glycosylation), the use of flavastacin will already be beneficial for glycoscience now, as it allows researchers to dig faster and deeper into N-glycoproteomes.