Cross Reactive Material 197 glycoconjugate vaccines contain privileged conjugation sites

Production of glycoconjugate vaccines involves the chemical conjugation of glycans to an immunogenic carrier protein such as Cross-Reactive-Material-197 (CRM197). Instead of using glycans from natural sources, recent vaccine development has focused on the use of synthetically defined minimal epitopes. While the glycan is structurally defined, the attachment sites on the protein are not. Fully characterised conjugates and batch-to-batch comparisons are key to eventually creating completely defined conjugates. A variety of glycoconjugates consisting of CRM197 and synthetic oligosaccharide epitopes was characterised using mass spectrometry techniques. The primary structure was assessed by combining intact protein MALDI-TOF-MS, LC-MALDI-TOF-MS middle-down and LC-ESI-MS bottom-up approaches. The middle-down approach on CNBr cleaved glycopeptides provided almost complete sequence coverage, facilitating rapid batch-to-batch comparisons, resolution of glycan loading and identification of side products. Regions close to the N- and C-termini were most efficiently conjugated.

Scientific Reports | 6:20488 | DOI: 10.1038/srep20488
immunogenic glyco-epitopes are structurally well-defined and homogenous 14 . The synthetic approach enables the placement of a spacer carrying a unique functional group such as an amine to facilitate protein attachment at a single site without further activation.
Typically, glycans are coupled to the primary amine side chains of lysine residues and to the protein N-terminus via a linker molecule. Alternatively, sulfhydryl groups in cysteine or carboxyl side chains can serve as sites for attachment. Prior to coupling, non-synthetic glycans need to be activated 15 either arbitrarily on hydroxyl or carboxyl groups 16 , leading to attachment at various points, or via a single amine group introduced by reductive amination at the reducing end 17 . The choice of the linker is crucial since it influences vaccine properties 18 and may be immunogenic itself 19 . In addition, coupling efficiencies have to be considered. Conjugation of a synthetic glyco-epitope to a carrier protein such as CRM 197 that contains 40 primary amine side chains will result in heterogeneous glycoconjugates since some amino acid side chains more readily engage in conjugation reactions than others 20 . Since variations in epitope loading influence the effectiveness of vaccines 21,22 , understanding which side chains preferentially react with glycans represents an important step towards producing better defined, more effective and safer conjugate vaccines. Quality control and batch-to-batch comparison rely on a combination of physicochemical, spectroscopic and spectrometric methods such as nuclear magnetic resonance (NMR) or mass spectrometry (MS) 23-25 . Glycan attachment to the carrier protein is typically monitored by SDS-PAGE and MALDI-TOF-MS 26 . These methods provide information about average loading but cannot detect any side reactions or alterations occurring on the glycan epitopes. New methods are necessary for quality assurance of conjugate vaccines and to define conjugation sites on the protein.
We established a mass spectrometry-based assay for the in-depth characterisation of glycoconjugates obtained by coupling synthetic glycans to CRM 197 (Fig. 1). An integrated new middle-down LC-MALDI-TOF-MS approach simplified batch-to-batch comparison and enabled relative quantification of regional conjugation efficacy. In this study we evaluated the influence of glycan size and amino acid microenvironment on conjugation efficiencies, assessed glycan integrity and determined which amino acid sites were preferably conjugated. Two regions close to the N- and C-termini of the protein engaged primarily in the chemical conjugation process and exhibited the highest loading ratios. In addition to accessibility, our data clearly illustrate that the local amino acid environment significantly influences which lysine residues will be modified.

Results and Discussion
Protease based bottom up strategies for the evaluation of CRM 197 glycoconjugate vaccine candidates. Assuring glycan integrity of the glycoconjugate is important after the chemical conjugation step and to evaluate the storage stability. We studied this property by a classical (glyco)proteomics bottom up approach, which provided qualitative data on individual, repeatedly conjugated sites in CRM 197 . For unmodified CRM 197 the use of trypsin resulted in an average sequence coverage of 50-67%. Using Glu-C usually 50% of the entire sequence could be identified (see Supplementary Fig. S2). The highest coverage (≈ 70-85%) was obtained using a sequential digestion employing Glu-C followed by trypsin or Asp-N followed by Glu-C. Nevertheless, the results varied considerably from digest to digest indicating the limitation of such a bottom-up approach for routine analyses of chemically glycosylated neoglycoproteins.
[Figure caption fragment, conjugation scheme: Step 1: the glycan part is synthesised with a spacer carrying an amine group. Step 2: addition of the linker molecule with two leaving groups (LG). Step 3: addition of the glycan-linker construct to the carrier protein and subsequent conjugation to a primary amine (e.g. a lysine residue).]
Trypsin did not cleave at modified lysine residues, resulting in conjugated peptides carrying at least one missed cleavage site 27 . As a result, the C-terminal lysine in any tryptic peptide had to be unmodified in order to be cleaved. This facilitated the assignment of the actual conjugation site, in particular as for most glycopeptides only one lysine residue remained as a possible site of conjugation. On the other hand, the modification-induced missed cleavage sites led to an increased heterogeneity of peptide fragments and consequently reduced signal intensities.
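The effect of a blocked lysine on the tryptic peptide pattern can be illustrated with a minimal in silico digest. The sketch below is not the procedure used in this study; it assumes simple cleavage C-terminal to K and R, ignores the proline rule, and uses the short sequence 242 KAKQYLEE 249 quoted later in the text purely as an example. The function name and parameters are hypothetical.

```python
def tryptic_peptides(sequence, modified_lysines=frozenset(), start=1):
    """Cleave C-terminal to K/R, skipping lysines listed as modified.

    Positions are 1-based and offset by `start`.
    Simplification: the proline rule is ignored.
    """
    peptides = []
    begin = 0
    for i, residue in enumerate(sequence):
        position = start + i
        cleavable = residue == "R" or (
            residue == "K" and position not in modified_lysines
        )
        if cleavable:
            peptides.append((start + begin, position, sequence[begin:i + 1]))
            begin = i + 1
    # Keep the C-terminal remainder if the sequence does not end on a cleavage site
    if begin < len(sequence):
        peptides.append((start + begin, start + len(sequence) - 1, sequence[begin:]))
    return peptides


# Unmodified: both lysines are cleaved
print(tryptic_peptides("KAKQYLEE", start=242))
# K242 conjugated: the peptide carries one missed cleavage and ends on unmodified K244
print(tryptic_peptides("KAKQYLEE", modified_lysines={242}, start=242))
```

In the second call the only lysine that can carry the modification within the peptide KAK is K242, since the C-terminal K244 must be unmodified to be cleaved. This mirrors the site-assignment argument made above.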
The problem could be avoided by the use of alternative proteases not cleaving at lysine residues (Glu-C, Asp-N). However, these proteases produced peptides with multiple lysine residues. In order to elucidate the conjugation sites it was crucial to acquire sufficient sequence information from the MS/MS experiment. Besides the qualitative data an in-depth conjugation site analysis also required quantitative information on site occupancy. The protease assisted bottom-up approach, however, primarily provided qualitative data since a direct mass spectrometric quantitative comparison between different peptide backbones is impossible. Despite the fact that in some cases the conjugation reactivity of individual lysine residues located on the same peptide could be determined (Fig. 2D), a protease based approach appeared to be unsuitable for any broad quantitative comparison of region/site specific conjugation efficiencies between different conjugate vaccines.
CRM 197 neoglycopeptides exhibit unusual CID and ETD fragmentation patterns. Naturally occurring glycopeptides usually exhibit strong and specific oxonium ions when subjected to collision induced dissociation (CID) 28,29 . This feature is frequently applied to quickly filter glycopeptide spectra from the bulk of MS/MS data. However, CRM 197 neoglycopeptides carrying the glyco-epitope constructs used in this study (Table 1) produced a different fragmentation pattern which exhibited an extremely prominent Y-ion series when subjected to CID while showing almost no or very low intensity oxonium ions (see Fig. 2B and Supplementary Fig. S3). Assuming that the majority of protons are associated with the peptide rather than with the glycan moiety, it appears that the linker construct prevents the protons from effectively migrating to the glycan fragments, resulting mostly in the detection of neutral loss fragments. This observed phenomenon appeared to be independent of the length of the carbon spacer. Despite the lack of oxonium ions the observed Y-ions showed a characteristic serial neutral loss of each monosaccharide in the glycan chain. This feature was used to distinguish MS/MS spectra of conjugated peptides from unconjugated ones. CID analyses also enabled confirmation of the glyco-epitope integrity after conjugation to the protein.
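The serial neutral loss described above can be sketched as a simple ladder search through a peak list. This is an illustration only, not the data processing used in this study: the residue masses are standard monoisotopic values, while the peak list, the starting m/z, the tolerance and the function name are hypothetical.

```python
# Monoisotopic residue masses (Da) of common monosaccharides
RESIDUE_MASSES = {
    "Hex": 162.0528,     # e.g. glucose
    "HexA": 176.0321,    # e.g. glucuronic acid
    "dHex": 146.0579,
    "HexNAc": 203.0794,
}


def neutral_loss_ladder(peaks, start_mz, charge=1, tol=0.02):
    """Follow successive monosaccharide losses downward from start_mz.

    Returns a list of (residue name, matched m/z) steps. `tol` is in m/z
    units and would need to be widened for low-resolution instruments.
    """
    ladder = []
    current = start_mz
    while True:
        step = None
        for name, mass in RESIDUE_MASSES.items():
            target = current - mass / charge
            match = next((p for p in peaks if abs(p - target) <= tol), None)
            if match is not None:
                step = (name, match)
                break
        if step is None:
            return ladder
        ladder.append(step)
        current = match


# Hypothetical singly charged Y-ion series of a peptide carrying an
# alternating HexA-Hex tetrasaccharide
peaks = [2000.0, 1823.9679, 1661.9151, 1485.8830, 1323.8302]
print([name for name, _ in neutral_loss_ladder(peaks, start_mz=2000.0)])
```

A spectrum that yields a ladder of several consecutive steps would be flagged as a candidate conjugated peptide, whereas spectra of unconjugated peptides would be expected to return an empty list.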
Electron transfer dissociation (ETD) of the peptide backbone 30 enabled the acquisition of sufficient data to confirm peptide identities and map the site(s) of conjugation on the majority of the detected glycopeptides (Fig. 2C). Nevertheless, the ETD MS/MS spectra derived from the CRM 197 neoglycopeptides showed unusual fragmentation patterns. Z-ions were highly underrepresented, especially when obtained from tryptic glycopeptides. The c-ion series continued at most until the amino acid neighbouring the conjugated lysine, and z-ions containing this residue could generally not be detected.

Glyco-Epitope screening by nanoLC-ESI-MS/MS. Liquid chromatography enabled separation of isobaric modified peptides carrying the conjugate at different lysine residues. In addition to the glycoconjugates we detected conjugates of adipic acid formed by residual linker reacting with the lysine residues of CRM 197 . This is a common problem since the di-succinimido-adipate (DSA) linker hydrolyses very easily, hampering an extensive purification. These adipic acid conjugation artefacts are referred to as "free linker" in this study. Addition of the glycan construct as well as of the free linker resulted in an increased retention time with the glycan-linker construct showing the biggest impact (Fig. 2D).
Minor unintended structural features or contaminations might not always be picked up prior to conjugation. The multiple dimensions of nano LC-ESI-MS/MS analysis detected low abundant features such as a peptide carrying a minor side product of the chemical synthesis of ST3. CID fragmentation revealed that one of the two glucuronic acids of the tetrasaccharide was substituted by a glucose resulting in a mass shift of − 14 Da from the expected value (Fig. 3). This technique also allows detection and monitoring of any mass modification due to degradation.
Overall, the bottom up approach provided useful data to confirm conjugate integrity, to enable conjugation site mapping and to monitor site-specific conjugation preferences. Nevertheless, issues such as incomplete sequence coverage or the uncontrollable heterogeneity of proteolytic (glyco)peptide products 31 made it less suited for ensuring batch-to-batch consistency of chemically glycosylated CRM 197 in a pharmaceutical context. Therefore, an orthogonal LC-MALDI-TOF MS based approach was developed with the intention to overcome these shortcomings and enable fast batch-to-batch comparisons while still providing more detailed data on glycosylation occupancy compared to intact protein analyses.
Improved CRM 197 -glycoconjugate sequence coverage using chemical cleavage combined with LC-MALDI-TOF-MS. We developed and applied an LC-MALDI-TOF-MS middle-down proteomics approach based on CNBr mediated cleavage to increase sequence coverage and simplify batch-to-batch comparisons while achieving a better general overview on the entire protein sequence detecting any intended and unintended modifications. CNBr mediated protein cleavage results in larger glycopeptides that should compensate or significantly reduce ionisation suppression effects. Glycosylation was shown to reduce ionisation efficiency of tryptic N-glycopeptides, however ionisation efficiency appeared to increase when the peptide to glycan mass ratio was shifted towards the peptides 32 .
CNBr cleavage occurs on the C-terminal side of methionine residues 33 . In the case of CRM 197 CNBr cleavage results in nine distinct (glyco)peptides for subsequent LC-MALDI-TOF-MS analysis (see Supplementary  Table S4). Eight out of nine peptides could be reproducibly detected, corresponding to a sequence coverage of > 99% (Fig. 4). The missing four amino acid peptide ( 179 YEYM 182 ) exhibited a mass < 600 Da, which is below the optimal operation range of MALDI-TOF-MS. This particular peptide is one of two that do not carry any potential conjugation sites such as lysine residues or the protein's N-terminus, ultimately leaving only seven peptides that needed to be considered in a simplified and comprehensive batch-to-batch comparison.
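The cleavage rule and the coverage figure quoted above can be reproduced with a short sketch. The toy sequence and both function names are hypothetical, and the protein length of 535 residues is an assumption inferred from the C-terminal peptide 460-535 mentioned in the text. The conversion of the C-terminal methionine by CNBr is not modelled.

```python
def cnbr_fragments(sequence):
    """Split a sequence after every methionine.

    Returns (start, end, peptide) tuples with 1-based positions.
    """
    fragments = []
    begin = 0
    for i, residue in enumerate(sequence):
        if residue == "M":
            fragments.append((begin + 1, i + 1, sequence[begin:i + 1]))
            begin = i + 1
    # The C-terminal fragment does not end on methionine
    if begin < len(sequence):
        fragments.append((begin + 1, len(sequence), sequence[begin:]))
    return fragments


def sequence_coverage(detected_ranges, protein_length):
    """Percentage of residues covered by the detected (start, end) ranges."""
    covered = set()
    for start, end in detected_ranges:
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / protein_length


# Toy sequence (not CRM197) with two internal methionines
print(cnbr_fragments("AKMYEYMGK"))

# Everything detected except the four-residue peptide 179-182,
# assuming a protein length of 535 residues
print(round(sequence_coverage([(1, 178), (183, 535)], 535), 2))
```

Under these assumptions the coverage evaluates to 531 of 535 residues, consistent with the reported value of > 99%.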
Due to the large size of the peptides, C8- as well as C4-reversed phase LC were tested for (glyco)peptide separation. The C4-chromatography provided better peak shapes and separation reproducibility and was therefore used for the majority of analyses. As expected, the most intense m/z signals corresponded to singly charged ions while larger peptides (m/z > 5000 Da) were detected as minor signals of doubly charged species as well (Fig. 4).
Two batches of ST3 conjugate were analysed in a first run. Depending on their size, the peptides usually eluted during approx. 1-2 minutes. ST3 conjugated glycopeptides were detected by their specific mass shift of + 847.80 Da, corresponding to the average mass of one glycoconjugate unit. A mass shift of + 128.06 Da indicated the presence of free linker.
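Since each modified form differs from the unmodified CNBr peptide by a combination of the two mass shifts given above, an observed signal can be assigned by enumerating combinations. The following is a minimal sketch under that assumption; the peptide mass, the tolerance, the upper limits and the function name are hypothetical, and only the two shift values are taken from the text.

```python
GLYCAN_UNIT = 847.80  # Da, one ST3 glycoconjugate unit (average mass, from the text)
FREE_LINKER = 128.06  # Da, one adipic acid "free linker" unit (from the text)


def assign_modifications(observed, unmodified, max_glycans=5, max_linkers=5, tol=1.0):
    """List (n_glycan, n_linker, error) combinations explaining a mass shift.

    `tol` is an absolute tolerance in Da, chosen here for linear-mode
    MALDI-TOF data; it is an illustrative value, not an instrument setting.
    """
    shift = observed - unmodified
    hits = []
    for n_glycan in range(max_glycans + 1):
        for n_linker in range(max_linkers + 1):
            expected = n_glycan * GLYCAN_UNIT + n_linker * FREE_LINKER
            if abs(shift - expected) <= tol:
                hits.append((n_glycan, n_linker, round(shift - expected, 2)))
    return hits


# Hypothetical peptide of 5000.00 Da carrying one glycan unit and two free linkers
print(assign_modifications(observed=6103.92, unmodified=5000.00))
```

With a tolerance well below the 128.06 Da linker increment the assignment is unambiguous for the loading ranges considered here. For much wider limits it would be worth checking that no two combinations fall within the tolerance of each other.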
The presence of conjugated adipic acid on a CNBr peptide resulted in a retention time shift of around + 30 seconds whereas the presence of a glycoconjugate barely had any effect. In contrast, in the bottom up approach, where (glyco)peptides were separated using C18 chromatography, free linker and especially glycoconjugates showed a much larger influence on retention time (Fig. 5). The survey view of the WARP-LC software allows for a simultaneous assay like representation of retention time, m/z and intensity. This provided a clear overview of the conjugation status of each CNBr cleaved (glyco)peptide. Increasing amounts of glycoconjugates as well as the addition of several adipic acid units could be easily depicted as characteristic peak patterns (Fig. 5).

SurveyViewer assisted comparison of batch-to-batch variances. Analyses of several batches of conjugates demonstrated that differences in loading and the extent of side reactions could be quickly detected in a protein region specific manner and evaluated using this approach. Major differences in free linker loading and consequently in glycan loading could be observed between the different conjugates (Fig. 5). Batch B showed one additional glycan conjugation in each of the peptides whereas batch A exhibited an increased free linker loading. More detailed information was obtained when relative quantification of the respective (glyco)peptides was applied (Fig. 6). The degree of conjugation with the free linker differed considerably in the two batches. Batch A generally displayed an elevated loading. This increased occupation of lysine residues with free linker molecules resulted in a reduced number of available sites for glycan conjugation, explaining the observed lower glycan loading.
To address the issue of adipic acid conjugation onto the protein and further benchmark the assay, a new set of CRM 197 glycoconjugates was prepared using the p-nitrophenol (PNP) activation method 34 with GLC as a test epitope (Table 1, Fig. 5). Compared to DSA, the PNP-linker is less prone to hydrolysis, allowing an extensive purification after activation. The analysis showed that glycan loading was generally lower; however, only conjugated glycans and no free linker molecules were detected in both batches. Our findings demonstrate the potential of the assay to reliably detect and differentiate intended and unintended modifications occurring on CRM 197 .

Evaluation of CNBr cleavage conditions on glycan conjugate integrity. CNBr cleavage requires highly acidic conditions (50% TFA). Even though the sample preparation is performed at low temperature (4 °C) the acidity could potentially affect glycan integrity. Therefore, we evaluated the stability of the glycan epitopes used in this study (Table 1). As exemplified on the ST3, PS1 and GLC-CRM 197 conjugates no degradation products could be detected (Fig. 5). Our data indicate that most glycan structures appear to tolerate the applied cleavage conditions.
Conjugation site identification on CNBr glycopeptides using ISD. To determine site-specific attachment information from the conjugated glycopeptides we tested fragmentation by in-source decay (ISD) using this LC-MALDI-TOF-MS approach. This technique has been frequently used for N- and C-terminal protein sequencing of medium size and larger proteins 35-37 , and in the case of smaller proteins even top-down de novo sequencing of the entire protein was achieved 38 .
In order to assign the conjugated glycan to a specific lysine residue, the unconjugated peptide should ideally be LC baseline separated from the different conjugated forms. As mentioned previously in the text just minor retention time differences were observed and the necessary separation required for ISD was not achieved when using the C4 column. In contrast, C8 chromatography resulted in sufficient separation in the case of two singly conjugated (glyco)peptides. ISD analysis of these compounds allowed sequence confirmation of the individual peptides and assignment of the modification to the respective amino acid (Fig. 7).
Despite the LC separation it must be assumed that the selected m/z signals still contained a mixture of conjugation site isomers. Therefore, fragment ions specific for less abundant isomers are likely suppressed by the most intense glycopeptide isomer(s), so that the spectra provide information on the site(s) most efficiently modified within a glycopeptide. In peptide 231-314, K242 was identified as the most prominent conjugation site for conjugation with the free linker as well as with the glycan. A similar observation was made for K212 on peptide 183-230 (see Supplementary Fig. S5), indicating that these lysine residues are more frequently conjugated. These results also correlated well with the data obtained for K242 within peptide 242-249 after digestion with Glu-C (Fig. 2D) and demonstrate that ISD fragmentation is, in principle, applicable to obtain in-depth attachment site information on chemically conjugated glycopeptides.

Evaluation of conjugation site occupancy.
Even though CRM 197 contains 39 lysine residues and the N-terminal amine that can serve as sites of conjugation, not all of them seem to be equally reactive. Steric accessibility, the individual pKa of the respective lysine residues as well as reaction conditions and conjugate size are likely to influence the conjugation efficacy of individual sites. To evaluate whether particular lysine residues represent favoured sites of conjugation, various CRM 197 conjugates (ST3, PS1, LPG and GLC, Table 1) were digested either with trypsin or orthogonal proteases and protease combinations. Thereby, a qualitative conjugation site map of all modified primary amines was established based on the presence of conjugation sites in the different samples (Fig. 8).
Three areas within CRM 197 seem to be preferred for the attachment of glycan epitopes: in addition to lysines K37, K103, K104 and K125, a region including residues K212, K221, K227, K236 and K242 as well as the C-terminal part (in particular K498 and K526) were repeatedly found to be conjugated (Fig. 8). Some peptides such as the peptide 519 DHTKVNSKLSLFFE 532 containing K526 could only be detected using certain protease combinations. In terms of attachment sites the results are in good agreement with those from Crotti et al. 20 , who evaluated site reactivity based on the presence of a linker molecule without the glycan. In silico mapping of the lysine residues on a CRM 197 structure model 39 showed that most conjugation sites identified in the course of this work are located on outer edges (Fig. 8). The conjugation of mono- to pentasaccharides as performed in this study as well as the conjugation of just a linker molecule as described by Crotti and colleagues did not show marked differences in conjugation site selectivity. Based on these data we conclude that smaller variations in conjugation efficiency observed within the different epitopes are mainly due to unavoidable batch-to-batch variations resulting from the chemical glycosylation protocol rather than induced by structural differences of the epitopes.
In contrast, the microenvironment of the conjugation site apparently influenced the conjugation efficiency significantly, as demonstrated in particular by glycopeptide 242 KAKQYLEE 249 obtained after Glu-C-digestion. The two lysine residues K242 and K244 are located within an alpha helical part in close proximity to each other but differed significantly in their respective conjugation reactivity. An exclusive steric effect is unlikely to account for the reactivity differences, however variations in the regional pKa of the respective lysine residues might be involved. Studies on helix stabilisation by glutamic acid and lysine residues showed that these amino acids, when spaced three to four amino acids apart, were efficiently stabilizing alpha helices by the formation of salt bridges 40 . Unlike K242, K244 is located in sufficient proximity to three glutamic acid residues (E240, E241, E248, Fig. S6) and could be involved in forming a salt bridge, stabilizing the positive charge of K244 and hence decreasing its nucleophilicity, rendering K244 less reactive.
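The spacing argument can be checked with a few lines of code. This is a sketch of the i, i±3 and i, i±4 criterion only; it uses the residue numbers given in the text (E240, E241 and the peptide 242 KAKQYLEE 249) and says nothing about actual side chain geometry. The function name is hypothetical.

```python
def salt_bridge_partners(lysines, glutamates, spacings=(3, 4)):
    """Map each lysine position to glutamates spaced 3 or 4 residues away."""
    return {
        lys: sorted(glu for glu in glutamates if abs(glu - lys) in spacings)
        for lys in lysines
    }


# E240 and E241 are named in the text; E248 and E249 follow from 242 KAKQYLEE 249
partners = salt_bridge_partners(lysines=[242, 244], glutamates=[240, 241, 248, 249])
print(partners)
```

Under this criterion K244 has three candidate partners (E240, E241 and E248) whereas K242 has none, in line with the lower reactivity observed for K244.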

Semi-quantitative regio-specific occupancy determination by LC-MALDI-TOF MS. The signal intensities of the unconjugated peptide and all of its modified forms detected in the LC-MALDI-TOF MS experiments were summed up and the relative amounts for each peptide/glycopeptide group were determined. In the case of ST3 and PS1, peptides 15-115 and 460-535 were most efficiently conjugated, followed by peptides 183-230 and 231-314 (Fig. 9A). Peptides 1-14, 116-178 and 340-459 were found to be least efficiently conjugated. A similar result was obtained for GLC-CRM 197 conjugates that were produced via the PNP method, where no free linker interfered with the conjugation. Again, peptides 15-115 and 460-535 showed the lowest amount of unconjugated peptide and the highest conjugate loading (Fig. 9B). Apart from peptide 1-14, the LC-MALDI-MS results for the least conjugated region correlate qualitatively with the LC-ESI-MS data.
[Figure caption fragment: Lysine residues are labelled blue. Lysine residues that were frequently found conjugated with a glycan in all the samples are labelled in red.]
The data acquired using various orthogonal approaches in the course of this study allowed us to semi-quantitatively evaluate conjugation efficiency on the individual lysine residues. We concluded that amino acid residues K103, K498, and K526 were among the most reactive ones in the peptides 15-115 and 460-535, since these peptides showed the highest overall conjugation. In the slightly less reactive peptides 183-230 and 231-314 amino acids K212, K221, K242 and K236 were among the frequently conjugated sites.

Conclusions
Characterisation of glycoconjugate vaccines requires sophisticated and orthogonal approaches. In the course of this study we applied well-established glycoproteomic techniques and developed a novel approach for the in-depth characterisation of the widely used immunogenic carrier protein CRM 197 that was modified with a variety of defined synthetic oligosaccharide epitopes. Chemical cleavage using CNBr followed by a "middle-down" LC-MALDI-ISD detection strategy provided several advantages over protease based assays for comprehensive and in-depth semi-quantitative evaluation of region specific conjugation efficiency for CRM 197 glycoconjugate vaccine candidates. This technique largely facilitated conjugation product screening by providing virtually complete sequence coverage and a good overview of intended as well as unintended modifications, and offered a reproducible, protease independent route to batch-to-batch evaluations. Of the 40 possible primary amines (39 lysines + N-terminus) present in CRM 197 , just a few participate in quantitatively relevant conjugation reactions. Steric accessibility, the local amino acid environment and protein secondary structure are likely the most relevant parameters influencing the conjugation reaction under non-denaturing conditions. Our findings can help ensure quality control of effective and safe vaccines as well as improve rational vaccine design in the future. It remains to be seen how conjugation variations on these specific sites possibly influence the effectiveness of CRM 197 conjugate vaccines; nevertheless, the presented assay provides an important step towards functional studies applying CRM 197 glycoconjugate vaccine candidates carrying defined synthetic glyco-epitopes.

Material & Methods
Reagents. Sequencing grade trypsin was purchased from Roche (Mannheim, Germany), Glu-C and Asp-N from Protea Bioscience (Morgantown, WV). Acetonitrile (ACN), formic acid (FA), trifluoroacetic acid (TFA), cyanogen bromide (CNBr), iodoacetamide (IAA), dithiothreitol (DTT), ammonium bicarbonate and triethylamine were purchased from Sigma Aldrich (Munich, Germany). Recombinant CRM 197 was obtained from Pfenex (San Diego, CA). The BCA assay was purchased from Pierce (Rockford, IL), C 18 ZipTips ® and Centricon ® diafilters were purchased from Millipore (Tullagreen, Ireland) and hydrophilic lipophilic balanced solid phase extraction cartridges 30 MG (HLB SPE) were obtained from Supelco/Sigma Aldrich (Bellefonte, PA). Generally, the MIAPE and MIRAGE reporting guidelines are followed throughout this work 41,42 .
Glycan epitope synthesis. The glycan epitopes ST3, PS1 and LPG (Table 1) were synthesised from monosaccharide building blocks as described previously 43-45 .
Tris-HCl, 40% glycerine, 4% SDS, 0.015% Bromophenol blue containing 50 mM DTT) and denatured at 96 °C for 5 minutes before 0.5 μL of 500 mM IAA were added to achieve a final concentration of 50 mM IAA. The solution was incubated for 30 min in the dark before loading the protein onto a 10% SDS-PAGE gel for electrophoretic separation. Proteolytic digestion and peptide extraction were performed as described earlier 28 . Briefly, excised bands were cut into pieces, destained in 50% ACN and dried. Proteases (Trypsin, Glu-C, Asp-N) were used in a protease to protein ratio of 1:50 respectively in the single digests or successively in double digests (Tryp/Glu-C; Glu-C/Asp-N). After peptide extraction with 2x 50% ACN and 2x 5% FA the samples were dried in a Speed Vac concentrator and reconstituted in 0.1% FA for further MS analysis.
10 μg (30 μg for larger amounts) of protein were dissolved in 190 μL 50% TFA. Subsequently, 10 μL of the CNBr stock solution were added. The solution was topped with N2, sealed with Parafilm and incubated for 48 hours at 4 °C in the dark. The reaction was quenched by the addition of 1 mL of water with subsequent removal of the liquid by Speed Vac drying. The sample was reconstituted in 10 μL 0.1% FA and purified by C 18 ZipTips eluting in 10 μL 80% FA. Larger amounts of initial protein were reconstituted in 50 μL and purified by HLB SPE. The sample was washed three times with 400 μL of 0.1% FA and peptides were eluted using three times 200 μL 80% ACN containing 0.1% FA. The sample was dried and reconstituted in 20 μL of 0.1% TFA for LC-MALDI-TOF-MS analysis.

LC-ESI-MS/MS.
NanoLC-ESI-MS analysis was carried out on an Ultimate 3000 RSLC-nano system (Dionex/Thermo Scientific, Sunnyvale, CA) coupled to an amaZon speed ETD ion trap (Bruker, Bremen, Germany). In each run peptides corresponding to 0.15 μg of a CRM 197 digest were injected. The peptides were concentrated on a C18 precolumn (Acclaim PepMap100 ™ , Thermo, 100 μm × 20 mm, 5 μm particle size) and separated by reversed phase chromatography on a C18 analytical column (Acclaim PepMap ™ , Thermo, 75 μm × 15 cm, particle size 3 μm). The samples were loaded in 99% buffer A (0.1% FA) for 6 min on the precolumn at a flow rate of 6 μL/min before the captured peptides were subjected to nanoLC at a flow rate of 350 nL/min using a gradient of increasing ACN as follows: buffer B (ACN containing 0.1% FA) from 1% to 20% (7-57 min) followed by a further increase to 50% (57-75 min) and a steep increase to 90% (75-86 min) before returning to the starting conditions. The mass spectrometer was set up to perform CID as well as ETD fragmentation on the three most intense signals in every MS scan. An m/z range from 380-1800 Da was used for data dependent precursor scanning. The MS data was recorded using the instrument's "enhanced resolution mode". MS/MS data was acquired in "ultra-mode" over an m/z range from 100-2000. Details on the MS CID and ETD settings can be found as Supplementary Table S1.

LC-MALDI
Bruker, Bremen, Germany). Super dihydroxybenzoic acid (sDHB) was used as matrix. Aliquots of 0.5 μL of a 50 mg/mL matrix solution in 50% ACN/water/0.1% TFA were spotted to each dried fraction. The fractions were then analysed in an automated way by WARP-LC 1.3 and Compass 1.4 in positive linear ion mode and spectra were displayed according to their retention time in the SurveyViewer of the software. In-source decay (ISD) spectra were then manually acquired on selected fractions using a positive reflector method optimised for ISD. Typically, 5000-10000 laser shots were accumulated for each ISD spectrum. External calibration of ISD spectra was performed with ISD fragments of bovine ubiquitin spotted on the calibration chips of the MTP 384 BigAnchor target.

MS Data Processing. LC-MS data was analysed with Compass 4.1 analysis software (Bruker, Bremen, Germany). Compound spectra were created with a retention time window of 0.5 min and an intensity threshold of 100000. EICs for the neutral loss of the respective monosaccharide masses were screened for conjugated peptides, and all compound spectra were additionally checked manually for the presence of conjugated peptides. Protein coverage was evaluated using ProteinScape 3.1 (Bruker, Bremen, Germany) and the MS data was searched against a custom-made database using Mascot server 2.3 including all available entries for CRM and diphtheria toxin. The cleavage sites were defined as non-specific since some unspecific cleavage products were identified.
LC-MS data was analysed and the area under the curve (AUC) was quantified on SurveyViewer (Bruker). For quantitation, the AUC of the EICs of the respective peptide/glycopeptide signals was integrated. For each peptide the sum of all AUCs of the modified forms and the non-modified peptide was set as 100%. The ISD spectra were processed in Compass 4.1 using SNAP as peak picking algorithm for monoisotopic peak annotation and further analysed in BioTools 3.2 SR4 (Bruker). Peptide sequences were modified with the conjugates and matched with the ISD spectra to determine the modification sites. Each annotation was further validated manually. The mapping of the reactive lysine residues in the 3D model was performed using Molecular Operating Environment (MOE), 2014.09 (Chemical Computing Group Inc., 1010 Sherbrooke St. West, Montreal, QC, Canada).
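The normalisation described above, in which the AUCs of all forms of one peptide are summed and set to 100%, can be written as a short sketch. The AUC values and labels below are invented for illustration, and the function name is hypothetical; the actual integration was performed in the vendor software.

```python
def relative_occupancy(auc_by_form):
    """Express the AUC of each peptide form as a percentage of the summed AUC."""
    total = sum(auc_by_form.values())
    if total <= 0:
        raise ValueError("summed AUC must be positive")
    return {form: 100.0 * auc / total for form, auc in auc_by_form.items()}


# Hypothetical AUCs for the forms of a single CNBr peptide
aucs = {
    "unmodified": 2.0e6,
    "+1 glycan": 5.0e6,
    "+2 glycan": 2.0e6,
    "+1 free linker": 1.0e6,
}
for form, percent in relative_occupancy(aucs).items():
    print(f"{form}: {percent:.1f}%")
```

Because ionisation efficiencies of the different forms may differ, percentages obtained in this way are best read as semi-quantitative, as stated in the text.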