Characterization of Conformational Ensembles of Protonated N-glycans in the Gas-Phase

Ion mobility mass spectrometry (IM-MS) is a technique capable of investigating structural changes of biomolecules based on their collision cross section (CCS). Recent advances in IM-MS allow us to separate carbohydrate isomers with subtle conformational differences, but the relationship between CCS and atomic structure remains elusive. Here, we characterize conformational ensembles of gas-phase N-glycans under the electrospray ionization condition using molecular dynamics simulations with enhanced sampling. We show that the separation of CCSs between isomers reflects folding features of N-glycans, which are determined both by chemical compositions and protonation states. Providing a physicochemical basis of CCS for N-glycans helps not only to interpret IM-MS measurements but also to estimate CCSs of complex glycans.

Carbohydrates are ubiquitous components of living organisms. As over 50% of eukaryotic proteins and most of membrane proteins are glycosylated 1 , their roles in biological processes, ranging from protein folding to biomolecular recognition events, have received considerable attention [2][3][4] . The biological activity of a glycoconjugate is often affected by a subtle change in a carbohydrate structure 5 . Carbohydrates have multiple linkage points and take branched structures. In addition, many rotamers arise along flexible linkages. The hydroxyl groups act as both donor and acceptor for hydrogen (H-) bonds, potentially forming a complex network of interactions to stabilize various conformations. Carbohydrates are commonly a complex mixture of various isomers, each having multiple conformations, and analysis of their structures in solution is extremely difficult 6 .
Ion mobility mass spectrometry (IM-MS) is an analytical method increasingly used for obtaining information of conformational changes in polyatomic ions, such as unfolding or aggregation [7][8][9] . This method separates gas-phase ions based on the collision cross section (CCS) that reflects their sizes and shapes. IM-MS is capable of separating stereoisomers with identical mass but different conformations 10 . The method has been increasingly applied for structure determination of carbohydrates [11][12][13][14] . Recently, highly sensitive techniques have been reported for separating diastereomers of synthetic carbohydrates as well as those of glycoconjugates [15][16][17][18][19][20][21] . For instance, epimeric monosaccharides, which differ in chirality of only one hydroxyl group, can be differentiated 17 . However, conformational diversity as well as chemical heterogeneity of carbohydrates raises questions about the interpretation of IM-MS data. First, the experimentally derived CCS represents either a single species or multiple isomers and/or conformers. Second, the ionization of samples (e.g. protonation or sodiation) modifies their conformational ensemble. To date, limited investigations have been undertaken on gas-phase conformations at a finite temperature, while many studies have focused on solution conformations 22 . Thus, the determinants of CCSs of carbohydrates still remain largely unknown.
We previously reported a combined IM-MS and molecular dynamics (MD) simulation study on the identification of isomeric N-glycan structures 23 . Such a combined approach enabled us to link the measured CCS with structures at atomic resolution 7,24,25 . We measured the drift times of ten pyridylamino (PA)-glycans, including isomeric forms, in the presence of N 2 gas by using traveling wave ion mobility mass spectrometry. The PA group was introduced to enhance detection sensitivity. Doubly protonated PA-glycans ([M + 2H] 2+ ) predominated. Then, doubly protonated PA-glycans using the absolute CCS values of doubly protonated polyalanine 36 in N 2 gas, rather than the singly protonated species used before 23 . They are listed in Table 1. Five points (n = 11, 12, 13, 14, and 15) were used for the calibration since the polyalanine with n > 15 becomes triply charged. A good linearity was found for the drift time and CCS of polyalanine as shown in Table 1. We thus consider that the standard curve can be used for the calculation of glycan CCS.
Calculated CCS. The calculated CCS values for the ten glycans in N 2 gas as well as in He gas are listed in Table 2. For both P3 and P6 states, they are plotted against experimental values in Fig. 1c. A single protonation state was considered for G0F-GN(3) (core GlcNAc protonated) and for G0F-GN(6) (see Methods for details). Calculated values coincide with experimental ones, with correlation coefficients of 0.90 (P3 state) and 0.89 (P6 state), and average percentage differences of 7.8% (P3 state) and 7.9% (P6 state), respectively (percentage differences were calculated as CCS calc −CCS exp /CCS exp × 100%). These differences could arise from inaccuracies either in simulations (such as force field parameters) or in experimental calibration. For instance, the use of other calibrants, such as dextran, may improve the accuracy of experimental CCS values. CCS values for the two protonation states of each glycan are different (12 Å 2 on average), the magnitude depending on type (maximum 22 Å 2 for G1F(3) and G2, and minimum 5 Å 2 for G0). The sharp CCS distributions, which were observed experimentally 23 , suggest that dehydration suppresses the dynamics of the sample ion under electrospray ionization conditions. Fig. 2a shows the root mean square fluctuations (RMSFs) of heavy atoms. For most of the glycans, except for G1F(6),  Table 2. Calculated collision cross sections (in Å 2 ) in N 2 and He drift gas for each of two protonation states, P3 and P6, of ten PA-glycans. The experimental values in the presence of N 2 drift gas are also listed. The percentage difference between the experimental and calculated CCS in N 2 drift gas (%diff = CCS calc − CCS exp / CCS exp × 100%) are given in parenthesis. a The core GlcNAc, rather than that of the α1-3 branch, was protonated.

CCS distribution and conformational ensemble.
We examined the calculated CCS distributions of the ten PA-glycans in terms of their conformational ensembles (see Methods for details). We also performed k-means clustering using the MMTSB toolset 37 (threshold: the root mean square deviation (RMSD) = 2.5 Å) and CCSs were calculated for each cluster. The results are summarized in Fig. 3. We found that the CCS distributions are relatively narrow (within the range of 50 Å 2 ) and mostly unimodal. This implies that CCS reflects a few distinct major conformers, and, consistent with this, only one or two clusters have a large population size. Other IM-MS studies on similar N-glycans also reported that a single conformer dominates 29,30 . Different orientation of the core chitobiose, particularly the PA group, gives a bimodal character to the CCS distributions of G1(3) and G0F (Fig. 4). This contrasts with the unimodal character in the experimentally observed arrival times of G1(3) and G0F (Fig. 1b). The discrepancy could arise because either two conformers interconvert in the experimental timescale or there is an inaccuracy of the force field parameters for the PA group. The major conformations of the ten N-glycans are classified into two folded forms described as compact globular and rod-like shapes (Fig. 5). In the compact globular shape, three arms (Manα1-3, Manα1-6, and core chitobiose) symmetrically bend over the central mannose at the branching point. On the other hand, in the rod-like shape, the Manα1-6 arm preferentially undergoes "backfolding" to the  (3)) have compact globular shapes (small CCSs), while other glycans favor rod-like shapes (large CCSs). An earlier study on doubly sodiated N-glycans showed that the backfolding of the α1-6 arm is commonly observed in the lowest energy conformers and folding of α1-3 arm leads to small CCSs 30 . Note that the CCS values of rod-like conformers change monotonically upon increasing the molecular mass by adding GlcNAc or Gal residues, implying that the folding of the α1-3 arm causes an unexpected change in CCS values. Further elaborated modelling of proton positions may be required to resolve the multiple features of glycans more accurately.
Difference in hydrogen bonding pattern. We analyzed H-bond interactions in G1(3) and G1 (6), where one galactose is added to G0 (Fig. 5). We used 2,500 snapshots from the trajectory of each system. The following geometric definition was used to define H-bonds: R XY < 3.5 Å and θ HXY < 30°, where R XY is the distance between the heavy atoms X and Y, and θ HXY is the angle between the X-H bond and the X-Y vectors. The residue-residue H-bond maps calculated for all isomeric pairs are shown in Fig. 6. In G1 (3), which has the compact globular shape, H-bonds are frequently observed between intra-arm proximal residues. In contrast, we found high probabilities of H-bond interactions between distal residue pairs of the α1-6 arm and the core chitobiose, e.g. 2:GlcNAc-5:Gal (82%) and 1:PA-GlcNAc-6:Gal (52%) in G1 (6). In all isomeric glycans, the compact globular shapes have less core-α1-6 H-bonds, while the total number of H-bonds is almost preserved (Table 3). Thus, the separation of CCS values is a result of different H-bonding patterns in the isomers. In the studied N-glycans, the flexible Manα1-6 linkage facilitates backfolding of the α1-6 arm to maximize stabilization via H-bonds. When the α1-3   In PGLN, both N-H and C=O groups could be protonated. In order to determine the protonation site, proton affinities were estimated for both groups using quantum chemical calculations at the B3LYP/6-31 G(d,p) level. The proton affinity was estimated by subtracting the electronic energy of protonated PGLN from that of neutral PGLN. The geometries of the two forms (protonated and neutral PGLN) were optimized both in gas-phase and in aqueous solution. The effect of solvent was taken into account by using the polarized continuum model 42,43 (dielectric constant of 78.35). All calculations were performed using the Gaussian09 program package 44 . The results show that the C=O group has a higher proton affinity than the N-H group both in the gas-phase (219.7 kcal/mol for N-H and 229.6 kcal/ mol for C=O) and in solution (262.3 kcal/mol for N-H and 271.2 kcal/mol for C=O). Therefore, we decided to protonate the C=O group for PGLN. This proton position is supported by the estimated pKa using the Marvin pKa plugin from ChemAxon (Marvin 16.6.27.0 ChemAxon Ltd. http//www.chemaxon.com) (Supplementary Figure S1). The use Figure 6. Residue-residue H-bond maps calculated for three pairs of isomeric PA-glycans with selected protonation states. The geometric definition was used to identify H-bonds: R XY < 3.5 Å and θ HXY < 30°, where R XY is the distance between heavy atoms X and Y, and θ HXY is the angle between X-H bond and X-Y vectors. The regions of inter-branch interactions are highlighted with Greek numbers (I: core chitobiose-α1-6 branch, II: core chitobiose-α1-3 branch, III: α1-6 branch-α1-3 branch).  of such a pKa program helps to theoretically model proton positions that are not known experimentally, and may be followed by more expensive quantum chemical calculations to elaborate the model. Note that we considered a single protonation state for G0F-GN(3), in which the core GlcNAc was protonated. Protonation of GlcNAc of the α1-3 branch in G0F-GN(3) gives a CCS value significantly larger than that of G0F-GN (6). This is inconsistent with experimental evidence 23 , which indicates that the CCS of G0F-GN(3) is smaller than that of G0F-GN (6). In G0F-GN(3), the length of the flexible α1-6 branch is exceptionally short compared to the other N-glycans studied, and this could increase surface exposure of the core GlcNAc and the possibility of protonation. Indeed, protonation of the core GlcNAc gives a much better correlation between calculated and experimental CCSs.
Force field parameterization. The parameters for PGLN and PA were newly developed based on the CHARMM36 carbohydrate 32,45,46 and CHARMM General Force Field 47,48 . In order to preserve consistency with existing force fields, we assigned available parameters to fragments (GlcNAc for PGLN and D-Threitol, Alanine dipeptide, Morpholine, and 2-acetamide pyridine for PA) and determined the partial atomic charges and dihedral angle parameters only for atoms proximal to the added proton as shown in Supplementary Figure S2.
For the charges of PGLN, the CHARMM36 values for GlcNAc were used for the glucose moiety. The charges of the acetamide moiety were derived from the electrostatic potential fit (ESP) charges obtained from the quantum chemistry calculation of acetamide using the Merz-Singh-Kollman scheme 49,50 at the MP2/6-31 G(d) level. Similarly, the CHARMM36 charges for D-threitol and the CHARMM General Force Field charges for the alanine dipeptide, morpholine, and 2-acetamide pyridine were assigned to each of the four fragments for PA. Then, we modified the charges around the added proton (NH 2 + group) according to the ESP charges calculated for PA. Note that we manually modified some values in order to preserve a total charge of +1. The final charges for the two residues are given in Supplementary Table S1.
For dihedral angle parameters, we optimized two missing parameters, χ0 in PA and χ3 in PGLN, to reproduce the ab initio potential energy surfaces (PESs) of the model compounds shown in Supplementary Figure S3. The ab initio PESs were obtained at the MP2/cc-pVTZ//MP2/6-31 G(d) level. The PES was scanned in 30° increments for the corresponding dihedral angles. The parameters were determined by fitting the empirical PES to the entire range of ab initio PES using the fit_dihedral.py script 51 , with the optimized parameters listed in Supplementary  Table S2. The newly determined parameter sets well reproduce the ab initio PESs for the rotations of χ0 and χ3 in the model compounds, as shown in Supplementary Figure S3. Note that the ab initio PESs for the rotations of χ1 and χ2 angles of PGLN were reasonably reproduced using the default parameters, although the added proton could affect these angles significantly.

Molecular dynamics simulations.
The replica-exchange molecular dynamics (REMD) simulations 26 were performed for 18 PA-glycan systems in total: A single protonation state for G0F-GN(3) and G0F-GN(6), and two protonation states for the rest of the eight PA-glycans. The in-house interface program (REIN: Replica-Exchange Interface version 0.1) 35 was used for REMD simulations. The NAMD program 52 was used for each MD simulation inside of REIN. The initial configurations were constructed using the GLYCAM web portal (Glycam Biomolecule Builder, http://glycam.org) 53 . All the simulations were carried out in vacuum. The Langevin dynamics method was used to maintain the temperature (damping constant = 5 ps −1 ). Nonbonded interactions were calculated without any truncation, using a large cutoff distance (50 Å). The bonds involving hydrogen atoms were kept rigid using SHAKE. A time step of 2 fs was used in the simulations. Each system was equilibrated using a constant temperature MD simulation at 300 K before the REMD simulation. The final snapshots were used to initiate the subsequent REMD simulations. Sixteen replicas were used to cover the temperatures ranging from 300 K to 1,058 K. To keep the chair conformation of the pyranose rings at high temperatures, six dihedral restraints were applied to the ring. Each replica was simulated for 50 ns with replica exchange trials at every 2 ps. The total simulation time was 0.8 μs (50 ns × 16 = 800 ns) for each system. Only the trajectory at 300 K was used for analyses. The convergence of simulations was assessed by checking the acceptance ratio, replica exchanges, and the relative population sizes of conformers. In Supplementary Figure S4, we show data obtained for G1F(3) as an example. The acceptance ratio is 28% on average. We confirmed random walks both in the replica space and temperature space. We plotted the relative population size of each conformer having different α1-6 linkage orientations. Each population changes drastically in the first 10 ns and converges reasonably within 50 ns. We obtained similar results for the other glycans.

Calculation of collision cross section (CCS).
CCS values in N 2 drift gas were calculated using the trajectory method 27,33 . We used the modified version of the MOBCAL program and a newly parametrized algorithm of the N 2 -based trajectory method 34 . The structures from REMD simulations were used for CCS calculations without optimization. In the MOBCAL program, we used the same partial charges as for the REMD simulations that we parametrized to reproduce the results of quantum chemical calculations as described in "Force field parametrization". Since the trajectory method is computationally very expensive, we used a conformational clustering to reduce the number of CCS calculations. First, we obtained representative conformers by the k-means method (threshold RMSD: 2.5 Å) implemented in the MMTSB Tool Set 37 . We obtained 13 clusters on average for each glycan. Then, the CCS value was calculated for the center structure of each cluster. We also performed clustering analysis with a small threshold (RMSD: 0.5 Å) to obtain a fine distribution of CCSs. This produces a maximum of 2,625 clusters and the CCS value was calculated for the center structure of each cluster. Finally, the CCS value of each center structure was weighted by multiplying the number of structures involved in the corresponding cluster, giving a fine distribution of CCSs as shown in Fig. 3. In the CCS calculation, we set the number of integration points to 25 in the Monte Carlo integration of the impact factor and orientation 54 . The CCS values in He drift gas were also calculated for comparison.