N-Glycosylation can selectively block or foster different receptor–ligand binding modes

While DNA encodes protein structure, glycans provide a complementary layer of information to protein function. As a prime example of the significance of glycans, the ability of the cell surface receptor CD44 to bind its ligand, hyaluronan, is modulated by N-glycosylation. However, the details of this modulation remain unclear. Based on atomistic simulations and NMR, we provide evidence that CD44 has multiple distinct binding sites for hyaluronan, and that N-glycosylation modulates their respective roles. We find that non-glycosylated CD44 favors the canonical sub-micromolar binding site, while glycosylated CD44 binds hyaluronan with an entirely different micromolar binding site. Our findings show (for the first time) how glycosylation can alter receptor affinity by shielding specific regions of the host protein, thereby promoting weaker binding modes. The mechanism revealed in this work emphasizes the importance of glycosylation in protein function and poses a challenge for protein structure determination where glycosylation is usually neglected.

presumably present in the inactive (full polysialo), inducible (full monosialo, partial monosialo 1 ), and active (full asialo, full GlcNAc) HA binding phenotypes, having varying sialic acid content and size 2,3 . Full pentasaccharide and full extended asialo glycoforms were generated to further evaluate the the effect of N-glycan size. With the full monosialo and full extended asialo glycoforms we were, for example, able to compare similar-sized glycans with and without sialic acids. The myeloma glycoforms mimic the predominant galactose-terminating and sialic acid-terminating CD44 glycovariants found recently in the mouse myeloma cells 4 . Non-glycosylated HABDs serve as a reference to the glycosylated proteins.

SB Avoiding simulation bias in the observed binding modes
When setting up any simulation system, one can bias the obtained results. To avoid favoring any particular binding mode between HA and HABD-CD44, the following two necessary steps were followed. First, we performed several replicas for each studied glycoform (Table 2). Second, HA was placed away from any potential binding site (1.5-2.5 nm). Here, we provide an example of such placing for the hyaluronate hexamers in simulations, see Fig. S2. This sort of placement reduces the risk of biasing HA's binding to any of the studied binding sites substantially. Simultaneously, the initial proximity enhances the chances of finding the binding modes in the allocated simulation time.  Fig. 1; and yellow spheres are the HA hexamers. Each hexamer was placed randomly 1.5-2.0 nm away from the protein surface. This initial configuration quaranteed that the hexamers were able to sample the protein surface freely during the course of the simulations.

SC Force field comparison
All the critical simulations were repeated with the CHARMM36 force field (systems C1-2 in Table 2), and their conclusions were consistent with the combination of AMBER99SB-ILDN and GLYCAM06 discussed along the text. In this manner we can assure that the results are not force field-dependent. Table S1 lists the coverage of each HA binding mode by the glycans using the CHARMM36 force field (simulations C1-2) in Table 2. These values are in general around 15 to 25 % smaller than that of the GLYCAM06 systems (see Table 1 in the main text). Yet, they also share the same trend: the crystallographic and parallel modes are the most and upright mode the least covered. In the CHARMM36 systems, the replica-to-replica variances are in general smaller than for GLYCAM06 as the N-glycans show greater mobility. SD N-glycan versus N-glycan contacts Figure S3 shows numbers of contacts between the five N-glycans on all tested glycoforms and force fields. Figure S4 presents a contact histogram of the most prominent binding residues found from the literature 5 . It shows the binding of these residues to the N-glycans of each N-glycosylation site. For reference, the histogram also shows how these residues bind to hyaluronate oligomers (HA 16 in our previous study 5 ). Figure S6 shows that three initially unbound HA 6 molecules bind to non-glycosylated CD44 HABD during 20 × 1000 ns trajectories. The occupied epitopes in HABD contacting the HA molecules are at residues 20-26, 38-42, 75-79, 87-100, 105-115, 141-161. These residues match the known epitopes for all three binding modes 5 . Qualitatively, we observe only one recognizable binding mode across 20 simulation replicas, with one HA fragment binding to the crystallographic binding mode. Yet, less distinct binding of the HA fragments occurred frequently at distinct binding sites. These observations along with the binding profile agree with the results of the NMR that show simultaneous binding of short HA fragments. Figure S5 illustrates the probability of different HABD residues to be in contact with the HA hexamers durting the simulations. In panel a, the contacts are centered around the R41-centered binding epitope as well as residues in the extended region of HABD. This extended region is also known to undergo major conformational shift from ordered to disordered state upon the binding of ligand. In this region, e.g., arginine R154 have been noted to stabilize the ligand-bound conformation by interacting with the bound ligand in the disordered state of the receptor. In panel b), the high probability of contacts in the crystallographic binding groove is due to the other HA hexamer being initially placed into this position. The fact that there was zero dissociation events of the initially-bound HA hexamer clearly shows that the employed simulation force field is capable of reproducing the correct binding interaction. Figure S6 also shows similar binding behavior for initially unbound HA 6 molecule that binds to HABD already occupied with another HA 6 in the crystallographic binding groove. The major binding epitopes locate roughly at residues 38-42, 105-115, 144-169. The C-terminal amino acids, 144-169, correspond to both the upright mode-specific HA binding residues and the residues that are influenced by MEM-85. This observation therefore agrees with the experimental observations and fortifies the notion of a second, lesser affinity binding site at the C-terminal portion of HABD. Table S2. The number of association or dissociation events between individual HA hexamers and HABD during 1000 ns simulations. Data are calculated from simulation set G5 in Table 2 (i.e., HABD with three initially unbound HA hexamers, see Note SB). Association is counted when there are above 50 individual contacts between the atoms of CD44 and HA. Dissociation is counted when number of contacts reaches zero. Weak interactions (i.e., number of contact values between 0 and 50) are omitted. Tables S2 and S3 show the number of attachments or detachments between each HA hexamers and CD44-HABD in simulation sets G5 and G6, respectively. One can see that each hexamer attaches to HABD at least once. However, many of the hexamers undergo multiple attachment-detachment cycles during the 1000 ns trajectories, with average number of attachments or detachments for the initially-free HA fragments being 3.1 ± 0.5. This indicates that the sampling of the binding surface is adequate. It also suggests that hexamers are most likely too short to readily form any of the possible binding modes at the studied time scales. Namely, hexamers are regarded as the minimum length of HA able to bind CD44 6 . Yet, the only hexamer that was placed into a predefined binding configuration in simulation set G6 remained in this configuration across all simulation repeats. This illustrates that the simulation force field is capable of reproducing the correct binding, thus giving further backing for the other results, too. Figure S7 shows and example of CD44-hyaluronate binding in both the crystallographic and upright binding modes. As the ligand is depicted at various timestamps, it is easy to see which carbohydrate units interact the most with the protein. Table S3. The number of association or dissociation events between individual HA hexamers and HABD during 1000 ns simulations. Data are calculated from simulation set G6 in Table 2. Association is counted when there are above 50 individual contacts between the atoms of CD44 and HA. Dissociation is counted when number of contacts reaches zero. Weak interactions (i.e., number of contact values between 0 and 50) are omitted. In these simulations, hexamers number 1 is attached to the crystallographic binding site, and no detachments are recorded. Hexamer number 2 is initially unbound similar to System G5. That is, it can readily sample the surface of HABD for possible binding sites.  Figure S8 shows relative interaction probability of key arginine residues with HA in glycosylated receptors as compared to non-glycosylated CD44. The data show the flanking arginines R150, R154, and R162 to have relatively more interactions with the HA ligand in the glycosylated cases. The arginines R41 and R78 at the primary binding site, on the other hand, are less active in binding when the receptor is glycosylated. This illustrates how the sugar shield obstructs the primary binding site.

SI Additional observations from the spontaneous binding simulations
Highlighting the sialic acid-induced repulsive effect, HA failed to bind HABD entirely in one replica of the full polysialo systems. This is the only occasion with GLYCAM06 force field when we did not observe interactions between CD44 glycoprotein and HA during a 1000 ns trajectory.
The charge-neutral full extended asialo glycoform with two galactoses per antennae shows HA binding comparable to the equally-sized but charged full monosialo glycoform, which is again significantly lower than that of the shorter monogalactoseterminating full asialo glyform. This highlights the role of the size of the N-glycans in masking the binding site and thus determining the HA binding properties. replicas for the GLYCAM06 and CHARMM36 systems, respectively. In the analysis, all the contacts were summed for each data point and a threshold for a contact was 0.6 nm.
Parallel Upright   N25  R29  K38  G40  R41  Y42  S43  I44  N57  K68  T76  C77  R78  Y79  G80  I88  N94  I96  C97  A98  A99  N100  N101  Y105  I106  L107  N110  D115  N120  N149  R150  D151  G152  R154  Y155  K158  R162  N164  E166   0  20  40  60  80  100  120  140  160  180  200  220  240  260  280  300 Residue Probability (%)   N25  N57  N100  N110  N120   N25  R29  K38  G40  R41  Y42  S43  I44  N57  K68  T76  C77  R78  Y79  G80  I88  N94  I96  C97  A98  A99  N100  N101  Y105  I106  L107  N110  D115  N120  N149  R150  D151  G152  R154  Y155  K158  R162  N164  E166   0  20  40  60  80  100  120  140  160  180  200  220  240  260  280  300 Residue Probability (%) Figure S4. Contact histogram for the observed HA binding residues, according to the literature 5 . It shows in how many aggregate simulation frames (%) given binding residue-N-glycan interaction is present. These data are calculated from the Myeloma monosialo systems. For comparison, it shows in how many aggregate simulation frames (%) given binding residue-HA interaction is present for each binding mode (Data available in Ref 5 ). . Hyaluronate-perturbed residues in simulations G5 (a) and G6 (b). The colored surface displays the probability of a given residue to be in contact with HA6 in our simulations. a: This simulation system entailed HABD and three randomly placed unbound HA 6 fragments. The observed binding was therefore spontaneous. b: This simulation system entailed HABD and one randomly placed unbound HA 6 fragment as well as one HA 6 fragment placed into the crystallographic binding mode. The observed binding of the unbound fragment was therefore spontaneous. Zero detachments of the bound fragment was detected during the simulations, see Table S3.  Figure S6. Contact histogram for all HA binding residues vs multiple HA 6 . It shows in how many aggregate simulation frames (%) given binding residue-HA interaction is present. These data are calculated from the G5 and G6 systems in two settings: first, with either three initially unbound HA 6 molecules, or second, with two HA 6 molecules from which one is initially bound in the crystallographic mode and one is initially unbound, respectively. The data are calculated and averaged over 20 (system G5) and 10 (system G6) simulation replicas, see Methods for details. Figure S7. Dynamics of the HA ligand in crystallographic (left) and upright (right) binding modes. The tan surface depicts the protein surface in the first frame of a simulation. The red sticks represent the ligand drawn at every 50 ns into the simulation trajectory. R41 is colored light green. Other key arginines are colored yellow and labeled accordingly. Data are extracted from our previous simulations 5 Figure S8. Relative interaction probability of HA with five CD44 arginines in different glycoforms. The data are calculated with GROMACS tool gmx mindist by counting the simulation frames in which a contact (HA-Arg minimum distance < 6 Å) is present (excluding the first 200 ns of binding). Values for each glycoforms are then normalized with the reference values from the non-glycosylated system in order to obtain the relative interaction probabilities. Values above 1 indicate an increase compared to the binding in the non-glycosylated case, while values below 1 indicate a decrease compared to the non-glycosylated case.