Computationally Discovered Potentiating Role of Glycans on NMDA Receptors

N-methyl-D-aspartate receptors (NMDARs) are glycoproteins in the brain central to learning and memory. The effects of glycosylation on the structure and dynamics of NMDARs are largely unknown. In this work, we use extensive molecular dynamics simulations of GluN1 and GluN2B ligand binding domains (LBDs) of NMDARs to investigate these effects. Our simulations predict that intra-domain interactions involving the glycan attached to residue GluN1-N440 stabilize closed-clamshell conformations of the GluN1 LBD. The glycan on GluN2B-N688 shows a similar, though weaker, effect. Based on these results, and assuming the transferability of the results of LBD simulations to the full receptor, we predict that glycans at GluN1-N440 might play a potentiator role in NMDARs. To validate this prediction, we perform electrophysiological analysis of full-length NMDARs with a glycosylation-preventing GluN1-N440Q mutation, and demonstrate an increase in the glycine EC50 value. Overall, our results suggest an intramolecular potentiating role of glycans on NMDA receptors.

. (a) Cross-validation with the use of generalized matrix Rayleigh quotient (GMRQ) for the slowest implicit timescale as the score function was used to choose the optimal number of Markov states to model glycosylated GluN1 LBD, namely 99. Average scores for training sets (red) and test sets (blue) over 25 random divisions of the data set into training and test sets are provided. (b) In the case of our glycosylated GluN2B LBD simulations, the optimal number of Markov states appears to be 118, though the model with 99 states is relatively good, too. For comparison, we used Markov models with 99 states both for GluN1 and GluN2B LBDs.

Section S2. Choice of the optimal lag time
Lag time is the distance in time between neighboring frames from MD trajectories used to estimate the MSM transition matrix. Too small values of the lag time can lead to inadequacy of MSM due to a violation of the Markov assumption at short timescales. Too large values of the lag time lead to poorer statistics on transitions. Analysis of the convergence of implicit timescales of MSMs as a function of lag time (Fig. S2) can be used to choose the optimal value of lag time. For our simulations of glycosylated GluN1 LBD, we chose the lag time of 256 ns. In the case of GluN2B LBD, the overall picture is the same as in GluN1 LBD. For the purpose of comparison, this lag time was used to model all four molecular systems under investigation, glycosylated and non-glycosylated GluN1 and GluN2B LBDs.
(see next page) S2. (a) Five slowest implicit timescales, estimated from MSM for glycosylated GluN1 LBD with 99 states, converge to stationary values as the lag time increases, confirming the applicability of the Markov assumption. We chose the lag time of 256 ns as a compromise between ensuring Markov and good statistics on transitions between Markov states. Note the gap between the first and the second slowest timescales by a factor of 2.5-4 (depending on the lag time), implying a relatively good decoupling of clamshell opening/closing motion from other conformational transitions in glycosylated GluN1 LBD. (b) In GluN2B LBD, the overall picture with implicit timescales is the same. However, the timescales are faster (up to ~0.2 μs, vs. up tõ 0.5 μs in GluN1), and the gap between the first and the second slowest timescales is smaller (a factor of ~1.4-1.6).

Section S3. Estimate of the statistical significance of the difference between probability distribution functions for glycosylated and non-glycosylated GluN1 and GluN2B LBDs by bootstrapping
We performed 1000 rounds of resampling of time series d(t) for glycosylated, and, separately, non-glycosylated GluN1 LBD. For each resampled data set, the probability distribution functions for glycosylated and non-glycosylated GluN1 LBD, denoted below as f1(d) and f2(d), respectively, were computed with the use of Markov state models with 99 states and the lag time of 256 ns. Then, the difference between the two probability distributions, was computed for each of 1000 sets of resampled data. Confidence intervals for df(d) were  .85 nm are statistically significantly more populated (the confidence interval for df, which is the difference between probability distribution functions for glycosylated and non-glycosylated GluN1 LBD, is above zero at these values of d), and those with d in the range of 4.57 to 4.78 nm are statistically significantly less populated (the confidence interval for df is below zero) in glycosylated GluN1 LBD than in non-glycosylated GluN1 LBD (percentile bootstrap, confidence level of 95%).
Black: the best estimate of df based on the available MD trajectories (in other words, the red curve from Fig. 2c minus the blue curve from Fig. 2c). Red: the confidence interval for df with the confidence level of 95% estimated by percentile bootstrap. Blue: the same, with 99% confidence level. (b) In GluN2B LBD, the glycosylated form is statistically significantly more stable in the ranges of 3.55 to 3.95 nm, and less stable between 4.05 to 4.69 nm (percentile bootstrap, confidence level of 95%).

Section S4. Physical interpretation of slowest timescales of Markov state models
To clarify the meaning of the slowest Markov state model timescale, consider a simplified physico-chemical two-state model of GluN1 or GluN2B LBD, with one open and one closed states (O and C, respectively), (1) and assume that the transitions between these two states are described by the first order reaction kinetics, with the rate constants kcl and kop for closing and opening, respectively. Then the concentration of the open and closed forms will depend on time t as follows: where constants c0, c1, c2 and c3 are determined by the initial concentrations of the open and closed forms and the ratio of the rate constants kcl / kop. On the other hand, the time evolution of the ensemble of Markov processes for an n-state Markov chain can be written in comparable notations as where i  are characteristic timescales of the Markov chain sorted in the descending order, Glycosylation was performed with the use of Glycoprotein Builder. 10 Energy minimization, heating and two-stage preequilibration resulted in preequilibrated structures used for production simulations. These preequilibrated structures were within 1 Å from the original X-ray structures in terms of RMSD for the protein backbone atoms.
For the GluN2B simulations, a homology model of the LBD was built from full NMDAR X-ray structures (PDB codes: 4PE5, 8 4TLL, 4TLM 9 ). The six LBD domains (two from each structure) were aligned and used to build a consensus model using Schrodinger's Maestro software (version 2015-1). 11 The template sequence consisted of residues 401 to 604 and 658 to 806, with a glycine linker between 604 and 658. The sequence was based off of the GluN2B subunit from H. sapiens, uniprot identifier Q13224-1. The structures were solvated in TIP3P water with sodium and chlorine ions added to neutralize the system. For the glycosylated system, glycosylation was performed with the use of the Glycoprotein Builder. 10 MD simulations were performed with Amber ff99SB-ILDN 12 and GLYCAM_06i 13 force fields (for the protein and glycan parts, respectively), resulting in 262 MD trajectories for glycosylated and 196 MD trajectories for non-glycosylated GluN1 LBD, with the aggregate duration of 107 and 106 μs, respectively. For the GluN2B LBD, 247 trajectories for the glycosylated and 613 trajectories for the glycosylated form were performed, resulting in an aggregate simulation time of 86 and 344 μs, respectively. Simulations were run on Folding@home and various types of GPUs available on Stanford computer clusters (Sherlock, XStream). Depending on the type of the used GPU, some of the trajectories were generated in OpenMM with hydrogen reweighting, constrained length of all covalent bonds and the timestep of 5 fs, 14 and other trajectories were run in Amber with hydrogen reweighting, constrained length of covalent bonds with hydrogen atoms and the timestep of 4 fs. 15 Coordinates of all atoms were recorded every 0.2 ns. The following checks of the resulting trajectories were performed: stability of the volume, potential and total energy of the system; the distance between mirror images of the (glyco)protein created by periodic boundary conditions (in more than 80% of frames exceeded 19 Å, 16 Å, 25 Å and 14 Å for the glycosylated GluN1 LBD, non-glycosylated GluN1 LBD, glycosylated GluN2B LBD and non-glycosylated GluN2B LBD, respectively; in more than 99% of frames, exceeded 15, 11, 16 and 9.5 Å, respectively); RMSD of the protein backbones in neighboring frames in each MD trajectory (always stayed below 4.5 Å for the timestep of 0.2 ns, and typically equaled ~1.5 Å). Visual molecular dynamics (VMD) package 16 was used to visualize molecular structures.
Interpretation of molecular dynamics trajectories. Markov state models 17 were built to reconstruct the thermodynamic and kinetic properties of the NPT ensembles of glycosylated and non-glycosylated GluN1 LBD and GluN2B LBD proteins at equilibrium from finite-length MD trajectories, each of which did not completely sample the configuration space. For featurization, the distance d between Cα atoms in residues 507 and 701 in GluN1 LBD was used to capture the opening/closing motion of the module. This choice of the residues followed that used in the experimental papers on smFRET investigation of GluN1 LBD opening/closing dynamics on the millisecond-to-second timescale. 18,19 In GluN2B LBD, the distance between Cα atoms in residues 503 and 701, which are homologous to residues 507 and 701 in GluN1 subunit, was used. The MSMbuilder 3.3 package 20 was used to construct microstate models with varying number of Markov states. Maximum Likelihood Estimator was used to generate a transition probability matrix Tij, which maps out the probability of transitioning from state i at time t to state j at time t   , where is the lag time of the model. The optimal number of Markov states was chosen by cross-validation with the use of generalized matrix Rayleigh quotient (GMRQ) for the slowest implicit timescale as the score function (SI, section S1). 1 The Markov lag time was chosen to be 256 ns based on the analysis of the plots for the implied timescales versus lag times used to compute these implied timescales (SI, section S2). MSM-weighted probability distributions were obtained by binning the raw data within each MSM state and weighting it by the MSM equilibrium state population. To estimate the stability of our key conclusions relative to the model framework used to process the MD data, we also present the results for MSMs with a different number of clusters or a different lag time, as well as the timescale estimated from timestructure independent component analysis (tICA), 21 all with the same featurization d (Table 1).