Markov models of the apo-MDM2 lid region reveal diffuse yet two-state binding dynamics and receptor poses for computational docking

MDM2 is a negative regulator of p53 activity and an important target for cancer therapeutics. The N-terminal lid region of MDM2 modulates interactions with p53 via competition for its binding cleft, exchanging slowly between docked and undocked conformations in the absence of p53. To better understand these dynamics, we constructed Markov State Models (MSMs) from large collections of unbiased simulation trajectories of apo-MDM2, and find strong evidence for diffuse, yet two-state folding and binding of the N-terminal region to the p53 receptor site. The MSM also identifies holo-like receptor conformations highly suitable for computational docking, despite initiating trajectories from closed-cleft receptor structures unsuitable for docking. Fixed-anchor docking studies using a test set of high-affinity small molecules and peptides show simulated receptor ensembles achieve docking successes comparable to cross-docking studies using crystal structures of receptors bound by alternative ligands. For p53, the best-scoring receptor structures have the N-terminal region lid region bound in a helical conformation mimicking the bound structure of p53, suggesting lid region association induces receptor conformations suitable for binding. These results suggest that MD + MSM approaches can sample binding-competent receptor conformations suitable for computational peptidomimetic design, and that inclusion of disordered regions may be essential to capturing the correct receptor dynamics.


Supporting Text Generalized Matrix Raleigh Quotient (GMRQ) Analysis
To choose optimal parameters for MSM construction, we use the GMRQ method recently developed by McGibbon et al. 1 This method exploits the variational principle of conformational dynamics, [2][3][4] , that any approximation to the true eigenvectors of a dynamical operator will necessarily underestimate its eigenvalues, which correspond to the relaxation timescales. For an MSM of n discrete conformational states indexed by k = 1...n, approximations to the eigenfunctions are linear combinations of indicator basis set functions φ k (x) (equal to 1 if x is in state k and 0 otherwise.) The generalized matrix Raleigh quotient R quantifies how well a given linear combination of basis functions captures the m slowest eigenvectors/timescales. It is computed as where A is an n × m matrix whose j th column contains the linear coefficients to approximate the j th eigenvector as ∑ k A k j φ k (x), C is an n × n time-lagged correlation matrix of the indicator basis functions calculated from the trajectory data, and S is an n × n covariance matrix of the indicator basis functions. For an MSM, both C and S can be estimated from the number of transitions between states observed in the simulation trajectories. Finding the coefficient matrix A that maximizes R is the same eigenproblem solved in the tICA method. 1 Once maximized, R can be used to evaluate the quality of the MSM state decomposition by how well it captures the true conformational dynamics.
To avoid overfitting to the data, a cross-validation approach is used in which the simulation data is partitioned into 5-fold leave-one-out training/testing sets. In five separate trials, R is maximized using 4/5 of the trajectory data, which we report as the GMRQ training score. Using these optimal values of the coefficients in A, the remaining 1/5 of the data is then used to compute R, which we report as the GMRQ testing score.
To select the MSM that most accurately captures the conformational dynamics of the MDM2 lid region, we explored various model construction parameters and chose the model with the largest GMRQ testing score ( Figure S2). The time step between collected trajectory snapshots is τ 0 = 100 ps. We explored MSMs built using different tICA lag times (i.e. the lag time to calculate the distance correlation matrix needed to find the tICA components) and different MSM lag times. For MSMs constructed using a range of 200 to 2000 microstates, a lag time of 1 (in units of τ 0 ) is optimal ( Figure S2a,b). We also explored MSMs constructed using different numbers of tICA components (2 to 15 tICs) as the subspace to project trajectory data and perform conformational clustering. We found that, for MSMs constructed using a range of 200 to 2000 microstates, projections utilizing 2 tICs give the best GMRQ testing scores ( Figure S2c). In all of our tests, GMRQ testing scores versus the number of MSM microstates plateau near 2000 microstates, so we chose this number of microstates for MSM construction.
Given that typical MSM lag times for protein folding and binding studies range from 1-200 ns, it is somewhat surprising that such a short lagtime (100 ps) is preferred for MSM construction. To validate this result, we additionally built MSMs using lag times of 2, 5, 10, 100 and 1000 (in units of τ 0 ). We find that the equilibrium populations and slowest relaxation mode eigenvector φ 1 are remarkably robust at all lag times, with the exception of 1000, for which finite sampling artifacts become pronounced ( Figure S3a). A likely explanation for the short lag time being optimal is the highly diffusive nature of the lid region dynamics, coupled with a trajectory data set enriched in many short simulations (see Figure S1). To test this idea, we built MSMs using lag times of 1, 10, 100 and 1000 using the same construction parameters, but without using a standard ergodic trimming step, which is usually employed to avoid statistical bias from non-equilibrium trajectories. 5 This bias is particularly pronounced for MSMs built from distributed computing simulations, due to the use of many short trajectories that make forward transitions to new states, but no backward transitions. Enforcing detailed balance on a MSM built from this data can therefore introduce "trap" artifacts in which states can have incorrectly high population estimates. Indeed, without ergodic trimming, the tICA model strongly exhibits the presence of traps, indicating the non-ergodicity of the underlying trajectory data arising from diffusivity and trajectory length ( Figure S3b).

Bayes Factor analysis of inter-residue contacts
To quantify the significance of inter-residue contacts formed in specific conformational states, we compute a Bayes Factor (BF) contact metric for each residue pair in MDM2. 6 The BF k (i, j) for contacts between residues i and j, given the protein is in 1/9 some conformational state k, is computed as Here, c i j is an indicator variable that takes the value of 1 if a contact between residues i and j is present, and 0 otherwise. The BF k (i, j) value can be thought as the statistical over-representation of contact c i j in conformational state k, and hence a measure of its importance in uniquely defining the structural features of that state. For example, if BF k = 2, that means that the equilibrium constant for contact formation between residues i and j is twice as large for state k as it is for the whole ensemble. We compute Bayes factors for contacts separated by three or more residues in sequence, and define a contact formed between two residues if any two non-hydrogen atoms are closer than 4Å.

Supporting Figures
• Figure S1. Trajectory length distributions • Figure S2. GMRQ Analysis • Figure S3. Effects of lag time and/or ergodic trimming on constructed MSMs • Figure S4. Free energy landscape of simulation data projected to tIC 1 and tIC 2 • Figure S5. Changes in inter-residue contacts and secondary structure along eigenmode relaxations φ 1 and φ 2 .
• Figure S6. Bayes Factors of inter-residue contacts for tICA landscape quadrants.
• Figure S7. DOCK scores projected onto the 2D tICA landscape for all ligands • Figure S8. Backbone RMSDs of MDM2 lid residues 11-17 to bound-state p53 helix.  In the unbound state, this sequence has low propensity to form a p53-like helix, with two exceptions: (1) There is some residual helicity in the unbound state, which is lost along the φ 2 eigenmode relaxation (positive to negative tIC 2 values, see Figure S4) as the N-terminus self-associates, and (2) association of the lid region with the p53 binding cleft induces structuring of residues 11-17 to a conformation very similar to bound p53 (green arrow).