Conformational heterogeneity of the calmodulin binding interface

Calmodulin (CaM) is a ubiquitous Ca2+ sensor and a crucial signalling hub in many pathways aberrantly activated in disease. However, the mechanistic basis of its ability to bind diverse signalling molecules including G-protein-coupled receptors, ion channels and kinases remains poorly understood. Here we harness the high resolution of molecular dynamics simulations and the analytical power of Markov state models to dissect the molecular underpinnings of CaM binding diversity. Our computational model indicates that in the absence of Ca2+, sub-states in the folded ensemble of CaM's C-terminal domain present chemically and sterically distinct topologies that may facilitate conformational selection. Furthermore, we find that local unfolding is off-pathway for the exchange process relevant for peptide binding, in contrast to prior hypotheses that unfolding might account for binding diversity. Finally, our model predicts a novel binding interface that is well-populated in the Ca2+-bound regime and, thus, a candidate for pharmacological intervention.

Supplementary Figure 9: The tICs provide a superior separation of the dynamic landscape of C-CaM compared to RMSD. Energy landscapes of apo C-CaM as a function of projection onto the first tIC and RMSD to the (a) apo and (b) holo structures show superior separation along the tIC order parameter as compared to RMSD. The same energy landscapes for holo C-CaM show that RMSD to the (c) apo and (d) holo structures provides no separation of the dynamic processes in holo C-CaM. Conformations were weighted by their MSM probabilities, and free energy values are reported in kcal mol⁻¹. Apo and holo reference structures are 1CFD and 1CLL, respectively.

Supplementary Figure 10: Energy landscape of C-CaM according to RMSD and fraction of native contacts. The energy landscape of apo C-CaM was generated by weighting each conformation by its MSM probability and binning by RMSD to the apo structure and by $Q_{\mathrm{diff}}$, which denotes the difference between the fractions of native contacts in the apo and holo states. Reference structures for the apo and holo states were 1CFD and 1CLL, respectively, and free energy values are reported in kcal mol⁻¹.
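The landscapes described in these captions can be illustrated with a minimal sketch: each conformation contributes its MSM probability as a weight to a two-dimensional histogram, and the free energy follows from F = -kT ln P. The function name, the bin count and the default temperature of 300 K are assumptions made for illustration; the captions do not state the binning or the temperature used for the figures.

```python
import numpy as np

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal mol^-1 K^-1


def free_energy_surface(x, y, weights, bins=50, temperature=300.0):
    """Weighted 2D free energy surface in kcal/mol (hypothetical helper).

    x, y     : order parameters per conformation (e.g. tIC 1 and RMSD)
    weights  : per-conformation weights (e.g. MSM probabilities)
    bins     : number of bins per axis (assumed, not taken from the text)
    """
    hist, xedges, yedges = np.histogram2d(x, y, bins=bins, weights=weights)
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):
        # Empty bins have zero probability and map to +inf.
        free_energy = -KB_KCAL * temperature * np.log(prob)
    # Shift so that the most populated bin sits at zero.
    free_energy -= free_energy[np.isfinite(free_energy)].min()
    return free_energy, xedges, yedges
```

Unvisited bins are left at infinity rather than masked, so a plotting routine can decide how to display them.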

Supplementary Figure 11: Mutual information between residues in apo C-CaM. The most dynamically coupled regions of the protein involve residues in αF and the second Ca2+-binding site. The key hydrophobic residues involved in substrate binding are located on αF. Interestingly, the first Ca2+-binding site is not strongly coupled with any other region of the protein. Colors indicate the log of the mutual information value.

Supplementary Methods
Simulation details using the CHARMM force field. Two conformations from each state of the apo MSM were taken as seeds for additional simulations run using the CHARMM36 force field [1]. Structures were solvated with TIP3P [2] water molecules such that water extended at least 10 Å away from the surface of the protein, and 24 Na+ ions and 12 Cl− ions were added to the system to neutralize the charge. Energy minimization for 1000 cycles and heating to 300 K were performed in Amber 14; production MD trajectories were run for a total of 32 µs.
Mutual information. The excess mutual information was computed for all protein torsion angles (backbone dihedrals φ and ψ, and side-chain χ angles; only the first χ angle for proline) using the entire simulation data set for apo C-CaM, to capture the correlated motions of residues in an unbiased, statistically robust manner. The mutual information between residue pairs was calculated using the formula given in ref. [3]. To filter out correlations that are not statistically significant, the average mutual information computed from 3 iterations of scrambled data was subtracted from the mutual information values computed from the simulation data.
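The scrambling correction described above can be sketched for a single pair of torsion-angle time series. The sketch below uses the standard histogram estimate of mutual information, I(A;B) = Σ p(a,b) ln[p(a,b) / (p(a) p(b))], which is an assumption here and may differ in detail from the residue-pair formula of the cited reference. The function names, the bin count and the random seed are hypothetical; only the use of 3 scrambled iterations comes from the text.

```python
import numpy as np


def mutual_information(a, b, n_bins=24):
    """Histogram estimate of I(A;B) in nats for two angle series in radians."""
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    joint, _, _ = np.histogram2d(a, b, bins=[edges, edges])
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)  # marginal of a, shape (n_bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)  # marginal of b, shape (1, n_bins)
    mask = p_ab > 0                        # skip empty cells to avoid log(0)
    return float(np.sum(p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask])))


def excess_mutual_information(a, b, n_scrambles=3, n_bins=24, seed=0):
    """Raw mutual information minus the mean over time-scrambled copies."""
    rng = np.random.default_rng(seed)
    raw = mutual_information(a, b, n_bins)
    # Permuting one series destroys temporal pairing but keeps its marginal,
    # so what remains is the finite-sample background.
    background = np.mean(
        [mutual_information(rng.permutation(a), b, n_bins) for _ in range(n_scrambles)]
    )
    return raw - background
```

A residue-pair value would then be assembled from the torsion pairs of the two residues; how those pairwise terms are combined is not specified in the text and is left out of this sketch.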
NMR order parameters. The NMR order parameter, $S^2$, describes the bond vector autocorrelation function:

$$C(t) = \left\langle P_2\big(\mu(t_0) \cdot \mu(t_0 + t)\big) \right\rangle \tag{2}$$

where $P_2$ is the second-order Legendre polynomial, $\mu(t_0)$ is the unit vector along a specific bond at time $t_0$, and $\langle \ldots \rangle$ indicates the ensemble average [4]. For comparison to relaxation-based order parameters, equation 2 is evaluated at the experimentally determined molecular tumbling time (5.0 and 8.2 ns for apo and holo CaM, respectively [5, 6]). $S^2$ values were evaluated from all simulation data retained in each MSM. Because autocorrelation reduces the effective number of independent samples, the effective number of data points was estimated from $n$, the number of data points in the autocorrelation calculation, and $\tau_{\mathrm{int}}$, the estimated integrated autocorrelation time of the time series [7]. Error bars represent the 95% confidence intervals. Order parameters calculated for the apo and holo systems were compared to experimental data for backbone amide groups collected in the absence of Ca2+ [5] and for side-chain methyl groups collected for Ca2+-saturated CaM [6], respectively.
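The bond vector autocorrelation described above, and the correction for correlated samples, can be sketched as follows. The first function averages P2 of the dot product over all time origins at a fixed lag given in frames; converting the tumbling time to a lag requires the trajectory frame spacing, which the text does not state. The effective sample size uses one common convention, τ_int = 1/2 + Σ ρ(k) and n_eff = n / (2 τ_int), which is an assumption and not necessarily the estimator of the cited reference. All function names and the `max_lag` truncation are hypothetical.

```python
import numpy as np


def p2_autocorrelation(vectors, lag):
    """Average of P2(mu(t0) . mu(t0 + lag)) over all time origins t0.

    vectors : array of shape (n_frames, 3), one bond vector per frame
    lag     : non-negative integer lag in frames
    """
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    n = len(unit)
    cos = np.sum(unit[: n - lag] * unit[lag:], axis=1)
    return float(np.mean(1.5 * cos**2 - 0.5))  # P2(x) = (3x^2 - 1) / 2


def integrated_autocorrelation_time(series, max_lag):
    """tau_int = 1/2 + sum of normalized autocorrelations up to max_lag (assumed convention)."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    var = np.mean(x * x)
    rho = [np.mean(x[:-k] * x[k:]) / var for k in range(1, max_lag + 1)]
    return 0.5 + float(np.sum(rho))


def effective_sample_size(series, max_lag):
    """n_eff = n / (2 * tau_int) under the convention assumed above."""
    return len(series) / (2.0 * integrated_autocorrelation_time(series, max_lag))
```

For uncorrelated data τ_int is close to 1/2, so n_eff is close to n; strongly correlated series give a much smaller n_eff and therefore wider confidence intervals.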
Native contacts. The following expression was used for the calculation of the fraction of native contacts, $Q(X)$, for a conformation $X$:

$$Q(X) = \frac{1}{|S|} \sum_{(i,j) \in S} \frac{1}{1 + \exp\left[\beta \left(r_{ij}(X) - \lambda r_{ij}^{0}\right)\right]}$$

where $r_{ij}(X)$ is the distance between atoms $i$ and $j$ in conformation $X$, and $r_{ij}^{0}$ is the distance between atoms $i$ and $j$ in the reference conformation (the apo or holo state of C-CaM). The set $S$ comprises all pairs of heavy atoms $(i, j)$ belonging to residues $\theta_i$ and $\theta_j$ such that $|\theta_i - \theta_j| > 3$ and $r_{ij}^{0} < 4.5$ Å. The values of the parameters $\beta$ and $\lambda$ are taken from the literature to be 5 Å⁻¹ and 1.8, respectively [8].
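The native contact definition above can be sketched with NumPy. The sketch assumes the smooth switching-function form of Q commonly used with these parameter values, Q = mean over S of 1 / (1 + exp[β(r − λ r0)]); the function names and the array layout (one row of coordinates per heavy atom, with a parallel array of residue indices) are hypothetical. The cutoff of 4.5 Å, the sequence separation of 3, β = 5 Å⁻¹ and λ = 1.8 are taken from the text.

```python
import numpy as np

BETA = 5.0    # 1/angstrom, from the text
LAMBDA = 1.8  # dimensionless, from the text


def native_pairs(coords, residue_index, cutoff=4.5, min_separation=3):
    """Heavy-atom pairs (i, j) forming the set S in the reference structure.

    coords        : array (n_atoms, 3) of reference heavy-atom positions in angstrom
    residue_index : integer array (n_atoms,) giving the residue of each atom
    Returns index arrays i, j and the reference distances r0 for each pair.
    """
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sep = np.abs(residue_index[:, None] - residue_index[None, :])
    # Upper triangle only, so each pair is counted once.
    i, j = np.where(np.triu((dist < cutoff) & (sep > min_separation), k=1))
    return i, j, dist[i, j]


def fraction_native_contacts(r, r0, beta=BETA, lam=LAMBDA):
    """Q(X) from current pair distances r and reference distances r0 (angstrom)."""
    r = np.asarray(r, dtype=float)
    r0 = np.asarray(r0, dtype=float)
    return float(np.mean(1.0 / (1.0 + np.exp(beta * (r - lam * r0)))))
```

With these helpers, a quantity such as the difference between the fractions of native contacts relative to the apo and holo references would be obtained by calling `fraction_native_contacts` once per reference and subtracting; the order of the subtraction is not specified in the text.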