Cloud computing approaches for prediction of ligand binding poses and pathways

We describe an innovative protocol for ab initio prediction of ligand crystallographic binding poses and highly effective analysis of large datasets generated for protein-ligand dynamics. We include a procedure for setup and performance of distributed molecular dynamics simulations on cloud computing architectures, a model for efficient analysis of simulation data, and a metric for evaluation of model convergence. We give accurate binding pose predictions for five ligands ranging in affinity from 7 nM to > 200 μM for the immunophilin protein FKBP12, for expedited results in cases where experimental structures are difficult to produce. Our approach goes beyond single, low energy ligand poses to give quantitative kinetic information that can inform protein engineering and ligand design.

are listed, but as no experimental association rates are available for these particular ligands, we compare to related FK506-derived ligands (Tassa, et al.) with a similar binding affinity range and association rates from surface plasmon resonance (SPR). The strongest binders from both ligand series have rates that compare well at ≈ 20 · 10 6 M −1 s −1 .  Table 4. Top flux pathway descriptions for FKBP12 ligands. For each pathway, a flux for each ligand is shown as a percentage out of the total flux represented by the analyzed top flux pathways. A pathway was analyzed if it met the criterion of having > 50% flux of the maximum flux pathway. The encounter complex is defined as the first observed on-pathway metastable state with formed ligand-protein interactions and is described with a summary of the interactions formed: near bound, within 5Å of the predicted bound state; flipped intermediate, within the active site but flipped, as the example described in Supplementary Fig. 5; 80's loop interaction, ligand binding either on the C-terminal or N-terminal ends of the loop formed by residues 84-91; nonspecific site, ligand interactions with sites A, B, or C as labeled in Supplementary Fig.  4  Supplementary Figure 1. Convergence of lowest free energy ligand MSM state. The lowest free energy state was selected from MSMs built at ≈ 10 µs intervals as aggregate sampling time is increased. We plot the state RMSD to the reference poses derived from crystallography (described in the main text), or in the case of Androstan the final predicted pose, as the rolling mean with standard deviations over 2 data points (≈ 20 µs). Data shown for ligands L2 (a), L3 (b), L6 (c), L9 (d), Androstan (e). Adaptive sampling times as described in Supplementary  Table 1 are marked with red dashed lines. Nonspecific sites A, B, and C are labeled for L2 as non-converged binding sites and described in Supplementary Fig. 4.

Supplementary Figure 2. Predicted FKBP ligand binding poses from MSM free energies. (a)
Overlap of the MSM-predicted binding poses of FK506-derived ligands (shown in stick representation) with the electron density derived for FK506 (PDBID 1YAT). For clarity, the densities shown are restricted to FK506 atoms which correspond to the derived ligand. (b) Free energy isosurfaces at 1.0 kcal/mol of the 3-D MSM-weighted free energy map are shown for all ligands.
Supplementary Figure 3. Low free energy surfaces can predict druggable regions of the binding site. (a) The MSM free energy isosurface at 1.5 kcal/mol is shown for L2 (in cyan stick representation), which extends down the N terminal region of the 80s loop (green square) (b) L6 (shown in cyan stick representation), as well as L9, and FK506 (not shown) make contacts in this same region, which contribute to an approximately two orders of magnitude increase in binding affinity.
Supplementary Figure 4. Predicted nonspecific binding sites. The MSM-weighted free energy map at the isosurface of 3.0 kcal/mol reveals high free energy binding sites on the protein that are common between all ligands. The primarily hydrophobic interactions formed with the ligand in the three sites, labeled A, B, and C, are also depicted Supplementary Figure 5. Hydrogen bond shuttling mechanism in common FK506 ligand intermediate. The equilibrium population for these states are L2: 0.8%, L3: 0.8%, LG6: 0.5%, LG9: 0.9%. (a) The structure of the intermediate is shown for L6 and L2, with a comparison to the predicted crystal binding mode for L2. In the intermediate, the ligand isoleucine-mimetic or ring sidechain (represented by distances to the central atom c3) forms hydrophobic interactions with residues F99 and W59, but switches to interactions with F36 and I90. This switching is shown in the distance distributions in (b). Shown in (c) and (d), the ligand carbonyl groups, labeled o1, o2, and o3, initiate hydrogen bonds in the intermediate with both the Y82 hydroxyl and I56 backbone (red distributions). Conversion to the bound state stabilizes the o2 interaction with Y82 and allows o3 to find its hydrogen bond with I56.
Supplementary Figure 6. Implied timescales for the final MSM for all ligands. Converging timescales, or the log of the eigenvalues µ computed from the transition matrix P ij for models constructed at the final aggregate sampling time. Various lag times τ are shown with separate colors for each timescale for L2 (a), L3 (b), L6 (c), L9 (d), Androstan (e). The lag time of 10 ns was chosen for final analysis, shown for all plots as a dashed black line in the initial plateau region. The appropriate lag time for an MSM in practice requires a balance of demonstrated Markovian behavior (see Methods in the main text) with maintenance of good statistics that contribute to the matrix Cij. Here the smallest lag time was chosen in regions where convergence behavior is first demonstrated.