Folding Free Energy Landscape of Ordered and Intrinsically Disordered Proteins

Folding funnel is the essential concept of the free energy landscape for ordered proteins. How does this concept apply to intrinsically disordered proteins (IDPs)? Here, we address this fundamental question through the explicit characterization of the free energy landscapes of the representative α-helical (HP-35) and β-sheet (WW domain) proteins and of an IDP (pKID) that folds upon binding to its partner (KIX). We demonstrate that HP-35 and WW domain indeed exhibit the steep folding funnel: the landscape slope for these proteins is ca. −50 kcal/mol, meaning that the free energy decreases by ~5 kcal/mol upon the formation of 10% native contacts. On the other hand, the landscape of pKID is funneled but considerably shallower (slope of −24 kcal/mol), which explains why pKID is disordered in free environments. Upon binding to KIX, the landscape of pKID now becomes significantly steep (slope of −54 kcal/mol), which enables otherwise disordered pKID to fold. We also show that it is the pKID–KIX intermolecular interactions originating from hydrophobic residues that mainly confer the steep folding funnel. The present work not only provides the quantitative characterization of the protein folding free energy landscape, but also establishes the usefulness of the folding funnel concept to IDPs.

The following common procedures were applied to conduct our simulations. The starting structure was solvated by water molecules and neutralizing counter ions in a cubic box with a buffer size of 15Å. We first carried out the energy minimization of the system with harmonic restraints of a force constant of 500 kcal/(molÅ 2 ) applied to heavy atoms (1000 steps of steepest descent and 4000 steps of conjugate gradient minimizations) followed by the one without such restraints (5000 steps of steepest descent and 5000 steps of conjugate gradient minimizations). We then increased the system temperature from 0 K to 300 K with a 20 ps constant-volume simulation, followed by a 200 ps constant-pressure simulation at P = 1 bar using the Berendsen's method. S11 These equilibration steps were repeated with different random initial velocities to perform independent production runs. The first 5 ns part of the production runs was done at constant pressure, and the rest at constant volume.
Short-range nonbonded interactions were cut off at 12Å, and electrostatic interactions were handled with the particle mesh Ewald method. S12 We used the SHAKE algorithm S13 to perform simulations with a 2 fs time step. Four independent 1 µs simulations were done for free pKID and free KID, and six independent 1 µs simulations for pKID-KIX complex.

Fraction of native contacts
We computed the fraction of native amino acid contacts (Q) following the procedure detailed in ref. S14 based on a list of native contact pairs. Native contact pairs refer to those heavy atom contact pairs -i atom in residue θ i and j atom in residue θ j separated by less than 4.5 A and satisfying | θ i − θ j | > 3 -found in the native structure, which is usually taken from an X-ray or NMR study. We used the NMR structures for HP-35 (PDB entry 1YRF S15 ) and WW domain (2F21 S16 ) to define native contact pairs in these systems, and the resulting Q versus simulation time is shown in Figs. S3 and S4. On the other hand, we observe large deviations (>5Å Cα root-mean-square fluctuations) between the simulated and NMR structures for pKID-KIX complex (Fig. S5). Such large conformational flexibility might be an intrinsic nature of the KIX domain that exhibits S-3 binding promiscuity and cooperativity. S17,S18 The large deviation might also be caused by the removal of the MLL peptide in our simulations, which is originally present in the NMR complex structure. Indeed, an inspection of Fig. S5 indicates that the removal of MLL significantly affects the KIX and pKID structures, which is in accord with the presence of allosteric communication between pKID and MLL through KIX. S4 It is therefore inappropriate to adopt the NMR complex structure in defining the native contact pairs in pKID-KIX complex, and we instead used the simulated complex structures for this purpose. Rather than choosing a single representative structure, native contact pairs were defined based on those contact pairs whose population is >70 % during the equilibrium complex simulations.
The resulting Q versus simulation time for free pKID and free KID is shown in Fig. S6 and that for pKID-KIX complex in Fig. S7.

Computation of the free energy
The free energy, f = E u + G solv , that defines the free energy landscape is given by the gas-phase potential energy E u and the solvation free energy G solv . E u can be calculated easily from the force field adopted in the simulations. For G solv , rigorous computational methods are available based on the free energy perturbation or thermodynamic integration, but they are not suitable here since we need to calculate G solv for quite a large number of simulated configurations. In the present work, we employed the the 3D-RISM (threedimensional reference interaction site model) theory for its computation. S19,S20 This is an integral-equation theory for obtaining the 3D distribution function g γ (r) of the water site γ at position r around the solute. In this theory, the distribution function is obtained by self-consistently solving the 3D-RISM equation S-4 and the approximate closure relation Here h γ (r) = g γ (r)−1 and c γ (r) are the total and direct correlation functions, respectively; χ γγ (r) denotes the site-site solvent susceptibility function which can be obtained either from simulations or integral-equation calculations; and u γ (r) is the solute-solvent interaction potential for a given solute configuration. We used the same numerical procedure as described in ref. S20 to solve the above equations.
Solvation free energy can then be computed from the following analytical expression: Here, ρ is the average solvent number density, and Θ is the Heaviside step function.
The resulting free energy (f = E u + G solv ) versus simulation time for the systems we investigated is presented in Figs. S3, S4, S6 and S7.

Construction of the free energy landscape
We constructed the folding free energy landscapes (f (Q)-versus-Q plots) for HP-35, WW domain, and free pKID shown in Fig. 2 of the main text based on the Q-and f -versus-time data presented in Figs. S3, S4, S6 and S7, and this was done using the method illustrated in Fig. S1 (see the next subsection for more details on the construction of the landscape for pKID). The slope of the landscape was then estimated through a linear fit.
The error estimations for the landscape curves and the slopes therefrom were done based on the block analysis. For example, ∼400 µs simulation trajectory for HP-35 was first divided into 4 blocks of ∼100 µs length, the f (Q)-versus-Q plot was then constructed for each block and its slope was computed, and finally their standard errors were estimated: for WW S-5 domain and pKID, the error estimations were done by regarding each of the independent simulation trajectories as a block. The standard errors for the free energy landscape curves are displayed in Fig. S8, and those for the slopes are reported in the main text.

Free energy landscape of disordered free pKID
The free energy landscape for free pKID is demonstrated as the cyan solid curve in the right panel of Fig. S6a. It is seen that only the Q 0.4 region is covered. This reflect the facts that (i) the free pKID simulations, initiated from the folded pKID structure taken from the NMR complex structure, are actually unfolding simulations since pKID when isolated is a The presence of the residual structure in free pKID is in agreement with the experimental observation. S21 To access the lower-Q region, we conducted the free KID simulations by "unphosphorylayting" the phosphorylated Ser-133 since experimentally it is known that free KID is more disordered than free pKID. S21 Indeed, we find that configurations up to Q ∼ 0 are sampled in the free KID simulations, and the resulting f (Q)-versus-Q plot covers the whole Q range (Fig. S6b). However, we cannot directly compare the free pKID and KID results because of the difference in free energy between the phosphorylated and unphosphorylated systems. To quantify the difference within the classical molecular mechanics, we proceeded as follows: (i) choose a free pKID configuration and compute its free energy f pKID , (ii) unphosphorylayte Ser-133 of that pKID configuration to obtain a KID configuration, and calculate its free energy f KID , (iii) compute the difference f pKID − f KID , and (iv) repeat this for 4000 free pKID configurations taken with a 1 ns interval from the four independent 1 µs simulation trajectories. As a result, we obtain the average difference ± standard deviation of −280.6 ± 6.1 kcal/mol. The f (Q)-versus-Q plots for free pKID and KID, in which the KID result is shifted to account for this difference, are compared in the right panel of Fig. S6a, and we find that their agreement is fairly well. The f (Q)-versus-Q plots for free pKID that S-6 covers the whole Q range, shown in Fig. 2c of the main text, is obtained using both the free pKID and shifted KID results.
We remark that the use of the "shifted KID results" does not affect the main points mentioned in the main text concerning the free energy landscape of pKID. Indeed, just based the free pKID result shown as the cyan solid curve in the right panel of Fig. S6a, we obtain the slope of −17.6 kcal/mol characterizing the funneledness of the landscape, which remains close to the value (−24.4 kcal/mol) computed along with the shifted KID results.
In particular, this does not alter the point that the landscape of free pKID is much less funneled than those of HP-35 and WW domain.

Site-directed thermodynamic analysis method
A computational method that provides an exact decomposition of the solvation free energy (G solv ) into contributions from constituent amino acid residues (G solv,i ), G solv = i G solv,i , has been developed in ref. S22. A corresponding partitioning can easily be derived for the gas-phase energy E u for classical force fields, E u = i E u,i . This is because E u can in general be written as is obvious for the non-bonded terms such as the Lennard-Jones and Coulomb interactions, and can be done also for the bond terms. For example, the dihedral-angle potential involves four atoms, but these atoms belong to at most two residues.
Therefore, the dihedral-angle potential can be classified into E intra i when all of those atoms belong to the same residue and into E inter ij otherwise.) By combining these results, one can decompose f = E u + G solv into contributions from constituent amino acid residues (f i ), f = f i . The residue-resolved contributions to ∆f int can be obtained similarly.

S-7
Computation of the terms in the standard binding free energy The statistical thermodynamic expression for the standard binding free energy is given by ∆G 0 bind = ∆ f − T (∆S config + ∆S ext ). S23 Here, ∆X for X = f or S config represents the change in X upon complex formation, X complex − (X free pKID + X free KIX ). We have already  Table S1), were chosen for this purpose.