A single residue substitution accounts for the significant difference in thermostability between two isoforms of human cytosolic creatine kinase

Creatine kinase (CK) helps maintain homeostasis of intracellular ATP level by catalyzing the reversible phosphotransfer between ATP and phosphocreatine. In humans, there are two cytosolic CK isoforms, the muscle-type (M) and the brain-type (B), which frequently function as homodimers (hMMCK and hBBCK). Interestingly, these isoenzymes exhibit significantly different thermostabilities, despite high similarity in amino acid sequences and tertiary structures. In order to investigate the mechanism of this phenomenon, in this work, we first used domain swapping and site-directed mutagenesis to search for the key residues responsible for the isoenzyme-specific thermostability. Strikingly, the difference in thermostability was found to principally arise from one single residue substitution at position 36 (Pro in hBBCK vs. Leu in hMMCK). We then engaged the molecular dynamics simulations to study the molecular mechanism. The calculations imply that the P36L substitution introduces additional local interactions around residue 36 and thus further stabilizes the dimer interface through a complex interaction network, which rationalizes the observation that hMMCK is more resistant to thermal inactivation than hBBCK. We finally confirmed this molecular explanation through thermal inactivation assays on Asp36 mutants that were proposed to devastate the local interactions and thus the dimer associations in both isoenzymes.

4 thermal inactivated samples were performed on a Superdex 200 10/300 GL column (Pharmacia, America), using an ÄKTA purifier as described in our previous paper 1 .

Model building
Crystal structures without substrates were used in this work for the hBBCK (PDB ID: 3DRE) and hMMCK (PDB ID: 1I0E) systems. The missing residues in chain B of hBBCK (residues 321-329) were directly generated from the complete chain A. The gap residues in both chains of hMMCK (residues 323-331) were modeled by Modeller 5-7 using the counterpart in chain A of hBBCK as template. Therefore, both structural are called modified crystal structures (or native structures) in this work. The missing residues in N-termini of hMMCK were not built because of their great flexibility that may destabilize the structures during simulations. Site-directed mutagenesis of residue 36 was implemented by VMD 1.9.1 8 based on the modified crystal structures. The four proteins were then placed in water boxes containing ~29150 water molecules and neutralized with 0.01 mol/L NaCl.

Simulation parameters
Amber12SB force field 9 and TIP3P water model 10 were engaged to quantify the atomic interactions. Both cMD and aMD 11,12 simulations were run using NAMD 2.9 13 with periodic boundary conditions (PBC) applied. The temperature was held at 298 K using the Langevin thermostat 14 , while the pressure was controlled at 1 atm by Berendsen pressure bath coupling method 15 . Particle mesh Ewald (PME) method 16 was used to calculate electrostatics. The van der Waals interactions were truncated at 9.0 Å with a long-range correction. The time step was set to 2 fs for both cMD and aMD simulations and the SETTLE algorithm 17 was used to enable the rigid bonds connected to all hydrogen atoms.
All four proteins followed a 3-step pre-equilibration. Firstly, all heavy atoms of proteins except the modelled loops were constrained with a force constant of 10 kcal/mol/Å 2 for 0.5 ns to allow the reasonable redistribution of water and ions.
Secondly, backbone atoms except the modelled loops were constrained with the restraint constant gradually relaxed from 10 to 0 kcal/mol/Å 2 in 2.4 ns. Finally, 30ns simulation without constraints was carried out. The last snapshot of pre-equilibration was chosen as the start structure for cMD and aMD productive simulations of each protein. No constraint was imposed on proteins during productive simulations, which lasted 100 and 200 ns for cMD and aMD simulations respectively (Table S3).
Structures were saved every 20 ps and diluted 5 times for further analysis.
Dual boost potentials 18 were applied independently to the dihedral and total energies in aMD simulations according to previous works 19, 20 where Ed and Et are the threshold energies for dihedral and total potentials, αd and αt are the corresponding acceleration factors, and nres and natom represent the numbers of protein residues and the total system atoms respectively. The average dihedral and total potential energies, <Vd> and <Vt>, were calculated from the last 10 ns of the cMD pre-equilibrations.

Binding free energy calculation
Dimeric binding free energies were calculated using the MM/PBSA method implemented in AMBER12 21 . 1000 frames from the diluted 100 ns cMD simulations were taken as input structures. Entropies were omitted considering the similar structures of all proteins. PB calculations for polar solvation free energies were performed using internal PBSA solver 22 . Dielectric constants for the interior and exterior of the molecule were set to 1 and 80 respectively. Nonpolar solvation free energies were evaluated by the SASA and a surface-integration method for the repulsive and attractive terms respectively. Atoms radii for the PB calculation were taken from the values optimized by Tan and Luo 23 . Solvent probe radius was 1.4 Å.
Binding free energies were further decomposed to each residue with 1-4 terms added to internal potential terms.

Movement mode analysis (FMA and PCA)
FMA based on partial least-squares algorithm 24,25 was implemented using GROMACS 5.1-dev 26 to find out the collective movements related with dimer dissociation. The biological function was defined as the total SASA of all interface residues in one subunit, which was the half of SASA difference between the two separated chains and the dimeric structure. Atom radii from the AMBER12 topology files were used to calculate SASA by VMD 1.9.1 8 . The starting structures for productive simulations were used as references and only α-carbon atoms were fitted and analyzed. The ensemble-weighted mode was used to generate the functional mode.
PCA was performed on α-carbon atoms only, using ProDy 27 . The top 20 modes were retained for further analysis.

Network analysis
Network analysis was performed to identify the interaction pathways between residue 36 and key interface ones. To construct the network, each residue was treated as a node and two nodes were connected only when the following two criteria were satisfied simultaneously: 1) the moving directions of the two residues in the PC2 mode have a scalar angle < 45°; 2) the two residues make physical contact, which requires the presence of at least one pair of heavy atoms from two candidate residues approximating to a distance of < 4.5 Å in at least 75% of the frames in the overall simulation trajectory 28,29 . The connections between adjacent residues were removed 28,30 . The generated undirected unweighted networks were analyzed using NetworkX 31 . The node degree describes the number of edges connected to one node.
The clustering coefficient of each node refers to the fraction of triangles that indeed exist in the network over all theoretically possible triangles in its neighborhood. The above two properties are averaged over all nodes to estimate the average degree and average clustering coefficient of the network respectively 32 . 10 repeated calculations were performed in a bootstrap manner to estimate the statistical significance: 20% frames were randomly removed from the productive trajectory for each time.
Considering the dependence of the constructed network upon the criterion parameters, other parameter values were also tested. The scalar angle between moving directions of two residues was set to 40°, 45°, 50° and 55°, the distance cutoff between two heavy atoms was tested at 4.0, 4.5, 5.0 and 5.5 Å, and the ratio cutoff (percentage in the trajectory) was evaluated for 65%, 70%, 75% and 80%. The results are shown in Tables S14-16. The difference between hBBCK and BP36L is robust and significant for most of the tested combinations of parameter values.

Clustering analysis
Clustering analysis was carried out using VMD 1.9.1 8 according to pairwise RMSD values. The optimal cutoff value was chosen to allow the top 8 clusters to contain >90% of the structural snapshots 33 , which is 3.0 Å in BP36L and 2.8 Å in the other proteins (Table S11). The structures closest to cluster centroids were taken as representatives of the clusters.
The radii of substrate entering pathway to the active site were calculated by HOLE2 34,35 and the graphs were prepared using VMD 1.9.1 8 . Default simple atom radii were used 36 .              Each residue is denoted by its chain name, residue name and residue ID, separated by whitespace. A total of seven common key interface residues (bold) are identified from the hBBCK and hMMCK systems.
Asn8, Asp62, Ala204 Asp54, Asp55 Asp54, Asp55 Asp55, Gln58, Asp62 Asp54, Asp55 Asp55 Asp54, Asp55 Asp54, Asp55 Asp55, Asp62 - Salt bridges and hydrogen bonds (H-bonds) were calculated using 100 ns cMD trajectories by VMD1.9.1 8 . The salt bridge was recorded when the distance between one of the carboxyl oxygen atoms of an acidic residue and the nitrogen atom of a basic residue falls within 3.2 Å for at least one frame. Meanwhile, the hydrogen bond is believed to form when the distance between the donor and receptor is within 3.5 Å and the angle of donor-hydrogen-acceptor is less than 30°. Only the hydrogen bonds present in >10% frames of the trajectories are listed. The correlation between functional modes is defined as the cosine value of the angle between two mode vectors, which ranges from -1 to 1.  The identified connections are robust in the bootstrap test, where 20% of frames were randomly removed from the trajectories and the process was repeated for 10 times.