Energy Landscape Topography Reveals the Underlying Link Between Binding Specificity and Activity of Enzymes

Enzyme activity (often quantified by kcat/Km) is the main function of enzyme when it is active against the specific substrate. Higher or lower activities are highly desired for the design of novel enzyme and drug resistance. However, it is difficult to measure the activities of all possible variants and find the “hot-spot” within the limit of experimental time. In this study, we explore the underlying energy landscape of enzyme-substrate interactions and introduce the intrinsic specificity ratio (ISR), which reflects the landscape topography. By studying two concrete systems, we uncover the statistical correlation between the intrinsic specificity and the enzyme activity kcat/Km. This physics-based concept and method show that the energy landscape topography is valuable for understanding the relationship between enzyme specificity and activity. In addition, it can reveal the underlying mechanism of enzyme-substrate actions and has potential applications on enzyme design.


Intrinsic specificity ratio
The process of protein-ligand binding can be physically quantified and visualized as a funnel-like energy landscape towards the native binding state with local roughness along the binding paths. [1][2][3][4][5][6][7][8][9][10] The "native" conformation of the complex is the conformation with the lowest binding energy E n . The energies of the all the non-native conformations follow a statistical Gaussian-like distribution 1,11 ( Figure   1). According to the energy landscape of binding 1,2,10,12-16 , we can describe the processes of an enzyme binds a substrate with the reaction coordinate of binding.
The total Hamiltonian energy function of a protein-ligand complex can be expressed as where ε i j representing the interaction strength between an atom i of the protein and atom j of the ligand is assumed to be random variable following a Gaussian distribution, which is related to the random energy model 17 . And σ i j equals 1 when the distance of the atoms i and j between protein and ligand is within the cutoff distance R c (about 3 ∼ 5Å for common protein-ligand complex); σ i j equals 0 otherwise (unbound). Based on the statistical energy landscape theory 14,18 , the native state n has the lowest energy E n , which is located at the bottom of the funnelled energy landscape. The native state is set as the enzyme-substrate complex (uncatalyzed state, native binding state, ES), as shown in Figure 1. Other than the native state, there are non-native binding states. In general, mutants of catalytic residues will lead to very low or none catalytic efficiency. Therefore, the mutants of other non-catalytic residual sites are selected in our studies. We only care about the free energy change between different systems. The energy of the catalytic residues almost does not change when compared among these mutants. The change of reaction barrier mostly comes from the change of environment, mainly around the mutated sites.
If we denote one σ i j = 1 as a native interaction, just as the previous studies 1,2,19 , for a conformation a with energy E a and an overlap Q with native state n, the fraction of native contact interactions of state a is where N is the total number of native contact interactions. Q is similar to the definition of reaction coordinate. By averaging over Gaussian distributions of interaction energy ε i j , the probability from equation Eq. S1 is given by Eq. S3 can be also written in terms of Q where m is the order (multi-body interactions) of interactions between protein and ligand,Ē is the mean energy, and ∆ε 2 is effective width or spread of the energy distribution per interaction.
In the microcanonical ensemble description of the thermodynamics, the energy and entropy of system are obtained as The conformational entropy S tot (Q) has relationship with s tot (Q) as s tot (Q) = S tot (Q) /N.
We define that δ E is the energy gap between the energy of native conformation E n and the average energy of all the conformational statesĒ, |E n −Ē|, reflecting the slope of the landscape; and ∆E is the energy fluctuation or the width of the energy distribution of the decoys, reflecting the roughness of the landscape 1 . δ ε n equals to δ E/N, which is the energy gap per contact interaction; ∆ε 2 equals to ∆E 2 /N, which represents the effective width of the energy distribution per contact interac-tion. Trapping temperature T g , at which the system is trapped with zero entropy S(T, Q, E n ) = 0, becomes ∆ε √ 1 2s tot (Q=0) when Q = 0. As illustrated in Figure 1, in binding process between enzyme and substrate, the minimum of Q ∼ 1 represents the native binding state, and Q ∼ 0 towards free energy minimum of unbound states. The free energy is given as: F = E −T S. Therefore, the transition temperature T t between the completely unbound states and native binding state is determined by the equality of the free energy minima at the completely unbound states and the free energy minima at native binding state ,which can be given by When taking the ratio of T t and T g (Q = 0), we obtain , which represents the property of energy landscape of enzymesubstrate system. Large Λ implies a funneled landscape against roughness and conformational states. Here we denote Λ as the intrinsic specificity ratio (ISR) where √ 2S is the scaling factor which accounts for the contribution of the entropy (S tot (Q)) to the specificity or the size of the landscape or the system. Large ISR implies discrimination of native state against non-native states, and therefore the high intrinsic specificity.
By docking studies from substrate to enzyme, the ISR value can be obtained by Eq. S9. δ E and ∆E can be calculated by where E D is the binding energy of each docking configuration (decoy). ⟨⟩ means the average over the ensemble of decoys. Therefore, ISR quantifies the discrimination of native binding mode against the nonnative binding modes modularized by the size of the system. ISR also reflects the shape of the topography of the underlying landscape by providing the relative measure of the slope of the landscape biasing towards native state against the roughness of the landscape modularized by the size of the landscape.
As a result, ISR is a direct measure of landscape topography including information on landscape slope, roughness and size.

Relationship between ISR and the activities of enzyme
In summary, intrinsic specificity ratio (ISR) correlates with conventional specificity when ligand binding to protein. In enzymes, the change of the interactions between enzyme and substrate caused by mutations will influence the topography of binding energy landscape, which can be easily quantified by ISR. As a result, the ISR can be used to reflect the trends of activity k cat /K m among a series of enzymesubstrate complexes. Moreover, enzyme activity may not necessarily correlate with conventional specificity (binding affinity difference) or kinetic specificity (catalytic rate) alone. But when conventional specificity or kinetic specificity is fixed, the change of activity should be consistent with the other characteristic. So as to ISR, according to the properties of binding energy landscape, we expect ISR to include both the characteristics of conventional specificity and kinetic specificity (as shown in Fig. S1). After accurate simulations with different methods, we will show that ISR has relationship with the important properties of enzyme activity, conventional specificity, and kinetic specificity. Therefore, ISR is very useful and informative.

Molecular dynamics simulation
Firstly, the isolated PPE and PHBH-FADHOOH were prepared for classical MD simulations. Two steps of energy minimizations were performed. All the water molecules and counterions were minimized with 5000 steps of steepest descent followed by 5000 steps of conjugate gradient. Then the whole system was minimized After docking, all the 12 complex systems of PPE and 11 complex systems of PHBH were prepared for 2 ns MD simulation. The pre-treatment and all the settings of MD simulation of each system were the same as that of iso-protein.

MMGBSA calculation
After product MD simulation, the binding free energy of each complex was calculated through the MMGBSA method. 22,23 In MMGBSA, the binding free energy, ∆G bind , is written as the sum of molecular mechanics energy, ∆E MM , salvation free energy contribution to binding, ∆G sol , and the conformational of entropic contribu-tion, −T ∆S. This can be represented as in Eq. S12: where both ∆E MM and ∆G sol can be divided into two parts: The ∆E ele and ∆E vdw in Eq. S13 are the electrostatic interaction and van der Waals energy in the gas phase, the ∆G gb and ∆G np in Eq. S14 are the polar and nonpolar contributions to the solvation free energy, respectively. The polar contribution of the solvation energy ∆G gb was calculated using the generalized Born model 24 . The nonpolar contribution of salvation energy ∆G np was calculated by Eq. S15: The solvent accessible surface area (SASA) in Eq. S14 was estimated using ICOSA model in AMBER. In this study, the values for γ and β were set as default values, 0.0072 kcal/(mol·Å 2 ) and 0 kcal/mol.
For each system, 950 snapshots were extracted from the 2 ns MD trajectory of the complex at an interval of 1 ps (first 50 ps is not included for energy calculation).
Because there was just one mutation site of each mutated system, the difference of entropy between the mutated system and wild-type system was rather small. As we only care about the relative value of energy but not the absolute value, here we did not calculate the entropy of each system. For wild-type PPE and PAPA complex, per-residue energy decomposition was also performed on the wild-type complex to determine the mutation sites. However, because of the small size of substrate pHB, the binding site of PHBH is rather smaller than PPE. For sampling the mutant systems as much as possible, we select the residues in the vicinity of substrate pHB for mutation sites. After the MD simulation of mutated systems, alanine scanning method was used to obtain the binding energy of each mutated system.

QM/MM simulation
The proposed reaction mechanisms of the acylation step of amidase activity of PPE and the OH-transfer step of PHBH are illustrated in Fig. S2 and Fig. S3. A QM/MM MD simulation was implemented on the equilibrated complex structure of each system to calculate the mechanism of the acylation reaction of amidase activity. And the umbrella sampling (US) 25

MD simulation
After docking substrate PAPA to wild-type PPE, pHB to wild-type PHBH, we gained the stable conformation of PPE-PAPA and PHBH-pHB complexes. When the complexes reached dynamical equilibration, the structures were shown in Fig. S4B and Fig. S5B, respectively. In PPE systems, there is a long and narrow substrate binding pocket around the active site to bind polypeptides with different lengths and types of amino acids. 30 For substrate PAPA, which can be catalyzed dominantly by PPE through amidase activity, its C-terminal part is located close to the catalytic triad   of PPE, His45-Asp93-Ser188. Aiming to find out which residues of PPE contribute to the binding of substrate, binding energy decomposition was performed to determine the mutation sites. As illustrated in Fig. S6, residues His46, Val88, Trp164, Gln185, Gly186, Asp187, Ser207, Phe208, Val209, Ser210, and Arg211 are crucial to PAPA binding, especially the residues His46, Phe208, Val209, and Arg211. PAPA has hydrogen bonds with Gln185, Gly186, Ser188, Ser207, Val209, and Arg211, which are helpful to form stable enzyme-substrate complex.
To design the mutation sites for specificity study, the residues for mutation should be located in the active site (have contribution to PAPA binding), and do not affect the reaction mechanism of PAPA deamination (all the enzymes should catalyze substrate PAPA with the same reaction mechanism). Therefore, the mutation sites: 1)  Fig. S7A), helping to stabilize the transition state. These two residues compose an oxyanion hole, which is suggested to stabilize the two proposed tetra- For PHBH system, the OH-transfer catalysis is one-step reaction. The reaction mechanism is illustrated in Fig. S8. During the reaction, the hydroxyl group is transferred from co-factor FADHOOH to substrate pHB, with the O86-O87 bond broken and O87-C6 bond formed. As depicted in Fig. S8B, there are several important hydrogen bonds between pHB and the hydroxyl group of Tyr201, Ser212, and Tyr222, which contribute to the stability of the transition state. Moreover, the three mutations, Y201A, S212A, and Y222A, have much lower binding affinity with pHB than wild-type enzyme. Therefore, these mutations have a large impact on their activity.
Overall, we constructed the whole free energy curve for the amidase reaction of PPE-PAPA complex (Fig. S7B) and PHBH-pHB complex (Fig. S8A) by using WHAM. In PPE system, there is a small barrier (8.86 kcal/mol) in the first step.
Then the energy of system drops to 6.71 kcal/mol at the end of the first step. The