A combination of virtual screening, molecular dynamics simulation, MM/PBSA, ADMET, and DFT calculations to identify a potential DPP4 inhibitor

DPP4 inhibitors can control glucose homeostasis by increasing the level of GLP-1 incretins hormone due to dipeptidase mimicking. Despite the potent effects of DPP4 inhibitors, these compounds cause unwanted toxicity attributable to their effect on other enzymes. As a result, it seems essential to find novel and DPP4 selective compounds. In this study, we introduce a potent and selective DPP4 inhibitor via structure-based virtual screening, molecular docking, molecular dynamics simulation, MM/PBSA calculations, DFT analysis, and ADMET profile. The screened compounds based on similarity with FDA-approved DPP4 inhibitors were docked towards the DPP4 enzyme. The compound with the highest docking score, ZINC000003015356, was selected. For further considerations, molecular docking studies were performed on selected ligands and FDA-approved drugs for DPP8 and DPP9 enzymes. Molecular dynamics simulation was run during 200 ns and the analysis of RMSD, RMSF, Rg, PCA, and hydrogen bonding were performed. The MD outputs showed stability of the ligand–protein complex compared to available drugs in the market. The total free binding energy obtained for the proposed DPP4 inhibitor was more negative than its co-crystal ligand (N7F). ZINC000003015356 confirmed the role of the five Lipinski rule and also, have low toxicity parameter according to properties. Finally, DFT calculations indicated that this compound is sufficiently soft.


Virtual screening and molecular docking
FDA-approved DPP4 inhibitor drugs such as Sitagliptin, Vildagliptin, Saxagliptin, and Linagliptin were selected as templates for the beginning virtual screening (Fig. 1).Using the ZINC database (www.zinc.docki ng.org) 21 , the compounds were screened on ≥ 50% similarity 22 .A set of 200 compounds based on structural similarity was retrieved from the ZINC database including 1500000 compounds 23 .Molecules were converted to the pdbqt format to prepare for molecular docking study.The three-dimensional crystal structures of DPP4 (PDBID: 4a5s), N7F (co-crystal ligand of 4a5s), DPP8 (PDBID: 7a3k), QX8 (co-crystal ligand of 7a3k), and DPP9 (6eor), 9xh (co-crystal ligand of 6eor), were downloaded from the RCSB Protein Data Bank (https:// www.rcsb.org/) 24 .The molecular docking study was run using Autodock Vina.Hydrogen atoms were added to the protein, and charges were assigned.The grid box size of 30 × 30 × 30 Å and exhaustiveness of 100 were set for screened ligands.The docking validation was accomplished by performing self-docking and the result was reported with root mean square deviation (RMSD) value 25 .The Discovery Studio Client 2016 was used to visualize the interactions and binding poses of the compounds.

Molecular dynamics simulation
The simulations were conducted using GROMACSv2021 (http:// www.mdtut orials.com/ gmx/ compl ex/ index.html) applying an AMBER force field 26 .sufficient water molecules were replaced by counter ions.Periodic boundary conditions (PBC) were applied to all three directions of the system.Initially, the steepest descent algorithm was used for energy minimization 27 .Then, the system was equilibrated by NVT and NPT ensembles respectively.The equilibration at constant volume (NVT) was done with a coupling constant of 0.1 ps for 100 ps at a steady temperature of 300 K recruiting the Berendsen thermostat algorithm.The equilibration at constant pressure (NPT) was subsequently carried out with a coupling constant of 5.0 ps for 1 ns in the pressure of 1 bar maintained by the Parrinello-Rahman barostat.Particle Mesh Ewald (PME) was utilized to calculate the range of the interactions.Long-range electrostatic interactions and short-range non-bonded interactions were calculated within a 12 Å cut-off.Two simulations were performed with durations of 200 ns.The analysis of Root Mean Square Deviation (RMSD), Root Mean Square Fluctuation (RMSF), the Radius of gyration (Rg), the number of hydrogen bonds, and principle component analysis (PCA) were analyzed through the MD trajectories 28 .

MM-PBSA calculation
The combination of molecular dynamics simulations and thermodynamic techniques such as Mechanic/Poisson-Boltzmann Surface Area (MM-PBSA) led to the measurement of the total binding free energy between ligand and protein as follows 29 : where, G (complex) is the total free energy of protein-ligand complex and G (receptor) and G (ligand) are the free energies of the isolated protein and ligand in a solvent, respectively.The total free energy for each of the three mentioned entities (complex receptor or ligand) could be calculated from its molecular mechanics' potential energy (∆G MM ) plus the energy of solvation (∆G sol ) 30 .The intermolecular van der Waals (ΔE vdW ), electrostatic interactions (ΔE elec ), and nonpolar solvation energy (ΔE np ) are favorable for binding.Still, polar solvation-free energy (ΔE pol ) and the configurational entropy (− TΔS) are unfavorable in ligand-protein binding.Thus, the MM/PBSA GROMACS was used to calculate the total free binding energy of the ligand-protein complex through the MD trajectories 31 .

ADMET profile
In-silico investigation including physicochemical properties and pharmacokinetic properties such as absorption, distribution, metabolism, excretion, and toxicity for selected DPP4 ligands, its related co-crystal, and some FDA-approved drug ligands, was accomplished.The results were obtained from SwissADME and the preADMET online servers (http:// pread met.bmdrc.org/).

DFT studies
Gaussian 09 was used to analyze the Density Functional Theory (DFT) of selected DPP4 inhibitor candidates used and visualized through Gauss view 6.0.The optimization of the structural coordinates of the selected compounds was performed by using the B3LYP/6-31 G (d,p) level basis set without any symmetrical constraints.The electrostatic surface potential, the HOMO-LUMO energy, and thermochemical parameters (thermal Energies, thermal Enthalpies, thermal Free Energies, hardness, softness, Ionization energy, and electron affinity) of the ZINC000003015356 were obtained from the optimized geometry.

Virtual screening and molecular docking studies
Molecular docking was conducted to characterize the molecular interactions and binding affinity of the protein-ligand complex 16 .The top-ranked ligands were grouped and filtered based on the affinity and appropriate orientation within the active site of the protein.Thereby screening the large library of ZINC database containing millions of known compounds, 200 compounds with a similarity of ≥ 50% were screened and molecular docking was performed to investigate their affinity to the DPP4 enzyme.The accuracy of molecular docking was confirmed by the re-docking simulation of the co-crystal ligand (N7F) in the active site of the DPP4 (4a5s) enzyme 30 .The root means square deviation (RMSD) value was obtained at 0.4 Å.The superimposition of the co-crystal ligand and docked N7F in the active site of 4a5s is shown in Fig. 2. The native ligand (N7F) was redocked back to the active site of the 4a5s enzyme to evaluate the docking protocol.Figure 2 shows the excellent superimposition of co-crystal ligand and docked ligand and it can validate the docking procedure.The RMSD of docking for N7F compared to its coordination in the crystal structure was very low value (0.31) and showed the reliability of the docking procedure The RMSD of docking for N7F compared to its coordination in the crystal structure was very low value (0.31) and showed the reliability of the docking procedure.
The Receiver Operating Characteristic (ROC) diagram is a graph to validate the ability of docking to discriminate between active and decoy compounds.ROC curve shows the true positive rate (sensitivity) vs false positive rate (1-specificity).The result of ROC is depicted in Fig. 3, AUC of 0.918 was obtained based on binding affinity scores and shows the success of the docking protocol implemented in distinguishing actives from inactive structures.
The 3D, and 2D structures and interactions of internal ligands of DPP4 (N7F) are shown in Fig. 4. Based on the results, the binding energy for N7F was obtained at − 10.5 kcal mol −1 and established the tight bindings in the active site of 4a5s through conventional hydrogen bonding and π-π interactions.The key residues in the binding site of DPP4 are Lys554, Trp627, Trp629, Tyr547, Tyr666, Tyr 631, Glu206, Glu205, Tyr662.www.nature.com/scientificreports/Among the 200 docked compounds, 30 compounds showed binding energy in the range of − 8 to − 10.8 kcal mol −1 , and amongst, the compounds with the highest binding energy (> − 10.0 kcal mol −1 ) were considered.The docking results of top-ranked compounds and Sitagliptin in the active site of 4a5s are shown in Table 1.As can be observed the binding energies for ZINC000003015356, ZINC000002876364, and ZINC000003007727 are − 10.8, − 10.0, and − 10.1 kcal mol −1 .It can be understood that the binding energy of ZINC000003015356 was higher than other selected compounds, N7F and Sitagliptin as reference drugs.To select the lead compound, the interactions and key residues of ligands in the active site of 4a5s were considered and presented in Tables 1 and 2. The compound ZINC000003007727 formed a hydrogen bond with the residues of Arg125 and His740 and π-π stacking interactions with Tyr662, Tyr 666, and Trp629 residues.Also, ZINC000002876364 formed a hydrogen bond with His740 and Ser630 and participated in the π-stacking interactions with Tyr662, Tyr666, and Trp629 residues.The analysis of docking results for ZINC000003015356 displayed that this compound made the hydrogen bond interaction with the residues of His740, Arg125, and Tr547 and π-π stacking with Tyr662, Tyr666, Tyr547, Trp629 residues.The involved key residues in the active site of the enzyme for ZINC000003015356, namely Tyr662, Tyr666, Tyr547, and Trp629 were similar to N7F.Consequently, the ligand ZINC000003015356 with the highest docking score and appropriate interaction in the active site of DPP4 was selected as the proposed DPP4 inhibitor.The interactions and binding poses of ZINC000003015356 are given in Fig. 5a.For further analysis and comparison of the selected combination with drugs available in the market, docking of Sitagliptin was also done, and the results are shown in Fig. 5b.Sitagliptin interacted with Tyr662, Tyr 632, Trp 629, and His 740 as hydrogen bonding and His 740 and Trp 629 as π interactions.Analysis      www.nature.com/scientificreports/ of docking outputs for Sitagliptin showed that the key residues and their surrounding residues were similar to ZINC000003015356, which confirmed the suitable selection of lead compound.
As previously mentioned, one of the assumptions for the side effects of DPP4 inhibitors is interference with DPP8 and DPP9 enzymes.Therefore, docking studies of selective ligand, ZINC000003015356, were performed on DPP8 (PDBID = 7a3k) and DPP9 (PDBID = 6eor) receptors.The binding energies for ZINC000003015356 on DPP8 and DPP9 were obtained zero.As can be seen in Fig. 6, the proposed ligand does not interact with any of the DPP8 and DPP9 and is not placed in the binding pocket of these receptors.
In contrast, the molecular docking of FDA-approved compounds was done on DPP8 and DPP9 enzymes.the docking results of these compounds on DPP8 enzyme showed that all compounds are placed in the active site of 7a3k.The binding energy for Sitagliptin, Linagliptin, Vildagliptin, Saxagliptin were obtained: − 9.8, − 9.3, − 8.1, − 7.7 kcal mol −1 , respectively.Also, the results showed that all the compounds are not placed in the binding site of DPP9 and have no interaction with it, and their binding energy was obtained zero.The 3-D interactions of FDA-approved compounds on DPP8 (7a3k) and DPP9 (6eor) are shown in Fig. 7.It was already mentioned that one of the hypotheses of side effects related to current DPP4 drugs is their interaction with other receptors.The results of molecular docking confirmed that these compounds affect the DPP8 receptor, while the selected ligand (ZINC000003015356) did not affect receptors DPP8 and DPP9.This event can be a promising point for ZINC000003015356 as an appropriate lead DPP4 inhibitor.
Taken to gather, one promising compound (ZINC000003015356) established strong hydrogen bonds and hydrophobic interactions with important residues, Arg125, Asp545, Tyr547, His740, Glu206, and Ser 630 at the binding site of the DPP4 enzyme which is essential for selectivity 19,32 .ZINC000003015356 had better docking scores than that of the FDA-approved drugs for diabetes, Sitagliptin.Molecular docking results on DPP8 and DPP9 showed low binding energy and interaction with these enzymes.The ZINC000003015356 hit compound based on visualization results deserves to be as the best DPP4 hit compound.

Molecular dynamic simulation results
The molecular dynamic simulation was used to investigate the dynamic behavior of the studied systems to confirm the results of the molecular docking 33 .The MD study was run on the top-ranked docked DPP4 ligand, namely ZINC000003015356.The RMSD, RMSF, Rg, and number of hydrogen bonding are given in Fig. 8.The www.nature.com/scientificreports/RMSD calculation estimated the conformational fluctuations of the protein-ligand backbone atoms and stability of the simulated system during 200 ns of simulation time 34 .According to Fig. 8, after initial slight fluctuations due to kinetic shock, all systems reached to steady state.Almost all systems considered in molecular dynamics studies undergo an initial kinetic shock.The RMSD plot results showed that the chosen ligands in all systems reached a stable level, suggesting that the ligands exhibited consistent stability within the active site of the protein.
Therefore, it can be concluded that the ligand-protein complex causes no significant conformational changes in the protein structure.
The RMSF analysis determined the fluctuations of amino acid residues with the calculation of the average of their atoms 35 .Small RMSF values indicated less conformational changes of the protein in the complex with the ligand.The RMSF results of the complexes between positive references, cocrystal ligand (N7F), and ZINC000003015356 with DPP4 receptor are shown in Fig. 9. Six complexes displayed similar patterns and none of the amino acid residues had an RMSF greater than 0.4 nm in the active site of receptor.It was expected that  the loop, C-terminal, and N-terminal regions showed more fluctuations compared to the residues in the binding site and were involved in ligand-protein interactions.The RMSF analysis for ZINC000003015356 showed that key amino acids namely Tyr662, Tyr666, Tyr547, and Trp629 had small RMSF values.The RMSF plots for the unbounded and bounded protein confirmed binding of the ligand to the protein did not change the protein structure 36 .Furthermore, the residual fluctuations less than 0.4 nm, indicated that the ligand-protein complex had no significant effect on the protein backbone, which was consistent with the RMSD results.
The majority of peaks were identified for Vildagliptin, Saxagliptin, Sitagliptin, and Linagliptin in residues 180-210, which are not associated with the enzyme's active site.These findings suggest that essential interaction between the ligand and binding pocket may contribute to preserving the protein's stability.The ZINC000003015356 ligand compareed to positive references (Vildagliptin, Saxagliptin, Sitagliptin, and Linagliptin) shows no significant changes in RMSF value.
The level of protein size change or its compactness in binding to the ligand during the MD simulation run was calculated with the Rg parameter 37 .The lower values of Rg indicated the higher compactness of the protein-ligand complex and the stability of the system.As shown in Fig. 10, Rg changes for ZINC000003015356 was in the same range of positive references.Vildagliptin and Linagliptin showed lower Rg in the range of 3.09 and 3.10 nm.
The examination of the total hydrogen bond interaction between each ligand with 4a5s during the simulation was also computed in Fig. 11.According to the results, the range of H-bonds for ZINC000003015356 was between 0 and 8, and for N7F was between 0 and 4 in the simulation time.The positive references showed good H-bond interaction in the active site of the protein.Therefore, the selected ligand formed a higher number of hydrogen bonds compared to the internal ligand through simulation time.In general, the analysis results of the MD simulation agreed with the docking study results.
Principal component analysis (PCA) or essential dynamics was one of the advanced methods in MD simulations 38 .For instance, biological macromolecules with numerous degrees of freedom, like proteins, may undergo significant conformational changes to adopt complex shapes and demonstrate a wide range of activities 39 .The examination of significant coordinated movements occurring during ligand binding is facilitated by the PCA analysis depicted in Figs. 12 and 13.In this study, the eigenvectors were acquired via performing diagonalization upon the matrix.Figure 12 shows the eigenvalues produced from the diagonalization of the covariance matrix of atomic fluctuations against the appropriate eigenvector in decreasing order.Based on Principal Component Analysis (PCA) findings, it can confidently be concluded that the six investigated complexes exhibit less motion and maintain a stable interaction.Another method for determining the dynamics of protein-ligand interactions  is to create 2D projection plots using PCA. Figure 13 presented a two-dimensional projection of the trajectories in phase space for the first three main components, PC1, and PC3, for all complexes.Through their small occupancy in the phase space, as indicated by the PCA analysis along with additional molecular dynamics (MD) simulations, all the examined complexes demonstrated high stability according to these findings, which are consistent with the reported data from the 2D PCA representations.

Free energy calculations
The binding free energies of the DPP4 selected compound and co-crystal are indicated in Table 3.According to the results, the binding free energies for selected ligand ZINC000003015356 was − 151.108 ± 11.170 kJ mol −1 was better than their related co-crystal ligand.As can be seen in Table 3, for ZINC000003015356, a large contribution of the total free energy (ΔG binding ) was Van der Waals energy, and the electrostatic energy had a small contribution.These results proved that the binding site of DPP4 is mainly non-polar, which is consistent with the docking results.The solvent-accessible surface area (SASA) indicated the degree of solvent exposure 40 that is reported in Table 3.The results of total binding energy for ZINC000003015356 proved that this compound is a suitable candidate as a DPP4 inhibitor.

Pharmacokinetic profile
Recently researchers used some techniques to predict the drug ability of hit compounds such as the Lipinski rule of 5 (Ro5) and the ADME/T parameters 41 .Three parameters of Ro5 including molecular weight, hydrogen bond acceptor, and hydrogen bond donor, are related to the interaction of the ligands and the active site of the protein.The lipophilicity parameter (LogP) can be calculated experimentally unlike the other three parameters which are not related to the biophysical features of the target.Topological polar surface area (TPSA) indicates the ability of a compound to penetrate cells 42 .The ability of compounds to penetrate the cell membrane is reduced when the TPSA value is higher than 140 (Å 2 ) 43 .
By using Swiss ADME and PreADMET software, the physicochemical and pharmacokinetics features of selected ligands and some of the FDA-approved DPP4 inhibitors were investigated.The amount of molecular weight, LogP, hydrogen bond donor, hydrogen bond acceptor, number of rotatable bonds, and topological polar surface area for ligands ZINC000003015356, ZINC000002876364, and ZINC000003007727 were in an acceptable range (Table 4).
Predicting the interaction between drugs and the human body is shown by parameters such as HIA%, in vitro plasma protein binding%, and blood-brain barrier (BBB%) parameters that are related to the distribution of the drug in the body.The HIA% value (91.35-97.94%) of selected ligands indicated the appropriate intestinal absorption.Besides, these ligands were desired to permeate the cell membrane, and the in vitro Caco-2 cell permeability for them was in an acceptable range.The negative values for in vitro skin permeability indicated that these compounds had no skin diffusion.Based on the results ZINC000003015356 showed a high affinity for binding to the plasma   5) 44 .
The obtained results for the toxicity profile of the selected ligands and some current DPP4 inhibitors are given in Table 6.The toxicity profiling for all studied compounds showed low toxicity against algae_.The Ames test values for all compounds indicated that these compounds were mutagens.The outputs of the carcino_mouse test exhibited that the studied ligands were not carcinogenic.Since blocking the hERG K + channels caused QT interval prolongation, subsequent sudden death may happen.Thus, this parameter was considered which showed medium risk for the studied compounds.In general, ZINC000003015356, in comparison with FDA-approved drugs, had no significant toxicity, therefore can be used as a proper candidate for inhibition of DPP4 enzyme.

DFT analysis
The molecular orbitals, HOMO and LUMO, predicted reactivity and the physical, structural properties of compounds.The HOMO, LUMO energies and, energy gap between HOMO and LUMO are depicted in Fig. 14.The gap energies of Sitagliptin and ZINC000003015356 were obtained at 3.75 and 4.77 eV respectively.The results showed that the selective DPP4 inhibitor had higher chemical reactivity than Sitagliptin due to the lower energy gap between HOMO and LUMO.The charge distribution of Sitagliptin and ZINC000003015356 is shown by electrostatic surface potential (ESP) energy (Fig. 15).The red spheres on the ESP graph represent the negative charge sites.The ESP map specified the balanced charge distribution in the ZINC000003015356 which facilitates the binding of the compound to biological enzymes.The global parameters are calculated at the B3LYP level and are shown in Table 7.The Hardness, softness, and electron affinity of ZINC000003015356 were obtained by using HOMO and LUMO energies.The values of entropy, enthalpy, and Gibbs energies showed more stability of Sitagliptin which was consistent with calculated HOMO, and LUMO energy values and energy gap.

Conclusion
This study aimed to find a new high-affinity inhibitor for DPP4 by using computational methods.Using the ZINC database, target compounds were screened on ≥ 50 similarities to FDA-approved drugs.Thus, by performing molecular docking, compound ZINC000003015356 with the highest docking score and appropriate interaction with the active site of DPP4 was selected.Also, the molecular docking studies of ZINC000003015356 on DPP8 and DPP9 receptors indicated that this compound had no interaction with these proteins.Whereas, the docking results for FDA approved drugs on DPP8 and DPP9 enzymes represented that these compounds are placed in the binding site of DPP8 and have no interaction with DPP9.The molecular dynamics simulation and MM/PBSA calculations were done for a selected compound that confirmed the results of the molecular docking.According to simulation analysis, the key residues of Tyr662, Tyr666, Tyr547, and Trp629 were concluded necessary for a ligand to be a potent inhibitor of the DPP4 enzyme, subsequently, the results of pharmacokinetic properties and ADMET profile showed that the selected compound was a suitable candidate for drug likenesses.DFT analysis showed the high reactivity of the proposed ligand.To conclude, ZINC000003015356 can be considered a promising and selective DPP4 inhibitor candidate.

Figure 2 .
Figure 2. The superimposition of co-crystal and docked ligand in the active site 4a5s.

Figure 4 .
Figure 4. Interaction and orientation of co-crystal ligand in the binding pocket of DPP4 (pdb:4a5s).

Figure 5 .
Figure 5. Interaction and orientation of (a) selected ligand ZINC000003015356, (b) Sitagliptin in the active site of DPP4.

Figure 8 .
Figure 8. RMSD backbone for protein complexes of different ligands within DPP4 inhibitors.

Figure 9 .
Figure 9.The RMSF plot of the complexes between positive references, cocrystal ligand, and ZINC000003015356 with DPP4 receptor.

Figure 10 .
Figure10.The Rg for DPP4 receptor in complex with positive references, cocrystal ligand, and ZINC000003015356 compounds during the simulation time.

Figure 11 .
Figure 11.The total number of hydrogen bonds of ZINC000003015356, co-crystal ligand N7F, and positive references.

Figure 12 .
Figure 12. plot of the eigenvalue rank of all complexes.

Table 1 .
The Docking score of some of the docked compounds in the active site of DPP4 (pdb:4a5s).

Table 2 .
The hydrophobic π-π and hydrogen bond interactions for top-ranked compounds in the active site of DPP4 (pdb:4a5s).

Table 3 .
Binding free energy (KJ mol −1 ) for the selected compound of DPP4 and its related co-crystal.

Table 4 .
Physicochemical properties for selected DPP4 ligands, their co-crystals, and some FDA-approved drugs.

Table 5 .
In silico ADME for selected DPP4 ligands, their co-crystals, and some of the FDA-approved drugs.