Py-CoMFA, docking, and molecular dynamics simulations of Leishmania (L.) amazonensis arginase inhibitors

Leishmaniasis is a disease caused by a protozoan of the genus Leishmania, affecting millions of people, mainly in tropical countries, due to poor social conditions and low economic development. First-line chemotherapeutic agents involve highly toxic pentavalent antimonials, while treatment failure is mainly due to the emergence of drug-resistant strains. Leishmania arginase (ARG) enzyme is vital in pathogenicity and contributes to a higher infection rate, thus representing a potential drug target. This study helps in designing ARG inhibitors for the treatment of leishmaniasis. Py-CoMFA (3D-QSAR) models were constructed using 34 inhibitors from different chemical classes against ARG from L. (L.) amazonensis (LaARG). The 3D-QSAR predictions showed an excellent correlation between experimental and calculated pIC50 values. The molecular docking study identified the favorable hydrophobicity contribution of phenyl and cyclohexyl groups as substituents in the enzyme allosteric site. Molecular dynamics simulations of selected protein–ligand complexes were conducted to understand derivatives’ interaction modes and affinity in both active and allosteric sites. Two cinnamide compounds, 7g and 7k, were identified, with similar structures to the reference 4h allosteric site inhibitor. These compounds can guide the development of more effective arginase inhibitors as potential antileishmanial drugs.

www.nature.com/scientificreports/profile (TPP) for new antileishmanial chemotherapeutic agents includes activity against all species of Leishmania, efficacy in treatment of less than two weeks, preferably a once-daily oral drug, and safer than available treatment 9 .
In the pursuit of discovering novel drug candidates, molecular targets that are crucial for the survival of parasites have been thoroughly investigated to identify compounds that are both selective and less toxic 13 .One noteworthy target from this research is the arginase (ARG) enzyme, also known as L-arginine aminohydrolase (EC 3.5.3.1), which has shown promise as a potential target in developing new antitrypanosomal drugs 4,[14][15][16] .
ARG is a manganese-dependent metalloenzyme that catalyzes the hydrolyzes of L-arginine into L-ornithine and urea, while the L-ornithine product is a precursor of polyamines necessary to produce trypanothione, a vital antioxidant agent that controls reactive oxygen species (ROS) [17][18][19][20][21] .This makes L-ornithine a crucial component of the redox defense mechanism of parasites.The expression and activity of ARG in Leishmania species contribute to a higher infection rate and play a vital role in the pathogenicity of the parasite 4,14 .An increase in ARG activity reduces L-arginine availability to the nitric oxide synthase (NOS) enzyme 4 .This, in turn, diminishes the formation of nitric oxide (NO), which lowers the host's defensive capacity and increases the parasite's infectivity.Simply put, the ARG enzyme can negatively regulate the levels of NO produced in the body by consuming the NOS enzyme substrate.It is worth noting that scientific studies have shown that ARG inhibitors can reduce the ability of Leishmania spp. to establish an infection in macrophages 14 .
Computer-aided drug design (CADD) techniques, including ligand-based drug design (LBDD) and structurebased drug design (SBDD) approaches, are widely applied in the drug discovery process 22 .The field of quantitative structure-activity relationships (QSAR) modeling, whether through regression or classification methods, aims to correlate chemical structures with their biological responses, and it is mostly applied as an LBDD approach 23,24 .However, the traditional approach of using topological (1D and/or 2D) molecular descriptors alone does not consider the tridimensional (3D) geometric features of molecules, which can lead to difficulties adequately describing interactions between ligands and receptors 25,26 .Therefore, a tridimensional QSAR (3D-QSAR) method can be applied to overcome these limitations for better results.
Comparative Molecular Field Analysis (CoMFA) 27 is a 3D-QSAR ligand-based and alignment-dependent method developed by Cramer et al. 27 that utilizes statistical techniques to correlate molecular (steric and electrostatic) interaction fields (MIFs), which are interaction energies calculated with virtual probes in a 3D space, such as in the GRID method developed by Goodford 28 with biological response endpoints 29 .Ragno et al. (2020)  developed a Python implementation of the CoMFA method, named Py-CoMFA, which is publicly available through the www.3d-qsar.com web portal to build 3D-QSAR models 30,31 .The resulting accuracy data can help understand the (quantitative) structure-activity relationships (Q)SAR that are most relevant for developing antileishmanial agents 32,33 .
On the other hand, a SBDD approach, such as the ligand-receptor molecular docking 34 and molecular dynamics (MD) simulations methods 35 , combined with a LBDD approach, such as the CoMFA method (3D-QSAR) 36,37 , have paved the way for significant advancements in drug design.These computational tools have enabled researchers to study bioactive substances' molecular mechanisms of action and have become a crucial strategy for identifying new hits 38 .In addition, molecular docking is necessary to understand the structural requirements and to analyze the interactions between ligands and receptors 39 , and in addition, to derive ligand-receptor complexes for further studies, such as molecular dynamics (MD) simulations, which may provide information about the stability of promising compounds analyzing the ligand-receptor complexes in the aqueous system 40 .
Furthermore, it is worth highlighting that this study aims to apply the Py-CoMFA (3D-QSAR) modeling method to a set of 34 compounds recently reported in the literature from 2018 to 2023 as inhibitors of the Leishmania (L.) amazonensis arginase (LaARG) enzyme, a relevant target of the parasite life cycle.During the cited period, there was a shortage of literature publications that incorporated a complete in silico study with multiple techniques such as 3D-QSAR (especially Py-CoMFA), molecular docking, and MD simulations for investigating arginase inhibitors, principally those focused targeting arginase from L. (L.) amazonensis, which has a greater ability to resist macrophages, neutrophils, and leishmanicidal drugs than other species of Leishmania 41,42 .After that, structure-activity relationship studies helped us to analyze the influence of aliphatic groups, with or without heteroatoms, and the presence of the cyclohexyl, pyrrole, and indole rings as side chains.Finally, the putative binding modes of the most active inhibitors were investigated by molecular docking on the LaARG enzyme, while the most relevant protein-ligand complexes were submitted to molecular dynamics simulations to verify the dynamic behavior of the putative interaction modes and binding affinities of those derivatives.
Considering a traditional qualitative SAR analysis, it has been observed that cinnamic acid (6) and its derivatives, cinnamides (4a-k) and cinnamic esters (5a-f), are the most promising compounds that act as arginase inhibitors.Their IC 50 values range from 1.3 to 200 µM.Among these compounds, cinnamic acid itself (6) and disubstituted cinnamide derivatives with acetylated groups (4i, 4g) or hydroxy or dihydroxybenzene (4a-4f) have been shown to have the strongest inhibitory activity (Fig. 1).Another group of inhibitors that exhibit exceptional inhibitory activity are chromones (1a-1d).These are also substituted with dihydroxybenzene with an IC 50 range of 1.4 to 9 µM.It is evident that among these compounds, β-carbonyl hydroxylation significantly reduces the inhibitory activity of chromones 1b-d by up to six times.The non-β-carbonyl hydroxylated derivative 1a, on the other hand, exhibits the highest inhibitory potency (IC 50 = 1.4 µM) (Fig. 1).Nitrogen derivatives like pyrimidines (2a-f), phenylhydrazines (3a-e), and phenylamide (3f) exhibit moderate inhibitory activity with IC 50 values ranging from 12 to 100 µM.However, the biological activity appears to improve when the phenylhydrazine nucleus is included in the structural framework, with values ranging from 12 to 38 µM (Fig. 1).
It is important to keep in mind that Py-CoMFA models, as with other 3D-QSAR methods, can be affected by various factors that can impact their accuracy.Some of these factors include the conformational analysis method, putative bioactive conformation selection, and alignment rule, which is unfavorable to noncovalent molecular interactions 48 .Therefore, ensuring that the molecular target being evaluated in the model is carefully studied is crucial the probe atom variations need to be assessed 49 , and the set of molecules should be well-defined to obtain reliable results.It is important to note that hydrophobicity is not well quantified 50 , and thus, structures with high hydrophobicity can lead to misinterpretations.Additionally, the use of CoMFA is appropriate only with in vitro data.
In this way, we first performed in the 3D-QSAR server (https:// www.3d-qsar.com/), a conformational analysis of all inhibitors using the RDkit method with the UFF force field through the Py-ConfSearch module.The most extended conformation of each inhibitor was selected as the putative bioactive conformation, since in several crystal structures of L. mexicana arginase (LmARG), a series of boronic amino-acid inhibitors (PDB ID: 5HJ9, 5HJA, 4IU0, 4IU4) adopts an extended conformation.The same occurs for a substrate analog amino-acid inhibitor (PDB ID: 4IU1) and the L-ornithine product (PDB ID: 4IU5).Then, we aligned the inhibitors (Fig. 2A) using the Py-Align module, applying the RDkit method using the most extended conformation of the most active inhibitor 4h (IC 50 = 1.3 µM) as an alignment template (Fig. 2B).
Once the dataset compounds are aligned, we utilize two methods to split the dataset into training and test sets.While most QSAR studies, including CoMFA, generally apply random splitting, we also wanted to investigate the potential differences when implementing a supervised approach, such as the Kennard-Stone method 51 .Twenty-five compounds were used as the training set for deriving the QSAR models, and nine compounds were used as the test set for external validation (Tables 1, 2).Compared to the random training and test sets splitting method, we found that the Kennard-Stone 51 splitting method provided the compounds that show high dissimilarity in relation to the independent variables (descriptors) that were separated as a test set and the best parameters for our models.We explored three different fields, namely steric (STE), electrostatic (ELE), and the combination of both (STE + ELE) (see Table S1).However, we found that the models based on steric fields (STE) provided the best results (as shown in Table 1), and hence, we conducted all further analyses based on these models.
We also tested the models with different probe atoms and charges (Csp 3 , + 1; Osp 3 , − 1; and H, + 1) and found that all models performed well, showing statistical parameters indicating that the coefficient of determination (r 2 ) was more significant than 0.9 for all models.In contrast, the r 2 from the leave-one-out cross-validation (q 2 ) was greater than 0.5 for models from the Kennard-Stone splitting method, considering Csp 3 (+ 1) and Osp 3 (− 1) as probe atoms, in addition to a Pearson correlation coefficient with positive value (r ≅ 1), i.e., variables are directly correlated 52 .However, the best model was for Csp 3 (+ 1) as a probe atom, with higher r 2 = 0.999 and q 2 = 0.573 (Table 1).We also performed a Y-randomization test, which showed low values of q 2 , indicating that our developed 3D-QSAR models with original data are robust and not inferred by chance (Table 1).
Alignment dependence and conformational sensitivity are problems that need to be considered when using Py-CoMFA (3D-QSAR) 50 .In this way, since the Kennard-Stone splitting method using Csp 3 (+ 1) as a probe and STE field produced the most favorable statistical results, we decided to assess the model using docking conformations (Figure S1) in the alignment for QSAR modeling.Despite the model being based on the STE + ELE fields combination presenting great results (Table S2, S3, and Figure S2), the Kennard-Stone splitting method, Table 1.Statistical parameters of the Py-CoMFA (3D-QSAR) models built using steric field (STE) and 2 Å grid spacing, including Y-randomization (Y-r) test, according to the training and test sets splitting method (random or Kennard-Stone) and the probe atoms and charges (Csp 3 , + 1; Osp 3 , − 1; and H, + 1).All models presented a P value < 0.0001.r 2 = coefficient of determination; SDEC = standard deviation error in calculation; q 2 = r 2 from leave-one-out (LOO) cross-validation; SDEP = standard deviation error in prediction; r = Pearson correlation coefficient.www.nature.com/scientificreports/with Csp 3 (+ 1) as the probe and STE field, continues to provide the most satisfactory outcomes for all statistical parameters evaluated.In this way, Fig. 3 shows the plots of the experimental (pIC 50Exp ) and predicted (pIC 50Pred ) biological response values for the training and test sets compounds, according to the Py-CoMFA models built using the STE field, 2 Å grid spacing, and the Kennard-Stone splitting method, showing the best performance for the probe atoms Csp 3 (+ 1) and Osp 3 (− 1) compared to the H (+ 1) probe atom.In fact, the Py-CoMFA models had a squared linear correlation coefficient (R 2 ) of 0.861 to Csp 3 (+ 1) (Fig. 3A; y = 0.9086x + 0.467; R2 = 0.8608), 0.843 to Osp 3 (− 1) (Fig. 3B; y = 0.8995x + 0.508; R2 = 0.8436), and 0.712 to H (+ 1) (Fig. 3C; y = 0.8144x + 0.9211; R2 = 0.7123) as probe atoms.
Table 2 presents the predicted pIC 50 and residual (Res = pIC 50Exp − pIC 50Pred ) values calculated with the Py-CoMFA models, considering the Kennard-Stone splitting method, for the training and test set inhibitors during Table 2. Experimental pIC 50 (pIC 50Exp ), predicted pIC 50 (pIC 50Pred ), and residual (Res = pIC 50Exp − pIC 50Pred ) values for the training and test sets compounds based on the Py-CoMFA models built using steric field, 2 Å grid spacing, and the Kennard-Stone training and test sets splitting method, as probe atoms Csp 3 (charge = + 1), H (charge = + 1), and Osp 3 (charge = − 1).Res = Residuals *Test set compounds.

Classes pIC 50Exp
Csp  2).The lowest residual values between experimental and predicted pIC 50 were for the models from Csp 3 (+ 1) and Osp 3 (− 1) and the higher ones for the H (+ 1) atom probe (Table 2).Although the H (+ 1) probe atom model presented the worst performance, the three models are highly correlated, since the residuals of all three models showed a strong Pearson correlation with Csp 3 versus Osp 3 (r = 0.96), Csp 3 versus H (r = 0.92), and Osp 3 versus H (r = 0.92).
The Py-CoMFA Csp 3 (+ 1) contour maps of the steric MIF of the most active compound 4h showed favorable (green) fields on the phenyl and amide group, indicating that bulk substituents on those positions could enhance the protein-ligand interactions.However, it is essential to note that substitution in the nitrogen atom of the amide group has unfavorable (yellow) fields, particularly for bulk substituents such as phenyl group (Fig. 4).
Based on the information obtained from the contour maps of the best Py-CoMFA predictive model, 23 new compounds (7a-w) were designed, maintaining the general cinnamide structure (Fig. 5).In the para-position of the phenyl ring, R 1 , we tested substituent groups (R 1 = H, CH 3 , OH, NH 2 , OCH 3 , F, and O(CO)CH 3 ) with different degrees of inductive and/or resonance effect as electron donor or acceptor.As a substituent at the nitrogen atom of the amide group, we tested H, aliphatic groups, with or without heteroatoms, and with cyclohexyl, pyrrole, and indole rings (Fig. 5).The Py-CoMFA (Csp 3 , + 1) model was used to predict the pIC 50 values of the newly designed compounds (7a-w).Out of the 23 compounds analyzed, we selected six that had predicted pIC 50 values greater than 5.00 M based on the activity of the reference compound 4h (pIC 50 = 5.886 M).Although all compounds did not present a predicted biological activity, value greater than the reference compound, interestingly, the highest pIC 50 values were observed at 7g (pIC 50Pred = 5.169 M) and 7k (pIC 50Pred = 5.169 M) with H at R 1 position, and indole and cyclohexyl substituents in the R 2 position, respectively (Table 3), while the pyrrole ring did not contribute to the activity.Among the aliphatic substituents, the best four compounds were 7o, 7p, 7q, and 7b with acetyl, methyl(thio)ethyl, methoxy-ethyl, and amino groups, respectively.Aliphatic side chains with no heteroatoms or hydroxyl groups did not contribute to the activity as well as para-position substitution of the phenyl ring in R 1 , with electron donor or acceptor groups, since pIC 50 values were lower than unsubstituted derivatives (Table 3).

Applicability domain
To assess the applicability domain, we employed two methods: k-nearest neighbors (kNN) using Euclidean distances as metric and Leverage [53][54][55] .The kNN methodology is centered on assessing data points' similarity through their feature vectors, with similarity being defined by the distance in the feature space, often computed using the Euclidean distance metric.It specifically recognizes the k-nearest neighbors of a query data point and employs them for predictive purposes or data classification 53 .
In contrast, the leverage technique appraises the impact of individual data points on the dataset's overall comprehension.It computes leverage metrics for each data point, signifying its influence on the model's analyses or predictions 53 .
Whereas kNN underscores similarity and neighborhood associations, the leverage approach underscores the significance and sway of individual data points within the dataset.Consequently, while kNN aids in clustering akin data points, the leverage method assists in pinpointing influential or anomalous data points that could considerably impact model performance or comprehension.
Upon evaluating the limits established by kNN (3.02) and Leverage (0.24) for the training set, we found no outlier compounds in the external set, indicating that its chemical space aligns closely with the training set's (Fig. 6) 56,57 .

Molecular docking
The constructed Py-CoMFA (3D-QSAR) model was based on compound 4h as an alignment reference.This compound has been reported in the literature for its mixed-type inhibition 4 , which means it can bind to allosteric sites both to the free enzyme or enzyme-substrate complex.On the other hand, compound 4e is reported as a competitive inhibitor 4 binding to the active site of the free enzyme.As compounds 7g and 7k share a similar structure with the reference compound and were chosen due to their predicted pIC 50 activity, we aimed to investigate their behavior in both the potential allosteric sites and the active site of the LaARG protein.Our goal was to compare their behavior with the reference inhibitors 4e and 4h, respectively.
The FTMap 58 server was used to detect hot spot residues and identify potential drug-binding sites, including the active site, resulting in two clusters of interest (for more details, please refer to the Supporting Information).The first cluster corresponds to the active site region (amino acid residues detected as hot spots: His102, His127, Asp125, Asp129, His142, Asp231, and Asp233), including two Mn(II) cations, which are presented in a zoom view within a green color (Fig. 6).The second cluster corresponds to an allosteric site region (amino acid residues detected as hot spots: Lys1, Lys2, Met3, Ser4, Trp40, Phe94, Leu96, Leu269, Val270, and Met302), which is presented in a zoom view within an ochre color (Fig. 7).This allosteric site corresponds to a cavity  composed of residues from one of the monomers close to the protein-protein interface and in a region opposite the catalytic site.
In the first instance, we investigated the potential interaction of derivatives 7g and 7k at the enzyme's active site by molecular docking.As previously mentioned, compound 4e is described as a competitive inhibitor 4 , so both compounds, 7g and 7k, were docked in the first cluster corresponding to the active site of the LaARG model (Fig. 8).In the best docking pose of 7g within the LaARG active site (Fig. 8A), a strong charge-assisted H-bond interaction is observed between the NH group (neutral donor) of the indole ring of 7g and the carboxylate group (ionized acceptor) of Glu185, while this aromatic indole ring also performs a pi-pi T-shaped interaction with the aromatic imidazole ring of His127.Meanwhile a pi-alkyl interaction is observed between the phenyl ring of the cinnamide moiety of 7g and Pro246.Likewise, in the best docking pose of 7k within the LaARG active site (Fig. 8B), an H-bond interaction is observed between the cinnamide NH group and the N3 atom of the  imidazole ring of His127, while the cinnamide phenyl ring performs a pi-pi interaction with the imidazole ring of His142, as well as cyclohexyl methyl substituent at the R 2 position performs an alkyl-alkyl interaction with Pro246 (Fig. 8B).The inhibitor 4e had a p-OH substituent at the R 1 position.This allowed for a H-bond interaction with Glu185.Two more interactions were observed with the amino acids His127 and Gly244.Similar to compounds 7g and 7k, there was also an interaction between Pro246 and the 3,4-dihydroxyphenyl ring (Fig. 8C).
Regardless of the types of interactions of these compounds, by the rings as substituents or with the cinnamide moiety, it is worth highlighting that compounds 7g and 7k showed the best-predicted pIC 50 values of 5.169 M and 5.151 M, respectively.These compounds occupy the same region within the active site at approximately 6.6 (7g) and 7.8 Å (7k) from Mn 2+ ions, interacting mainly with the same His127 and Pro246 residues, presenting only an inverted head-to-tail orientation in relation to each other, in addition to a similar binding mode compared to reference inhibitor 4e (Fig. 8D).Those interactions were important for the ligand binding on the enzyme's active site, which may reflect in the inhibition of the arginase enzyme.
Afterward, we investigated the potential interaction of compounds 4h, 7g, and 7k into the second cluster of LaARG, corresponding to an allosteric site, by molecular docking (Fig. 9).In the best docking pose of 4h within the LaARG allosteric site (Fig. 9A), H-bond interactions are observed between the cinnamide NH group and the carbonyl oxygen atom of the Ala264 main chain, while the cinnamide phenyl ring performs pi-alkyl, and pi-sulfur interactions with the side chain groups of Leu269 and Met303, respectively.In addition, the 3,4-dihydroxyphenyl ring of 4h performs H-bond interactions with the oxygen atom of Glu265 (Fig. 9A).
In the best docking pose of 7g (Fig. 9B), an H-bond interaction is observed between the cinnamide NH group and the carbonyl oxygen atom of the Ala264 main chain likewise 4h.The cinnamide phenyl ring performs cationpi, pi-alkyl, and pi-sulfur interactions with the side chain groups of Lys1, Leu269, and Met303, respectively.In addition, the indole ring of 7g performs two interactions, an amide-pi stacking with Glu265 and a pi-alkyl with Ala264 (Fig. 9B).
In the best docking pose of 7k (Fig. 9C) there is an H-bond interaction between the cinnamide NH group and Leu269, while the cinnamide phenyl ring performs two pi-alkyl interactions with the side chains of Ala89 www.nature.com/scientificreports/and Val270 (Fig. 9C).In general, compounds 4h and 7g bound similarly (Fig. 9D) to the allosteric site, while 7k bound closer to the protein-protein interface of the enzyme trimer, i.e., between two monomers.

Molecular dynamics simulations
Subsequently, we conducted classical molecular dynamics (MD) simulations in aqueous systems with the best docking poses of 4h within the allosteric site, 4e within active site, and 7g and 7k within both active and allosteric sites of the LaARG model to verify the dynamic behavior of the putative interaction modes and binding affinities of those derivatives during 200 ns, using the GROMACS software 59 with Charmm36 force field 60 .

Molecular dynamics simulations in the active site
The stability and persistence of interactions were evaluated by RMSD (root-mean-square deviations) of the ligand atoms and Cα atoms of the amino acids in the protein-ligand complexes throughout the MD simulations (Fig. 10A-C).During the RMSD ligand atoms analysis of simulation conducted on the active site, it was observed the reference inhibitor 4e moved out of the active site within the first nanoseconds of the MD (Fig. 10A).It then exhibited movement between the 75 and 125 ns, before returning to its initial position and binding to the active site in the final phase of the simulation.This resulted in a high value of RMSD and standard deviation (SD) of 23.30 ± 16.7 Å. Compound 7g, on the other hand, showed a tendency to move away after 10 ns, bound to the surface of the allosteric site from 50 to 100 ns before leaving again (Fig. 10B).The mean RMSD value was 28.4 ± 10.6 Å and similar to 4e.Compound 7k was observed to leave the active site at the beginning of the interacting by pi-alkyl (dashed yellow lines), pi-sulfur (dashed purple lines), cation-pi (dashed green lines), or amide-pi stacked (dashed pink lines).
The root-mean-square fluctuation (RMSF) during the 200 ns of simulation was used to evaluate the residue's mobility.The LaARG model residues with the highest observed RMSF values were grouped into four main groups (Fig. 11D).The first group comprises the residues Gln14-Tyr25 (colored in yellow, Fig. 11A-D), which are located around the active site (in green, Fig. 11D), approximately 10 Å from the two Mn(II) cations, and presented RMSF values of 2.3 to 3.0 Å.The second group comprises the residues Gly50-Ala75 (in red, Fig. 11A-D), which are close to the protein surface and their RMSF values varied from 3.72 to 4.9 Å.The third group comprises the residues Pro225-Gly250 (in blue, Fig. 11A,B,D), which are located next to the active site and present RMSF values between 2.0 and 2.9 Å.The fourth group is related to residues Val275-His287 (in purple, Fig. 11A and D), which is below the active site, with considerable RMSF values of around 3.0 Å.It was evident that the greatest fluctuations contribution was from the 4e inhibitor, despite the planned compounds 7g and 7k also showing similar values in the surrounding residues near the active site.
The analysis of inhibitor 4e's H-bond interactions revealed that it has the potential to inhibit the target protein by interacting with specific residues such as Asp129, Glu185, Leu269, and Met202 present in the active site (Fig. 12A).The lifetime of these interactions ranged from 9.18% to 34.6% (Fig. 12A), although the RMSD analysis pointed out a lack of interactions between the period of 75 to 125 ns.The H-bond analysis in the active site of the LaARG-7g and LaARG-7k complexes revealed that these intermolecular interactions involving the ligands and certain residues, such as Gly66 and Thr310 (as shown in Fig. 12B) and Arg262 (Fig. 12C), had a short lifetime (5% to 15.9%) and were located away from the initial binding poses (active site), as pointed out by RMSD analysis.
We calculated the binding free energy (ΔG bind ), using the MM-PBSA method, for ligands that remained bound to the active site of the LaARG model during the MD simulations.The ΔG bind values reflect that 4e (− 17.8 kcal/ mol) and 7g (− 20.6 kcal/mol) presented similar affinity for the active site (Table 4).Upon analysis, we also observed that the van der Waals and solvent-accessible surface values for cinnamates 7g and 7k are quite similar.However, we noticed that the difference in ΔG bind values can be attributed to the higher solvation cost of 7k, which has substantially higher values of 15.1 kcal/mol when compared to 7 g with its lower value of 5.14 kcal/ mol.This clearly indicates the higher ΔG bind value observed (− 7.49 kcal/mol).www.nature.com/scientificreports/Molecular dynamics simulations in allosteric site Subsequently, we conducted MD simulations in aqueous systems with the best docking poses of 4h, 7g, and 7k within the allosteric site of the LaARG model.During the simulation, the compound 4 h displayed notable movement in the allosteric site after 10 ns and maintained its position throughout the rest of the simulation, with an RMSD value of 12.2 ± 2.07 Å, indicating a significant shift in its conformation (Fig. 13A).
On the other hand, RMSD analysis of 7g (Fig. 13B) indicated the persistence on the binding site until 125 ns of the simulation, showing slight movement between 50 and 125 ns inside the cavity before eventually dissociating with RMSD = 17.28 ± 14.13 Å.Like reference 4h, ligand 7k was found to have a strong affinity for the allosteric site (Fig. 13C) since the binding persisted almost throughout the entire period of the simulations, with only two slight movements observed between 80 and 112 ns and 185 ns, while the RMSD and SD values were low (4.34 ± 0.76 Å).
As observed in the Cα-RMSF analysis of the compounds in the active site, their presence in the allosteric site also caused fluctuations in the regions, mainly of protein loops, to residues Gln14-Tyr25 (colored in yellow, Fig. 14A,B,D) and Gly50-Ala75 (in red, Fig. 14A-D) closed to the protein surface, and Pro225-Gly250 next to allosteric site (in purple, Fig. 14B-D), ranging from 2.5 to 4.2 Å.In addition to these, two more regions presented high RMSF values, belonging to residues Leu150-Pro170 (in blue, Fig. 14A-D) and Val275-Arg300 (in orange, Fig. 14C-D  www.nature.com/scientificreports/About H-bond interactions observed in the allosteric site, the reference inhibitor 4h showed persistent interaction with residues Leu269 and Val270 with high lifetime values of 79.2% (and 8.27%) and 82.8%, respectively.In addition, also performs H-bond with Thr233 (24.4%) continually throughout the MD simulation (Fig. 15A).The derivative 7g interacted with residues Ala264 and Glu265 within the first 50 ns and a 5-19% lifetime.It also interacted with Gly267 from 50 to 125 ns, with a lifetime of 14%, before disconnecting from the enzyme (Fig. 15B).Notably, in arginase from Leishmania mexicana, a salt-bridge cluster (Glu277-Arg216) is known to stabilize the formation of the trimer, which is not present in the human arginase I 61 .In the LaARG, this salt-bridge is conserved by Glu265-Arg204 and could lead to a possible selectivity for the parasite enzyme.Despite their short lifetimes, these interactions at the allosteric site were equally persistent than those observed at the active site, suggesting that 7g may have affinity for both sites.On the other hand, 7k showed high affinity for LaARG's allosteric site, presenting lifetime values of up to 76% throughout the MD simulation, mainly with residue Leu269, as reference inhibitor 4h, and others, such as Lys1 and Val270 (Fig. 15C).
Finally, we calculated ΔG bind for ligands that remained bound to the allosteric (4h, 7g, and 7h) site of the LaARG model during the MD simulations (Table 5).It is more evident that cinnamide derivative 7k is promising for inhibition of LaARG in the allosteric site since it presented an equivalent ΔG bind value of − 16.6 ± 1.76 kcal/mol as reference inhibitor 4h (− 17.0 ± 2.14 kcal/mol), while derivative 7g showed lower ΔG bind = − 13.4 ± 1.95 kcal/ mol.
Based on our previous discussions, the binding free energy values also indicate the favorable hydrophobicity contribution of phenyl and cyclohexyl groups as substituents, probably due to the characteristic of several residues on the LaARG allosteric site, such as valine, leucine, phenylalanine, serine, methionine, and tryptophan.As a result, the 7k derivative with phenyl and cyclohexyl have stronger interactions within this pocket also due to molecular volume very similar to 4h than the indole-substituted 7g, which is larger.
Finally, the highest energy cost of desolvation of the binding site was observed for cinnamides 4h (20.1 kcal/ mol) and 7k (17.3 kcal/mol) due to their substituents characteristics when compared to the indole-substituted 7g (5.62 kcal/mol), in addition to highest energy values of the van der Waals (4h, − 27.9 kcal/mol, and 7k, − 27.7 kcal/ mol) and electrostatic terms (4h, − 6.00 kcal/mol, and 7k, − 2.89 kcal/mol) required for a favorable interaction and resulting in their best ΔG bind values (Table 5).It was noted that ΔG bind analysis of the ligands was consistent, as previously emphasized by RMSD, RMSF, and intermolecular interactions via H-bonding, demonstrating the proposed cinnamide 7k promising as a mix or uncompetitive LaARG inhibitor.
In conclusion, it is worth noting that the H-bond interactions between the LaARG model and cinnamide derivatives (4e, 4h, 7g, 7k) occur through the amide group, which acts as a pharmacophore feature.However, the phenyl and cyclohexyl rings of proposed cinnamide 7k are crucial for interaction in the allosteric site, which is mostly made up of residues with aliphatic side chains and is more hydrophobic compared to the active site formed by aspartate and histidine residues and more adaptable to interactions with indole ring of the 7g.It is crucial to emphasize that these compounds should be synthesized and studied in vitro against the arginase molecular target.This approach is necessary to obtain precise and confirmatory details about their inhibitory mechanism of action.After analyzing the interactions evidenced by the molecular docking and molecular dynamics results, the two promising compounds, 7g and 7k, had some of their properties related to drug-likeness (oral bioavailability) and ADMET (absorption, distribution, metabolism, excretion, and toxicity) profile predicted in silico (Table 6).
In our in silico study, according to the drug-likeness predicted properties (MW ≤ 500; LogP ≤ 5; HBA ≤ 10; HBD ≤ 5; TPSA ≤ 140) (Table 6), we have found that 7g and 7k do not violate any of the Lipinski and Veber rules.Moreover, both compounds have the potential to be promising as they show predicted high absorption from the gastrointestinal (GI) tract (Table 6).This indicates that they can cross the cell barriers and enter the bloodstream to reach their biological target 62 .However, it is important to note that these compounds also have the potential to cross the blood-brain barrier (BBB), which may cause toxic side effects 62 (Table 6).www.nature.com/scientificreports/Compounds 7g and 7k were predicted as non-substrate of P-glycoprotein (P-gp) (Table 6), an efflux pump protein that impacts drug pharmacokinetics by carrying substances out of cells 63 .Therefore, 7g and 7k should not affect P-gp function, avoiding overdoses in combined therapies (Table 6).
Cytochrome P450 (CYP) enzymes are fundamental in the metabolism of drugs and xenobiotics 64 .According to our in silico predictions, 7g and 7k may have the ability to inhibit CYP1A2 and CYPC19.Additionally, 7g may also inhibit CYP2D6 and CYP3A4.Since the CYP1 enzyme family is responsible for metabolizing carcinogenic Table 5.The binding free energy (ΔG bind ) terms of the LaARG-ligand complexes, considering the allosteric site, calculated for 4h, 7g, and 7k with the MM-PBSA method (mean ± standard deviation energies; kcal/mol): van der Waals (ΔE vdW ), electrostatic (ΔE elect ), solvation (ΔE solv ), and solvent accessible surface area (ΔE sasa ).

Figure 1 .
Figure 1.Chemical structures and half-maximal inhibitory concentration (IC 50 , μM) values of the L. (L.) amazonensis arginase inhibitors used in the Py-CoMFA (3D-QSAR) study.The classes of compounds are highlighted by different colors.

Figure 2 .
Figure 2. (A) Superimposed structures of all inhibitors under study.Compounds are shown in ball-and-stick model, and the atoms are color coded as follows carbon (grey), oxygen (red), nitrogen (blue), fluorine (green), and sulfur (yellow), while hydrogen atoms were omitted for clarity.(B) Longest conformation of 4 h used as alignment template.Figure generated with Py-Align and Py-ConfSearch, respectively.

Figure 5 .
Figure 5.Chemical structures of the proposed cinnamide derivatives (7-aw) designed based on the interpretation of the Py-CoMFA (Csp 3 , + 1) model to calculate the predict pIC 50 values.

Figure 7 .
Figure 7.The active site (green color) and the putative allosteric site (ochre color) of the LaARG model were detected using the FTMap server (expanded for a monomer).

Figure 8 .
Figure 8. Best molecular docking poses of cinnamide derivatives 7g (A) and 7k (B) and reference 4e (C) within the active site of the LaARG model and the overlapped poses of 7g, 7k and 4e (D).The dashed black lines represent the H-bond interactions with amino acids (represented by spheres).Residues are represented by sticks (green color) interacting with pi-alkyl (dashed yellow lines), pi-pi T-shaped (dashed orange lines), or alkyl-alkyl (dashed blue lines).

Figure 9 .
Figure 9. Best molecular docking poses of cinnamide derivatives 4h (A), 7g (B), and 7k (C) within an allosteric site of the LaARG model and the overlapped poses of 4h, 7g, and 7k (D).The dashed black lines represent the H-bond interactions with amino acids (represented by spheres).Residues represented by sticks (ochre color)interacting by pi-alkyl (dashed yellow lines), pi-sulfur (dashed purple lines), cation-pi (dashed green lines), or amide-pi stacked (dashed pink lines).

Figure 10 .
Figure 10.RMSD (Å) analysis relative to the atoms of 4e (A), 7g (B), and 7k (C) and Cα-RMSD analysis of the amino acid relative to complex with 4e (D), 7g (E), and 7k (F) in the LaARG model by molecular dynamics simulations in aqueous systems (200 ns), considering the initial ligand positions within the active site.The red arrow highlights the ligand movement.

Figure 11 .
Figure 11.Cα-RMSF analysis relative to the molecular dynamics simulations in aqueous systems (200 ns) of the LaARG-ligand complexes of 4e (A), 7g (B) and 7k (C) in the active site; (D) 3D-structure of the LaARG model highlighting the active site (green) and residues groups purple, yellow, blue, and red with the highest RMSF values.

Figure 12 .
Figure 12.Active site H-bond lifetime (%) and representative interactions of (A) LaARG-4e, (B) LaARG-7g and (C) LaARG-7k complexes.The yellow circle in the 2D structure highlights the atoms responsible for the interactions.

Figure 13 .
Figure 13.RMSD (Å) analysis relative to the atoms of 4h (A), 7g (B), and 7k (C) and Cα-RMSD analysis of the amino acid relative to complex with 4h (D), 7g (E), and 7k (F) in the LaARG model by molecular dynamics simulations in aqueous systems (200 ns), considering the initial ligand positions within the allosteric (black) or active (red) site.The red arrow highlights the ligand movement.

Figure 14 .
Figure 14.Cα-RMSF analysis relative to the molecular dynamics simulations in aqueous systems (200 ns) of the LaARG-ligand complexes of 4h (A), 7g (B) and 7k (C) in the allosteric site; (D) 3D-structure of the LaARG model highlighting the allosteric site (ochre) and residues groups purple, yellow, blue, orange, and red with the highest RMSF values.

Table 6 .
In silico predicted properties related to drug-likeness (oral bioavailability