Introduction

TB (tuberculosis), an infectious disease is responsible for deaths of 1.5 million people every year throughout the world. TB is caused by pathogenic bacteria called “Mycobacterium tuberculosis1. Despite being a preventable infectious disease, millions of people die every year2,3. Most of these cases have been seen from low- and middle-income countries, but it has a profound presence throughout the world. TB is the main cause of HIV deaths and is being contributed to anti-Tb drug resistance4,5. WHO estimates the presence of one-quarter of the world’s population infected with TB1. As TB bacteria exist in the replicating and dormant forms, it becomes challenging to develop a novel anti-TB drug. Anti-Tb agents should act on both forms of the bacterium. Previously, we were just focusing on the developments of anti-TB drugs acting on the replicating forms, whilst it is also important to develop drugs acting and inhibiting the dormant forms of Mtb. Recently, there has been an emergence of MDR-TB and XDR-TB cases6,7,8,9,10,11,12,13,14,15,16,17,18,19,20.

The research carried out by Christophe et al. and Makarov et al.2 depicted significance of decaprenylphosphoryl-β-d-ribose 2′-epimerase (DprE1) as a new potential anti-TB target for active drug molecules against Mtb. Decaprenylphosporyl arabinose (DPA) plays a crucial role by acting itself as a substrate for the arabinosyltransferase12. This is a key step responsible for Mtb cell wall synthesis i.e. arabinogalactan and lipoarabinomannan. DprE1 is a flavoprotein responsible for the oxidation of DPR (decaprenylphosphoryl-d-ribose) to DPX (decaprenylphosphoryl-2-ketoribose), which further reported to reduce by the enzyme DprE2 to DPA. A recent review explains thoroughly the broad classifications of DprE1 inhibitors based on the modes of bindings with the enzyme as covalent and non-covalent DprE1 inhibitors12,13. In one of the study, it was found that 2-carboxyquinoxalines (non-covalent inhibitors) possess an essential 2-carboxylate moiety essential for formations of key hydrogen bonds with the side-chain of Lysine 418 and the hydroxyl group of Tyrosine 6014,15. There have been several reports for a variety of the chemical scaffolds acting as DprE1 inhibitors. This list includes but not limited to scaffolds like dinitrobenzamides, azaindoles, pyrazolopyridines, benzothiazinones, etc. Recently, benzothiazinones BTZ043, PBTZ-169/macozinone and the azaindole AZ7371 have been entered in clinical trials as inhibitors of DprE113,14,15,16,17,18,19,20. Some of the important inhibitors of DprE1 have been displayed in (Fig. 1). Nowadays, CADD (computer aided-drug designing) techniques are finding their ways in successful drug discoveries20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43.

Figure1
figure 1

Literature reported compounds targeting DprE1inhibitors.

In our present study, we have explored hydantoin as a new class of DprE1 inhibitors using atom based- as well as field-based-3D-QSAR studies. Thorough molecular modelling studies were carried out to generate pharmacophore hypothesis, and to identify new possible hits using ligand-based virtual screening, docking simulations, and molecular dynamics (online Supplementary, Figs. S1S15). We had also extended our study to identify ADMET properties of the dataset as well as for new hits using a variety of commercial (QikProp, V. 6.6, Schrodinger, LLC, NY, 2020) as well as non-commercial software tools (admetSAR, SwissADME, pred-hERG, etc.). Our generated hits might pave the new way towards the development of DprE1 inhibitors.

Materials and methods

Dataset used for the developments of pharmacophore, QSARINS (QSAR-INSUBRIA) and 3D-QSAR models

For the developments of pharmacophore and 3D-QSAR models, we have used previously reported dataset of 100 compounds (online Supplementary Table S1). This dataset has diversity among pharmacological and structural characteristics13. For conversion of the IC50 (µM) values to pIC50, we used well known conversion formula i.e. (pIC50 =  − log10 IC50). DprE1 pIC50 is the negative logarithm of the IC50-concentration expressed in molar (M) obtained in the DprE1-inhibition assay. All pIC50 values were further considered for modelling studies21. The same dataset was used for the developments of GA-MLR (genetic algorithm multiple linear regression) models using popular software QSARINS ver. 2.2.2 and validated both internally and externally43,44,45.

Software and hardware used

For our current study, we have utilized commercial Schrodinger’s small molecules drug discovery package installed on Linux operating system (Intel Pentium, i7 processor, 16 Gb RAM) (release 2020_4, Schrödinger, LLC, NY). For molecular docking analysis, we used Glide module incorporated in Schrödinger, LLC, NY package, 2020, V 8.9, (Maestro version 12.6). Studies pertaining to molecular dynamics, ADMET predictions, pharmacophore developments and binding energy calculations (MMGBSA) were carried out using desmond (V.6.4), Qikprop (V 6.6), Phase and Prime modules respectively (release 2020, Schrödinger, LLC, NY)21,24,25,26,29,36. Further, we have also executed popular non-commercial (free) online software tools like SwissADME, SwissSimilarity, SwissTarget prediction, lazar toxicity predictions and admetSAR for our current studies22,30,31,32. The higher resolution images pertaining to 3D-QSAR, docking, MMGBSA and CPH (common pharmacophore hypothesis) were generated through Schrödinger’s molecular modelling software and Gimp freeware. The detailed analysis of descriptors generation, pruning of descriptors, and QSAR validations using the QSARINS ver. 2.2.2 has been enclosed in the supporting information43,44,45.

Retrieval of the 3D crystal structure of the M. tuberculosis DprE1 complex

We have carefully selected and collected the necessary crystal structure of the M. tuberculosis DprE1 complex (pdb id:4P8C, Resolution: 1.95 Å) from the Protein Data Bank (http://www.rcsb.org). The selection of pdb id was performed based on our careful literature studies. This enzyme (monomer) essentially consisting of 2 chains, chain A (448 residues) and chain B (417 amino acid residues). Figure 2a, depicts necessary angles ψ against φ of amino acid residues characteristics to be visualized from Ramachandran plot for given protein structure 4P8C. The favoured region and generously allowed regions are generally depicted by red and yellow regions of Ramachandran plot respectively. The key residues involved in binding of native ligands were taken into considerations for molecular docking simulations, which generally gives idea for binding pockets of enzyme (Fig. 2b).

Figure 2
figure 2

(a) Ramchandran plot and (b) binding pocket for native ligands for protein 4P8C [(b) this figure has been visualized with ‘BIOVIA Discovery Studio Visualizer’ V.2022, available at: https://discover.3ds.com/discovery-studio-visualizer-download].

Protein preparation for selected target Mycobacterium tuberculosis DprE1

The required 3D crystal structure of M. tuberculosis DprE1 was imported into maestro workflow (http://www.rcsb.org). We processed the protein so that it was free from all the cofactors, inbound ligands and water molecules. Schrödinger’s protein preparation wizard was used for adding missing residues, adding hydrogens, generating Het states, and optimisation of the selected protein. After, processing of required protein, it was further processed for grid generation. Default parameter like centroid of inbound co-crystal ligand was used for making the grid. OPLS-2005 (optimized potentials for liquid simulations) force field was utilized throughout the entire protein preparations and simulations36,37.

Preparations of ligands for study

Firstly, we have sketched all the ligand structures selected from the literature source13 (tabulated in SI) in popular ‘ChemBiodraw’ (Version 12.0.2) in 2D forms and further saved in .sdf or .mol file format. These 2D structures then allowed importing into LigPrep module (Schrödinger, LLC, NY,2020) incorporated in maestro (V 12.6, 2020) for successful conversion into 3D forms. LigPrep utility was known to provide least energy conformers via ConfGen. After conversion into 3D forms, we allowed to run for geometrical optimization, add hydrogen, and generate a stereoisomer. The default option of generating 32 conformers for ligand was kept intact. The ‘Epik’ module (V 5.4, Schrödinger, LLC, NY 2020) was used for generating possible ionisation states for each ligand at pH = 7.0 ± 0.0. For our entire ligand preparation protocols, we adhered to OPLS-2005 force field28.

Common pharmacophore modelling (virtual screening approach)

We have applied “Phase” module (Schrodinger, 2020) for efficient developments of common pharmacophore hypothesis (CPH) (Table 1)21. There are number of literature reports for usage of Phase for CPH generations and 3D-QSAR models. With the help of flexible ligand alignments, firstly we aligned all 100 ‘Ligprep’ molecules. Use of the macromodel search method with maximum of 1000 per structure was applied in order to generate conformers for all the ligands. They were further minimized using the OPLS-2005 force field, as it has 100 steps of minimization. Finally, we have generated 14 CPH models, out of them we selected best top-ranked CPH (AAAHR_1) based on its scores and utilized for 3D-QSAR generations. We developed 5 point hypothesis as AAAHR_1. All actives were allowed to align on AAAHR_1 (Fig. 3). This CPH has 3 hydrogen bond acceptor (A), 1 hydrophobic group (H), and 1 aromatic ring (R) features as depicted by Phase utility. We have shortlisted ligands into 3 categories as active, inactive and intermediates based on biological data. The splitting pattern was considered by referring previous publications. Ligands with activity above 5.50 (pIC50 > 5.50) were considered as actives, while those having below 4.50 (pIC50 < 4.50) considered as inactive. While in between actives and in-actives there were intermediates (5.50 > Intermediates > 4.50). So, in our CPH developments there were 54 actives, 33 inactive and 13 intermediates (total of 100 ligands).

Table 1 Various pharmacophore hypotheses generated by PHASE.
Figure 3
figure 3

(a) A five point Pharmacophore model (AAAHR_1) generated by PHASE. The model illustrates acceptor feature (AAA; red coloured arrows), Hydrophobic feature (H:green) and aromatic ring (R: Brown color) features. (b) All active ligands overlapped on the generated model AAAHR_1 (this figure has been visualized from ‘Phase module’ Schrodinger, LLC, NY, V.2020; Available at: https://www.schrodinger.com/).

Scoring for best pharmacophore hypothesis

We have developed 14 pharmacophore hypotheses; which were further ranked according to their scores. The best hypothesis was selected on basis of survival score, site score, vector score, etc. (Table 1).

Atom-based and field-based 3D-QSAR modelling using the flexible ligand alignment

Phase module from maestro (V 12.6) interface of Schrodinger's utility was utilized for developments of 3D-QSAR models. For better understanding of correlation between structural features and biological activity, we tend to develop atom based as well as field based-3D-QSAR models (online Supplementary, Fig. S1). All the models were developed through random selection of training set and test set into 70%:30% by using default settings of ‘Phase’ module as per literature procedures defined earlier and adopted widely36,38,39,40,41. However, we ensured that developed models were not a chance of randomness and further assessed for their internal and external validations for statistical significance. Further splitting of datasets into training and test sets, were further checked for diverse chemical space adopted by molecules36,38,39,40,41. Active as well as inactive molecules were part of both sets to ensure good reliability of developed models. Further, for MLR based QSAR models, same strategy was used. In all cases, we have thoroughly checked our models for their robustness as per statistical formulas mentioned in the Supplementary information36,38,39,40,41. We randomly divided the dataset into 70% training set and 30% test set molecules with application of PLS factor of 4 for both 3D-QSAR models. The random selection made by software was further checked by visualizing the training and test set molecules (diversity among dataset molecules in training and test sets). We have taken care to maintain 1 Å grid spacing for the selected hypothesis. We have developed 4 atom-based 3D-QSAR models, while in case of filed based we developed 3 models (Tables 2, 3, 4, 5, 6, 7). In case of Atom-based models, we have incorporated 68 molecules in training set and 32 molecules in test set (Table 4). However, in case of field-based models there were 70 training set and 30 test set molecules. Statistical parameters were applied to select best models (Table 5) (online Supplementary Table S2). The Gaussian field-based 3D-QSAR models were consisting of Gaussian steric, electrostatic, hydrophobic, hydrogen bond donor, and hydrogen bond acceptor. In field-based models, we considered Gaussian intensities as descriptors (as independent variables). Finally, best selected 3D-QSAR models were developed to visualize for the 3D contour maps associated with structural features (Figs. 4, 5). QSAR model visualization has its own importance for better optimization of the scaffolds.

Table 2 PLS parameters for developed Atom based 3D-QSAR models.
Table 3 The atom based 3D-QSAR statistics for atom type fraction.
Table 4 Experimental dataseta employed for Atom based 3D-QSAR study along with docking scores and predicted/actual pIC50 (µM) values (against) (PLS Factor-4).
Table 5 Experimental dataseta employed for Field based 3D-QSAR study along with predicted/ actual pIC50 (µM) values (against) (PLS factor-3).
Table 6 PLS parameters for developed field based 3D-QSAR models.
Table 7 The field based 3D-QSAR statistics for field fractions.
Figure 4
figure 4

(ae) Visual representation of atom-based PHASE 3D-QSAR model(compound 1)—(a) electron withdrawing, (b) hydrogen bond donor, (c) hydrophobic, (d) negative ionic and (e) others. Blue color cubes indicate positive coefficient or increase in activity and red colour cubes indicate negative coefficient or decrease in activity (this figure has been visualized from ‘Phase module’ Schrodinger, LLC, NY, V.2020, Available at: https://www.schrodinger.com/).

Figure 5
figure 5

Fields contour maps based on test set compounds. (a) Gaussian electrostatic fields: favored electropositive (blue) and disfavored electronegative (red). (b) Gaussian hydrogen bond acceptor field: favored (red) and disfavored (magenta). (c) Gaussian Hydrogen bond donor field: favored (purple) and disfavored (cyan). (d) Gaussian Steric field: favored (green) unfavorable (yellow). (e) Gaussian Hydrophobic field: favoured (yellow) and disfavored (white) (this figure has been visualized from ‘Phase module’ Schrodinger, LLC, NY, V.2020, Available at: https://www.schrodinger.com/].

Molecular docking simulations, target predictions, binding energy calculations and ligand based-virtual screenings

All the 100 ligands were allowed to dock into active pocket of selected protein 4P8C43. We have employed Glide V 8.9 for molecular docking simulations (PDB ID: 4P8C) to analyse compound binding for dataset as well as hit molecules (Table 4) (online Supplementary Fig. S8). As DprE1 has two chains, chain A (448 residues) and chain B (417 residues). We had taken the precaution before docking that corresponding protein and ligands were processed from Protein preparation wizard and LigPrep respectively. We noticed that co-crystal ligand from protein DprE1 shows binding residues with its inbound ligand as His132, Gly133, Lys134, Leu317, Val365, Lys367, Cys387 and Lys418. So, it is essential to dock dataset molecules as well as hits in order to get correlation between docking results and ligands to serve as inhibitors of enzyme (Table 4). So, we prepared the crystal structures with the co-crystallized ligandY22 for docking simulations. After completion of docking, the best docked dataset molecule 151(Fig. 6) was employed further for predictions of molecular targets other than one which docked by using the free utility “SwissTargetPrediction”. Further, same molecule was allowed for ligand based-virtual screening of hits using the “SwissSimilarity” online tool (online Supplementary Figs. S4, S6; Figs. 7, 8). By using this utility, we screened almost ASINEX hits (# = 69,3000 molecules) using combined method as reported by search algorithm. After, screening of hits, top 9 hits were further visualized for their pharmacokinetics characteristics, water solubility, toxicity studies, etc. MDS studies were also performed for 151 as well as best ASINEX hit molecule (Desmond 6.4, release 2020) (Figs. 9 and 10). We employed 3 docking modes HTVS (High throughput virtual screening), SP (standard precision) and XP mode (extra precision) for analysis of docking simulations of hits (Table 8). We have also calculated binding energy calculations using Prime MM/GBSA module from Schrodinger suite (Table 8).

Figure 6
figure 6

2D (a) and 3D (b) Ligand interaction diagram with the best docked molecule 151 (this figure has been visualized from ‘Glide module’ Schrodinger, LLC, NY, V.2020, Available at: https://www.schrodinger.com/).

Figure 7
figure 7

Structures of screened top 9 compound after virtual screening.

Figure 8
figure 8

most docked ASINEX hit (XP docking): (a) 2D- and (b) 3D-representation of binding mode for ZINC12196803 (this figure has been visualized from ‘Glide module’ Schrodinger, LLC, NY, V.2020, Available at: https://www.schrodinger.com/).

Figure 9
figure 9

(a) Showing the protein ligand(4p8c:151) interaction throughout the simulation time of 1.2 ns (green-H-bond, purple-hydrophobic contacts, Pink-ionic contacts and Blue-Water bridges). Values more than 1 suggesting more contacts and corresponding to 100%. (b) A timeline representation of the interactions and contacts (H-bonds, Hydrophobic, Ionic, Water bridges).

Figure 10
figure 10

(a) RMSD Plot Showing the protein ligand interaction throughout the simulation time of 1.2 ns. (b) RMSF plot for the target protein selected (for 4P8C:ZINC12196803) [Simulation Graphs using GROMACS, has been attached in the SI, for 100 ns results].

Table 8 Molecular docking and binding free energy analysis for studies ZINC hits.

Absorption, distribution, metabolism, excretion and toxicity predictions for dataset and newly identified ASINEX hits

We thoroughly carried out ADMET profiling studies using multiple commercial as well as free software tools including Schrödinger’s Qikprop, admetSAR, SwissADME and Pred-HERG (online Supplementary Tables S3S10). We have calculated in-silico Drug-likeness and medicinal Chemistry for top 9 ASINEX molecules by SwissADME. Additionally, hERG blockage studies were also carried using popular “Pred-HERG” for new 9 hits. Boiled-egg model study was also incorporated for new hits to study the intestinal absorption profile using swissADME tool (online Supplementary Figs. S2, S7). Analysis of pharmacokinetics with CYP’s enzymes was done in silico with SwissADME (online Supplementary Table S5). QikProp calculations (Table 9) gave us ideas on pharmacokinetic parameters like rule of five, % human oral absorption, O/W coefficient, brain blood barrier permeability and caco cell permeability. For in-silico analysis of carcinogenicity and AMES test toxicity profiling were performed with popular online free utility called “admetSAR” V. 2.0 (http://lmmd.ecust.edu.cn/admetsar2/) (SI Table-S11)32.

Table 9 ADME predictions for computed for ZINC hits by QikProp*.

Molecular dynamics simulation studies for best dock hits

We have carried out 1.2 ns and 5 ns molecular dynamics simulations (MDS) studies for our best dock hits (151:4P8C and ZINC12196803:4P8C complexes respectively) using Desmond 3.8 module (release 2017, Schrödinger, LLC, NY) incorporated into Schrodinger's interface in order to check the stability of the complexes (Figs. 9, 10) (online Supplementary Figs. S5, S9, S10). For MDS studies, we used the target protein 4P8C only by considering its importance with mycobacteria. We make use of OPLS-2005 MM force field for our initial steps during setup of MDS. For simulation of 151:4P8C complex, the counter ion/salt used was Na with concentration of 2.065 mM. Total of 59,725 atoms were simulated for 1.205 ns with NPT ensemble kept at temperature of 300.0 k (250 numbers of frames and 1.01325 bar pressure) using maestro version 11.1.011; 2017. For simulation of ZINC12196803:4P8C complex, we used same protocol as previous but total of 60,861 and 60,625 atoms were simulated for 1.205 (250 numbers of frames and 1.01325 bar pressure) and 5.008 ns respectively. The trajectory and number of frames kept at as obtained. The necessary MDS interactions were recorded from results obtained.

Results and discussions

Glide based molecular docking analysis

We had docked all the dataset as well as hits into binding cavity of protein 4P8C using Glide module V 8.9. We found that the dataset best docked molecule (Fig. 6) had docking score (XP mode) of − 9.068 kcal/mol; while that of standards isoniazid (INH = − 6.24 kcal/mol) and ethambutol (ETH = − 6.173 kcal/mol) had lesser docking scores (Table 4). We further used best docked dataset molecule 151 for ligand based virtual screening using “SwissSimilarity”. We also evaluated the docking score analysis for hits and found to be in range of − 5.61 to − 7.862 kcal/mol. The best docked hit ZINC12196803 (Fig. 8) had docking scores of − 9.437 kcal/mol (XP mode) and − 6.53 kcal/mol (HTVS mode) (Tables 8, 10). The molecule 151 had interactions with key amino acids in binding pockets of 4P8C. These were LYS418— –(N–C=O–NH) (hydrogen bond), GLY117—(–C=O–) (hydrogen bond) and non-polar interactions with THR118,PRO116,LEU115,PHE313 amino acid residues. The ASINEX best docked hit ZINC12196803 obtained with 2 side chains hydrogen bonding with TYR60, LYS418 and 1 backbone hydrogen bond with ARG58 amino acid (Fig. 8, Table 10). Other interactions includes hydrophobic interactions with ALA417, ARG58, ILE131, HIS132, etc. We had utilized same binding pocket as that of mentioned in materials and methods section. Figure 8 depicts 2D and 3D-interaction diagram for best dock molecules. We obtained RMSD values lesser than 2 Å indicating that docking was performed correctly.

Table 10 Comparative ligand–amino acid interactions carried with molecular docking for top 9 ASINEX HITS –for 4P8C(1–9), i.e. ZINC19500480,ZINC19500487,ZINC07426404,ZINC02252037,ZINC03157457,ZINC03153031, ZINC19500488,ZINC12196803, ZINC19447057 and standards (BTZ043, ETHAMBUTOL, INH) respectively.

MMGBSA (molecular mechanics generalized born surface area) binding energy calculations

We had carried out the evaluation of binding energies for all dataset as well as all hits with respect to standards such as isoniazid (INH), ethambutol (ETH). We found that best docked molecule 151:4P8C complex had MMGBSA dgBind energy of − 59.159 kcal/mol with respects to standards INH (− 30.214 kcal/mol) and ETH (− 39.290 kcal/mol). We also calculated binding free energy for ZINC12196803:4P8C complex and it was found to be − 70.508 kcal/mol. Our calculations suggested that hit molecule ZINC12196803 had better binding free energy and hence better stability rather than standards INH and ETH. Overall, results clarified the fact that our hits may have better stability (more negative binding energies) in terms of binding to protein 4P8C.

Evaluation of MDS (molecular dynamics simulations) studies

During molecular docking simulations, the flexibility of protein may not involve in general. Studies pertaining to MDS were initially, obtained through Desmond 6.4 module (Schrödinger, LLC, NY, 2020) and finally with Groningen Machine for Chemical Simulations (GROMACS v5.1.5) with GROMACS 96-53a6 force fields46,47,48. We have analysed the molecular dynamics for our best dock hits 151:4P8C and ZINC12196803:4P8C complexes throughout the period of 1.2 ns, 5 ns (1200 ps to 5000 ps) and upto 100 ns (Figs. 9, 10) to confirm the exact modes of binding and to check stability of them. We have simulated both complexes with water molecules. These systems were set at particular temperature and pressure conditions. We observed that the atom-positional root-mean-square deviation (RMSD) plots for both complexes demonstrated stability of complexes throughout the simulation timing of 1200 ps to 5000 ps (1.2–5 ns) and 100 ns. Both MDS results retained exact interaction patterns as obtained from the molecular docking simulations. We noticed that there were fewer fluctuations among the RMSF (root mean square fluctuation) plots of 151:4P8C and ZINC12196803:4P8C complexes throughout the period of 1.2–100 ns (Fig. 10, SI. Fig. S11). Molecular dynamics simulations are representing closer connections to the physiological environmental conditions; and thus, will guide for better understanding of binding patterns. Figures 9, 10, depicts timeline representation plots as obtained from Desmond. Supplementary Fig. S11 analyses the stability of ZINC12196803:4P8C complex over a period of 100 ns.

Evaluation of ADMET properties (absorption, distribution, metabolism, excretion and toxicity predictions)

We had calculated predictions of ADMET properties for our dataset as well as ASINEX hit molecules. For all docked dataset molecules, we obtained all QikProp (Schrödinger, LLC, NY, 2020) parameters within the standard range (Table 9). For our best docked hit molecule ZINC12196803, we obtained great QPPCaco cell permeability, 95.815% human oral absorption and 0 violations for Lipinski’s rule of five. The drug likeness and medicinal chemistry were also checked for both best docked hits i.e., 151 and ZINC12196803 using “SwissADME” online free software. Drug likeness studies showed us that both molecules had no violations for popular Lipinski’s rule, Ghose, Veber and Egan rules49. The bioactivity score for 151 was found to be 0.55, while synthetic accessibility as 2.93. SwissADME studies displayed that out dataset molecule 151 has solubility in water. As it falls in yellow region of boiled egg model, it may have probability to cross the blood–brain barrier (BBB). In order to explore other targets (for which these 151 molecules may act), we used “SwissSimilarity” tool. Our “SwissTargetPredition" analysis for 151 resulted into probable activities on 4 different classes (7% unclassified, 33% protease, 27% transporters and 33% on membrane receptors. Further, newly ligand based virtual screening hits were subjected for docking simulations. Among them, the best docked hit ZINC12196803 had same solubility (water soluble) and bioactivity score (0.55) as like 151. Our pharmacokinetic studies carried out using “SwissADME” suggested that all our newly searched hits had mixed profiling for BBB and p-gp substrate. All hits were found to be non-inhibitors for CYP1A2, CYP2D6 and CYP2C9 (except one). We have obtained mixed profiling for CYP2C19 inhibitions. Boiled egg model predictions were also depicted in mixed profiling of absorptions. We had also screened all hits for hERG blockade using online free tool called “pred-hERG” (online Supplementary Fig. S10). Our in-silico study for hERG blockade showed that ZINC12196803 molecule had no signs of cardio-toxicity as depicted in SI.

Finally, we have evaluated all new ASINEX hits for toxicity analysis using “admetSAR” (online Supplementary Table S11). With exception of 2 hits, all hits were found to be non-Ames toxic. All hits were also resulted in non-eye corrosion and non-eye irritation properties. We noticed that our predicted hit molecule ZINC12196803 may not have any kinds of toxicities as calculated from “admetSAR”. Acute oral toxicity for hit was also found to be 2.18 kg/mol.

CPH (common pharmacophore) analysis

We have developed 14 CPH models, among them CPH AAAHR_1 (Fig. 3a) imparted good scores for modelling parameters like survival score of 5.93, phase hypo score of 1.36 and site score of 0.895. As top ranked CPH AAAHR_1 satisfied the standard criteria, we further employed this for 3D-QSAR studies. Figure shows how we aligned all ligands to top ranked CPH AAAHR_1 (Phase, 2020, Schrödinger, LLC, NY). Other CPH models with detailed scores are showed in Table. Our top ranked CPH AAAHR_1 shows 3Hydrogen bond acceptor (A), 1 hydrophobic group (H), and 1 aromatic ring (R) features as generated by Phase utility (Table 1).

Evaluation of atom-based and field-based 3D-QSAR models with statistical parameters

For effective reliability in generated atom- as well as field-based 3D-QSAR models, we utilized both internal as well as external validation parameters. Leave-one-out (LOO) cross-validation method was implemented in order to access the robustness, stability and predictive attributes of developed 3D-QSAR models. We analysed both models by using 32 and 30 test compounds for atom based- and field based-3D-QSAR models respectively (Tables 2 and 6). The atom based QSAR models were developed with PLS factor of 4, while in case of field based it was 3. The internal validation parameters for both atoms based- and field based-3D-QSAR models are depicted in tables. Both models were also evaluated with external validation parameters and for both they were obtained within standard criteria. For both the models, software generated scatters are shown in (Fig. 11). Finally, we noticed that our developed 3D-QSAR models had very high statistical significance.

Figure 11
figure 11

Graphical presentation of Actual versus (X-axis) Predicted pIC50 (Y-axis) of (a) training set molecules and (b) test set molecules for (1) Atom based, (2) Field based-3D QSAR models (this figure has been visualized from ‘Phase module’ Schrodinger, LLC, NY, V.2020, Available at: https://www.schrodinger.com/).

3D-QSAR visualization

Visualization for atom-based 3D-QSAR

It can be seen that biological activities could be well correlated with structural patterns of the core moiety responsible for activity in the forms of occlusion maps. From the literature regarding atom based 3D-QSAR studies, blue occlusion maps/contours indicated incremental biological activity; while the red coloured occlusion maps/cubes corresponds to decrement (Schrodinger, LLC, NY, 2020). We have selected representative molecule 1 for aligning on top ranked 5 points CPH AAAHR_1 for effective visualization of QSAR occlusion maps. The occlusion maps surrounding 2,4-difluorobenzene motif showed blue occlusion maps indicating favoured region for electron withdrawing groups (Fig. 4a). While the entire remaining molecule represented mixed regions for electron withdrawing substitutions (EWS) (Fig. 4a). In case of occlusion maps corresponding to hydrogen bond donors (HBD); majority of red coloured maps (disfavoured regions) near to 2,4-difluorobenzene motif and imidazolidine-2,4-dione core were noticed (Fig. 4b). Figure 4c demonstrates majority of mixed regions for Hydrophobic substitutions (HS). We have observed red contours around methyl attached to imidazolidine-2,4-dione core indicating negative participations for negative ionic substitutions (Fig. 4d). The contours associated with other substitutions are depicted in Fig. 5. We have tabulated PLS parameters and atom type fractions in tables.

Analysis of field based-3D QSAR contour maps

For effective analysis of contour maps produced for 3D-QSAR, we selected molecule 1. We noticed that Gaussian Electrostatic field contour maps (Fig. 5a) surrounding (2,4-difluorobenzene –C=O–) attached to 2,4-dioxoimidazolidin core showed red contours indicating disfavoured electronegative regions. While, blue contour maps around 2,4-dioxoimidazolidin suggested favoured electropositive substitutions. Gaussian Hydrogen bond acceptor field contours (Fig. 5b) around 2,4-difluorobenzene showed disfavoured (magenta) regions. Red occlusion maps around (Ar–C=O-2,4-dioxoimidazolidin core depicted favoured Gaussian Hydrogen bond acceptor fields. Gaussian Hydrogen bond donor occlusion maps (Fig. 5c) near to 2,4-difluorobenzene demonstrated disfavoured (cyan) regions, while occlusion maps near to nitrile attached arene motif showed favoured purple coloured maps. Gaussian Steric field occlusion maps (Fig. 5d) at 2,4-difluorobenzene depicting both favored (green) unfavorable (yellow) regions for substitutions. White occlusion maps surrounding the entire (2,4-difluorobenzene –C=O–) attached to 2,4-dioxoimidazolidin core indicates disfavoured regions for Gaussian Hydrophobic fields (Fig. 5e). All software generated occlusion maps are shown in the figure. Tables, depicts field fractions and statistical parameters for generated field-based 3D-QSAR models.

Relevance of obtained validation parameters with reliability of 3D-QSAR models

The Modeling statistical parameters of the selected model have passed the acceptability criteria proposed by Golbraikh and Tropsha as depicted with Q2 > 0.5, R2train > 0.6, R2test > 0.6, r20 − r20 < 0.3, 0.85 < k < 1.15, 0.85 < k′ < 1.15, (r2 − r02)/r2 < 0.1 and (r2 − r′02)/r250. r2 is the fraction of the total variation in the dependent variables that is explained by the regression equation. Higher r2 closer to 0.9 but 0.6, is considered as good indication of QSAR model. In case of atom-based 3D-QSAR model we obtained lower RMSE 0.86. It has been widely accepted that a low RMSE value indicates that the simulated and observed data are close to each other showing a better accuracy. Thus lower the RMSE better is model performance. Further, SD is the standard deviation about the regression line. This is a measure of how well the function derived by the QSAR analysis predicts the observed biological activity. The smaller the value of SD the better is the QSAR. In cases of our developed 3D-QSAR models, the best models retained with lower values of SD (SD for atom-based 3D-QSAR: 0.419; SD for field-based 3D-QSAR: 0.6865). The statistical significance of the regression model can be assessed by means of the Fisher statistic (F). The F-value or variance ratio is the ratio between explained and unexplained variance for a given number of degrees of freedom, respectively p and (n = p = 1), where n are the chemicals and p the model descriptors. For our set of developed models in atom-based 3D-QSAR, the one we selected as best model retained with F value of 119.7. Similarly, one can analyse the relevance of such statistical significances using the description attached in the supporting information.

Evaluation of QSARINS based MLR models

The top most model with higher statistical significance, as demonstrated by internal and external validations calculations, was further analysed for its interpretations. The developed model-1 is represented by the below MLR equation:

Multivariate models

Model 1 (70% training: 30% test set, 6 parametric)

$$\begin{aligned} pIC_{50} &= - 10.5590\left( { \pm 7.1202} \right) - 0.0005\left( { \pm 0.3267} \right)*ATSC7m \hfill \\ & \quad - 0.1252\left( { \pm 0.2216} \right)*AATSC4v - 4.4124\left( { \pm 0.2212} \right)*GATS4i \hfill \\ & \quad + 5.7006\left( { \pm 0.4025} \right)*SpMax5\_Bhe - 0.1334\left( { \pm 0.4632} \right)*SHBint6 \hfill \\ & \quad + 0.2850\left( { \pm 0.5919} \right)*minHBint5 \hfill \\ \end{aligned}$$
(1)

QSAR model interpretation

In currently developed 6 parametric model-1 (1) from QSARINS, the descriptor ATSC7m represents Centred Broto-Moreau autocorrelation of lag 7 weighted by mass. This descriptor has negative correlations with the biological activity (BA) as denoted by examples such as [141 (pIC50 = 4.1), 142 (pIC50 = 6.0)] and [183 (pIC50 = 7.2), 185 (pIC50 = 6.2)]. The descriptor, AATSC4v denotes Average centered Broto-Moreau autocorrelation—lag 4/weighted by van der Waals volumes and found to have negative correlations with BA, which can be seen with examples such as [112 (pIC50 = 6.9), 118 (pIC50 = 6.6)], [122 (pIC50 = 4), 123 (pIC50 = 4.5)], etc. Descriptors, Geary autocorrelation of lag 4 weighted by ionization potential (GATS4i) and Sum of E-State descriptors of strength for potential hydrogen bonds of path length 6 (SHBint6) were represented decreasing trends with the decrease in the values of descriptors. Further, we noticed that descriptors SpMax5_Bhe (Largest absolute eigenvalue of Burden modified matrix—n 5/weighted by relative Sanderson electronegativities) and minHBint5 were positively correlated with the BA. The model also contained 2D atom type electro-topological state descriptor minHBint5 which is defined as Minimum E-State descriptors of strength for potential Hydrogen Bonds of path length 5 (minHBint5). These descriptors indicated the importance of hydrogen bonds of path length 5. Examples of compounds following this trend includes, [114 (pIC50 = 7.4), 115 (pIC50 = 6.5)], [116 (pIC50 = 7.3), 122 (pIC50 = 4)], etc.

Full details of statistical analysis, graphs of experimental vs predicted pIC50 values for model 1 (b) William’s plot for model 1; (c) Insubria plot for model 1 (d) Y-scrambling plot for model 1 have been enclosed in the supporting information (please refer online Supplementary Fig. S13). Furthermore, values for the various cross-validation properties supported statistical robustness of GA-MLR QSAR model with (R2cv, RMSEcv, MAEcv, CCCcv, and Q2LMO). Higher values for R2ex, Q2F1, Q2F2, Q2F3, Golbraikh and Tropsha criteria and CCCex depicted the external predictive power of the developed models 1 (please refer online Supplementary, Tables S1214)51.

Therefore, creating QSAR models with various molecular descriptors and broad chemical spaces will undoubtedly offer insightful information on the causes of variations in the anti-DPRE1 activity of hydantoin-based inhibitors. We acknowledge the limits of the QSAR models that have been established so far, but better models would result from having more descriptor calculation data, accurate modelling, and fewer statistical interferences36. Consequently, each model generated here demonstrates the integration of all chosen chemical descriptors and thus forecasts potential pIC50 values for the aforementioned analogues.

Conclusion

To conclusion, we have developed statistically robust QSARINS based GA-MLR, atom-based as well as field based-3D-QSAR models with robust training set, R2 > 0.69 and test set, Q2 > 0.50 parameters. Our generated top ranked 5 point hypothesis AAAHR_1, which had 3 hydrogen bond acceptors (A), 1 hydrophobic group (H), and 1 aromatic ring (R) features depicted best survival score among 14 developed hypotheses as generated by Phase utility. Our dataset best docked molecule, 151 had goo docking scores (XP mode) of -9.068 kcal/mol; while that of standards isoniazid (INH = − 6.24 kcal/mol) and ethambutol (ETH = − 6.173 kcal/mol) had lesser docking scores than it. The molecule 151 had interactions with key amino acids in binding pockets of enzyme DPRE1 (4P8C). These were found to be LYS418— –(N–C=O–NH) (hydrogen bond), GLY117—(–C=O–)(hydrogen bond) and non-polar interactions with THR118,PRO116,LEU115,PHE313 amino acid residues. Finally, our best docked 151was queried for identifications of new hits using “SiwssSimilarity”. Among the top 9 ASINEX hits, the hit molecule ZINC12196803 had best binding energies and docking score (docking score = − 9.437 kcal/mol, MMGBSA dgBind = − 70.508 kcal/mol). We observed that RMSD plots for 151:4P8C and ZINC12196803:4P8C complexes demonstrated good stabilities throughout the simulation timings of 1200 ps to 10,000 ps (1.2–100 ns). Our pharmacokinetic studies carried out using SwissADME, QikProp, pred-hERG and admetSAR demonstrated that all our newly searched hits had mixed profiling for BBB and p-gp substrate. All hits were found to be non-inhibitors for CYP1A2, CYP2D6 and CYP2C9 (except one). Finally, we can say that our different combinations of computational techniques for identifications of new DPRE1 inhibitors may pave new way towards future developments.