Introduction

Gardnerella vaginalis (G. vaginalis) was first isolated and described by Leopold in 1953 as a small, gram-negative, non-motile, non-encapsulated, rod-shaped bacterium identified from cervical swabs of women with cervicitis and men with prostatitis1. Leopold did not name the species but suggested that it was a member of the Haemophilus genus. Later, in 1955, Gardner and Dukes isolated the organism from women with non-specific bacterial vaginitis2. The organism was the first recognized Gardnerella species, with average dimensions of 0.4 × 1.0 to 1.5 µm. It is a fastidious, facultatively anaerobic microorganism that can grow at 37 °C in complex media and in an atmosphere with 5–10% carbon dioxide3. Its surface is covered with fimbriae responsible for attachment to vaginal epithelial cells. It can be sexually transmitted and is capable of forming biofilms that evade host defense mechanisms4. Accordingly, identifying and treating G. vaginalis is important. The gold standard for identifying the bacterium is the Nugent scoring system, whose microscopic criteria are reproducible and highly sensitive5. Nugent’s criteria entail assessing the normal flora in a Gram-stained smear of vaginal discharge. Amsel’s method, which relies on a few simple laboratory tests, is also critical for clinical diagnosis and is usually preferred in clinical settings5,6.

G. vaginalis is categorized into biotypes based on biochemical characteristics such as hippurate hydrolysis and the activity of lipase and beta-galactosidase; on this basis, Piot et al. established eight separate biotypes7. Some biotypes are most prevalent among isolates from the vaginal discharge of women with bacterial vaginosis (BV), while isolates from asymptomatic women have not been tested because symptoms are absent8. G. vaginalis virulence factors include the secreted toxins vaginolysin (VLY) and inerolysin (INY), whose structural and activity features are characteristic of cholesterol-dependent cytolysins (CDCs)9. Pathogenic vaginal bacteria produce CDCs as virulence factors that form lytic and non-lytic pores in cell membranes and also contribute to disease progression through pore-independent pathways10. High levels of membrane cholesterol that act as an INY receptor cause VLY-induced damage to artificial membranes. The sialidase enzyme (sld) also participates in G. vaginalis pathogenesis: its enzymatic activity degrades the protective vaginal mucosal layer and facilitates bacterial adhesion to the vaginal epithelium, leading to development of the microbial biofilm and increased infectiousness11.

Current treatment for BV is unsatisfactory, as antibiotics offer only a short-term cure and fail to provide a consistent long-term remedy12. Despite the high recurrence rate, antibiotics are recommended as the first-line treatment. However, promising results have been reported in clinical trials of live biopharmaceuticals/probiotics and vaginal microbiome transplantation for BV treatment13. The dysbiosis is commonly treated with metronidazole or clindamycin, given as an intra-vaginal gel or an oral pill, and lactobacilli used as an alternative to or in conjunction with antibiotics are also thought to increase clinical cure rates14. A second round of antibiotics is administered when there is a recurrence. As part of managing the adverse outcomes of BV, screening for G. vaginalis in asymptomatic individuals, especially pregnant women, is also considered effective and beneficial15. Screening among the general population is likewise important for decreasing adverse consequences. The limitations of current BV therapies are linked to an incomplete understanding of the pathophysiology of the dysbiosis, which impedes the development of optimal treatment and prevention approaches16. Thus, the challenges of preventing the frequent and relentless symptomatic recurrences of BV and reducing serious sequelae call for more research to achieve a sustainable cure.

To address the limitations of current therapies for bacterial vaginosis (BV), we used a bioinformatics approach to prioritize drug targets within G. vaginalis and to screen potential drugs against them. This approach leveraged computational techniques such as subtractive genomics, molecular docking, Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET) profiling, and molecular dynamics (MD) simulation for drug target and inhibitor screening17,18,19,20. It has previously been implemented for various bacteria18,19,20 and allows selective targeting of bacterial proteins that are absent from or significantly different in the host, minimizing potential off-target effects and host toxicity21. Docking simulations help predict and validate the binding affinity and interactions, respectively, between compounds of interest and the selected drug target22. ADMET profiling determines the pharmacokinetics of compounds that have the potential to disrupt essential bacterial functions or pathways, leading to bactericidal or bacteriostatic effects23. Hence, the subtractive proteomics-aided drug design approach helps ensure that potential therapeutics not only target bacterial functions effectively but also minimize the risk of adverse effects in human hosts, ultimately leading to safer and more efficacious treatments.

Material and methods

The proteome of the reference G. vaginalis strain ASM286196v1/UMB0386 (genome size: 1.7 Mb; GenBank accession: GCA_002861965.1; 99.49% completeness) was downloaded in .faa format. The total number of genes was 1417, of which 1321 were protein-coding. The number of non-redundant proteins was 1248.

Drug candidate identification

Subtractive proteomics was employed to identify potential drug targets. Initially, the non-redundant protein set was used to explore essential protein targets. Standalone BLAST software24 was used to conduct a homology analysis against two essential protein databases, the Cluster of Essential Genes (CEG)25 and the Database of Essential Genes (DEG)26. Proteins with homologs in both CEG and DEG were retained. These were subsequently compared with the human proteome, and non-homologous proteins were retained. The retained set underwent a homology search against beneficial gut flora (n = 84 spp. genomes) as described previously27, and only proteins demonstrating non-homology were retained. This set was further subjected to homology analysis against the DrugBank database28, and proteins exhibiting homology were designated as potential drug targets. Among them, one target, 3-deoxy-7-phosphoheptulonate synthase (aroG gene product), also known as DAHP synthase, was chosen for downstream analysis.
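
The filtering cascade described above can be sketched as a series of set operations on BLAST results. This is a minimal illustration, not the authors' pipeline: it assumes each BLAST search was exported in tabular form (-outfmt 6), and the function names, the e-value cut-off (1e-4) and the identity cut-off (30%) are hypothetical, since the text does not state the thresholds used.

```python
import csv
import io


def blast_hits(tabular_text, max_evalue=1e-4, min_identity=30.0):
    """Return the query IDs that have at least one hit passing both thresholds.

    Expects BLAST tabular output (-outfmt 6), whose 12 default columns are
    qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend,
    sstart, send, evalue, bitscore. The thresholds are assumed values.
    """
    hits = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        identity, evalue = float(row[2]), float(row[10])
        if evalue <= max_evalue and identity >= min_identity:
            hits.add(row[0])
    return hits


def subtractive_filter(all_ids, deg, ceg, human, flora, drugbank):
    """Apply the subtraction steps in the order given in the text."""
    essential = all_ids & deg & ceg   # homologous to both DEG and CEG
    non_host = essential - human      # no homolog in the human proteome
    non_flora = non_host - flora      # no homolog in beneficial gut flora
    return non_flora & drugbank       # homolog of a DrugBank target
```

Each argument after all_ids is the set returned by blast_hits for the corresponding database search, so the order of subtraction mirrors the order of the paragraph above.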

Sequence characterization

For a protein to be a potential drug target, it should be druggable, i.e., possess folds that favor interactions with small drug-like molecules and contain a binding site29. Important characteristics of druggable proteins include hydrophobicity, in vivo half-life, propensity for being membrane-bound, and fraction of non-polar amino acids30. To determine these properties, physicochemical parameters were studied using the ProtParam tool (https://web.expasy.org/protparam/; accessed 9 July 2024)31. DeepTMHMM32 was used to determine the propensity for being membrane-bound. Antigenicity is not necessarily a requirement for a protein to be a drug target when small molecules are being screened. However, if peptide or antibody-based docking is assessed, the immunogenic properties are important. We determined antigenicity using VaxiJen v2.0 (https://www.ddg-pharmfac.net/vaxijen/; accessed 9 July 2024), with the default cut-off criterion of 0.5 for antigenic proteins. The BTXPred Server (https://webs.iiitd.edu.in/cgibin/btxpred/btx_main.pl; accessed 9 July 2024)33 was used for toxicity determination and AlgPred 2.0 (https://webs.iiitd.edu.in/raghava/algpred2/batch.html; accessed 9 July 2024)34 for allergenicity determination, both with default parameters.

Structure modeling

The tertiary structure of DAHP synthase was predicted using two tools, SWISS-MODEL (https://swissmodel.expasy.org)35 and AlphaFold 2.036. SWISS-MODEL is a widely used online platform for protein structure prediction37. In contrast, AlphaFold is a deep-learning-based model developed by DeepMind, a subsidiary of Alphabet Inc38. AlphaFold 2.0 was accessed via the Colabfold tool on Google Colab39. We compared the results from both predictions to determine the most accurate and reliable structure. The final structure was further refined using GalaxyRefine40, and structural motifs were analyzed by Pro-motif via the PDB-sum tool41.

Docking and ADMET

For the docking of DAHP synthase, we used PyRx42. PyRx is built on open-source tools and libraries, including AutoDock Vina, Open Babel, NumPy, and wxPython. A library of 112 inhibitors specific to our target was retrieved from the ZINC database (https://zinc.docking.org/genes/AROG/predictions/; retrieved 28 April 2024) for docking. After import, the protein file was converted to the .pdbqt format, which stores atom types and partial charges. Next, the ligands were imported using OpenBabel and subjected to energy minimization to ensure their suitability for docking43. Post-minimization, the ligands were converted to .pdbqt format. Grid box parameters were: center_x = 0.191, center_y = 0.2533, center_z = −0.2372, size_x = 50.5656, size_y = 48.2306, size_z = 74.7034, and exhaustiveness was kept at eight. This center served as the focal point for docking calculations, covering the chosen pocket. Among the 112 docked ligands, the five with the lowest (most negative) binding energies, i.e., the highest predicted affinities, were saved for further processing. Phosphoenolpyruvate was taken as the control, due to its essential role in binding the aroG gene product. Interactions were visualized in LigPlot+44.
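
For readers who wish to reproduce the search space outside the PyRx interface, the grid box reported above can be written as an AutoDock Vina configuration file. The sketch below is illustrative only: the file names are hypothetical placeholders, and the helper that ranks ligands assumes that docking scores are available as a simple name-to-score mapping (more negative meaning stronger predicted binding).

```python
def vina_config(receptor, ligand, center, size, exhaustiveness=8):
    """Build the text of an AutoDock Vina configuration file."""
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
        f"exhaustiveness = {exhaustiveness}",
    ]
    return "\n".join(lines) + "\n"


def top_ligands(scores, n=5):
    """Return the n ligand names with the most negative docking scores."""
    return sorted(scores, key=scores.get)[:n]


# Grid box values as reported in the text; file names are placeholders.
config_text = vina_config(
    "dahp_synthase.pdbqt",
    "ligand.pdbqt",
    center=(0.191, 0.2533, -0.2372),
    size=(50.5656, 48.2306, 74.7034),
)
```

Sorting in ascending order is what selects the best binders here, because the docking score is an estimated binding energy rather than an affinity constant.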

To evaluate the drug-like properties of the top five ligands and the control, ADMET profiling was conducted using SWISS-ADME45 and the pkCSM server46. Molecular weight, lipophilicity (LogP), hydrogen bond donors and acceptors, and topological polar surface area (TPSA) were predicted using SWISS-ADME. Synthetic accessibility was also determined from the occurrence and frequency of fragmental substructures47. Scores ranged from 1 (very easy synthesis) to 10 (very difficult synthesis). AMES mutagenicity, safe dose values, solubility, human intestinal absorption, blood–brain barrier permeability, skin sensitization, etc., were predicted by pkCSM.
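
The drug-likeness screen based on these descriptors follows Lipinski's rule of five (molecular weight no greater than 500 Da, LogP no greater than 5, at most 5 hydrogen bond donors and at most 10 hydrogen bond acceptors). A minimal sketch of the violation count is shown below; the descriptor values in the example are invented for illustration and are not taken from the studied compounds.

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count how many of Lipinski's four criteria a compound violates."""
    checks = [
        mol_weight > 500,   # molecular weight above 500 Da
        logp > 5,           # lipophilicity (LogP) above 5
        h_donors > 5,       # more than 5 hydrogen bond donors
        h_acceptors > 10,   # more than 10 hydrogen bond acceptors
    ]
    return sum(checks)


# Hypothetical descriptor values, for illustration only.
example = lipinski_violations(mol_weight=320.3, logp=-1.2,
                              h_donors=4, h_acceptors=8)
```

A compound is conventionally regarded as compliant when the count is zero.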

Physiologically based pharmacokinetic (PBPK) behavior of the compounds was modeled using GastroPlus software v9.3, as described previously23. PBPK modeling is a powerful tool used in pharmaceutical research to predict the behavior of drugs in the body based on physiological parameters. The species was set to human and the prandial state to fasted. The model was built with oral administration as the intended route and a dosing regimen of 100 mg (tablet). The effects of the drug were checked in a population of 50 people for each health condition, i.e., healthy, cirrhosis, steatosis (non-alcoholic), and renal impairment (moderate), and in 50 pregnant females. In the healthy and diseased cohorts, age was kept between 20 and 80, while it was kept between 20 and 40 for pregnant females. Each healthy and diseased cohort comprised 25 males and 25 females, while the pregnant cohort comprised 50 females (split between 25 male and 25 female fetuses). Mean values of the fraction absorbed from the gastrointestinal tract (Fa%), fraction of the drug that reaches the systemic circulation after passing through the liver (FDp%), total bioavailability (F%), maximum concentration of the drug in the bloodstream (Cmax, μg/ml), time taken to reach Cmax (Tmax, h), and area under the curve (AUC) of the drug concentration–time profile were noted. AUC(0-inf) considered the entire time course, while AUC(0–t) was computed up to the 10 h time point.
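
The exposure endpoints named above have simple definitions that can be illustrated on a sampled concentration–time profile. The sketch below uses the linear trapezoidal rule for AUC(0–t); it is not how GastroPlus computes its outputs internally, and the time and concentration values are invented for illustration.

```python
def pk_summary(times_h, conc_ug_ml):
    """Return (Cmax, Tmax, AUC(0-t)) for a sampled concentration-time profile.

    AUC is computed with the linear trapezoidal rule up to the last
    sampled time point.
    """
    if len(times_h) != len(conc_ug_ml) or len(times_h) < 2:
        raise ValueError("need two or more matched time/concentration points")
    cmax = max(conc_ug_ml)
    tmax = times_h[conc_ug_ml.index(cmax)]  # first time at which Cmax occurs
    auc = 0.0
    for i in range(1, len(times_h)):
        dt = times_h[i] - times_h[i - 1]
        auc += dt * (conc_ug_ml[i] + conc_ug_ml[i - 1]) / 2
    return cmax, tmax, auc


# Hypothetical profile (h, ug/ml), for illustration only.
cmax, tmax, auc_0_t = pk_summary([0, 1, 2, 4], [0.0, 2.0, 4.0, 2.0])
```

AUC(0-inf) would additionally require extrapolating the terminal phase beyond the last sample, which is omitted here.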

Molecular dynamics (MD) simulation

To investigate the dynamic behavior and stability of the five best-scored ligand–protein complexes, MD simulation was conducted using GROMACS48 v2021.4. The AMBER99SB force field was chosen for this purpose, providing a reliable representation of the interactions between atoms within the complex49. Initially, an NVT (constant Number of particles, Volume, and Temperature) step was conducted for 1 nanosecond (ns), allowing the system to equilibrate at a temperature of approximately 300 Kelvin (K)50. This step was crucial for the stability of the complex, simulating physiological conditions. Subsequently, the MD simulation was run for a total of 100 ns, saving the trajectory every 50 ps to yield 2000 frames in total. This extensive sampling of frames provided a thorough understanding of the dynamic behavior of the protein–ligand complex over the entire simulation period. After the MD simulation, the collected trajectory was used to analyze several key properties of the complex. The Root Mean Square Deviation (RMSD) and Root Mean Square Fluctuation (RMSF) plots provided insights into the stability and conformational changes of the complex. Additionally, Principal Component Analysis (PCA) was performed to assess motion across the trajectory, and the Radius of Gyration (Rg) was computed to assess the overall compactness of the complex. This analysis aided in identifying important motions and conformational changes that are relevant to the function and stability of the complex. Finally, the MM/GBSA and MM/PBSA methods51 were employed to predict the binding free energies of the complex, providing a quantitative measure of the ligand–protein interaction strength.
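
The quantities analyzed above have compact definitions, sketched here in plain Python. This is an illustration of the formulas rather than the GROMACS implementation: the RMSD function assumes the two coordinate sets have already been superimposed, the radius of gyration is mass-weighted only if masses are supplied, and the 50 ps saving interval is inferred from 100 ns yielding 2000 frames.

```python
import math


def frames_saved(total_ns, interval_ps):
    """Number of trajectory frames written over a run of total_ns."""
    return int(round(total_ns * 1000 / interval_ps))


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized, pre-superimposed coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in size")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def radius_of_gyration(coords, masses=None):
    """Radius of gyration about the (mass-weighted) center of the atoms."""
    if masses is None:
        masses = [1.0] * len(coords)
    m_total = sum(masses)
    center = [
        sum(m * c[k] for m, c in zip(masses, coords)) / m_total
        for k in range(3)
    ]
    weighted = sum(
        m * sum((c[k] - center[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted / m_total)
```

In practice both quantities are evaluated for every saved frame against a reference structure, which produces the time series plotted for each complex.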

Results

In this study, we conducted a comprehensive analysis of the proteome of the reference strain of G. vaginalis using a subtractive genomic approach. The process and subsequent analyses, including docking and MD simulation, are presented below.

Drug target selection

During drug target selection, 583 proteins from the non-redundant proteome of G. vaginalis matched entries in DEG and 550 matched entries in CEG. Among these, 531 proteins were common to both databases. Subsequent analysis focused on this common dataset, and 30 proteins were identified as non-homologous to humans. Following further scrutiny against the proteome of beneficial gut flora, this set was narrowed down to 11 proteins. These 11 proteins (Table 1) were subsequently used for selecting the final drug target (DAHP synthase) for virtual screening. DAHP synthase plays an essential role in the production of the amino acids phenylalanine, tyrosine, and tryptophan, which are required for protein synthesis and other cellular functions52,53. DAHP synthase also participates in the biosynthesis of secondary metabolites such as antibiotics, pigments, and toxins54,55, suggesting a potential role in G. vaginalis defense mechanisms or interactions with its environment (cervical mucosa). G. vaginalis is also known to use quorum sensing, the bacterial cell-to-cell communication system, which is necessary for creating a biofilm by forming a community and successfully competing with lactobacilli for dominance in the vaginal environment56. Hence, DAHP synthase, which is involved in the production of signaling molecules that regulate gene expression according to G. vaginalis population density, is an attractive target. G. vaginalis is known to alter the cervicovaginal epithelium and associated cellular function through microbe-specific immune responses, potentially leading to adverse reproductive outcomes57. Hence, targeting DAHP synthase in G. vaginalis would hamper this type of interaction as well.

Table 1 Details of drug targets identified from the proteome of G. vaginalis.

DAHP synthase sequence properties

The molecular weight of the DAHP synthase was determined as ~38 kDa, with a theoretical pI of 5.70. The total number of negatively charged residues (Asp + Glu) was 44 and that of positively charged residues (Arg + Lys) was 36. The fraction of non-polar amino acids was 43.5%, while the estimated half-life was 30 h in mammalian reticulocytes (in vitro), >20 h in yeast (in vivo), and >10 h in Escherichia coli (in vivo). The instability index (II) was computed to be 32.70, classifying the protein as stable. The aliphatic index was 95.65 and the grand average of hydropathicity (GRAVY) was −0.126. The topology was predicted as globular, indicating that the protein likely folds into a compact, roughly spherical shape. Globular proteins typically have their hydrophilic amino acids on the outside, interacting with water, and their hydrophobic amino acids tucked away on the inside of the folded structure.
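
Several of the sequence descriptors reported above can be recomputed directly from the amino acid sequence. The sketch below counts charged residues and computes GRAVY as the mean Kyte-Doolittle hydropathy per residue; it is an illustration of the definitions rather than ProtParam's own code, and the short peptide in the example is an arbitrary toy sequence, not DAHP synthase.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy per residue."""
    seq = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def charged_residues(sequence):
    """Return (negative, positive) counts: (Asp + Glu, Arg + Lys)."""
    seq = sequence.upper()
    negative = seq.count("D") + seq.count("E")
    positive = seq.count("R") + seq.count("K")
    return negative, positive


# Toy peptide, for illustration only.
toy = "MDEKRAIL"
toy_gravy = gravy(toy)
toy_charges = charged_residues(toy)
```

A negative GRAVY value, as reported for DAHP synthase, indicates that the sequence is hydrophilic on average.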

No hits were found for DAHP synthase in BTXpred Server, specialized for toxin prediction of bacterial proteins, or AlgPred, specialized for allergenicity prediction. Hence, the protein was determined as non-toxic, and non-allergenic. It was also found to be non-antigenic (VaxiJen score: 4.22).

Structure modeling

SWISS-MODEL used a phospho-2-dehydro-3-deoxyheptonate aldolase (UniProt accession: A0A3A1XLV4) as a template to create a 3D structure of the DAHP synthase protein (Fig. 1A). The template belonged to Bifidobacteriaceae bacterium NR021. The similarity between the amino acid sequences of DAHP synthase and the template was 97.18%. The Global Model Quality Estimate (GMQE) reflects the overall quality of the model based on various factors, and a score close to 1 (in this case 0.96) indicates good model quality. The MolProbity score, which evaluates geometric and chemical plausibility, was 0.88; lower MolProbity scores indicate better geometry. Ramachandran analysis (Fig. 1B) showed that 96.88% of residues fell within favored regions, with no outliers detected. Additionally, one C-beta deviation was noted in residue A145 (SER). No bad bonds were observed among the 2711 bonds analyzed.

Figure 1

(A) 3D structure prediction of DAHP synthase from SWISS-MODEL. (B) Ramachandran plot for the SWISS-MODEL predicted structure. (C) 3D structure of DAHP synthase from AlphaFold. (D) Ramachandran plot of AlphaFold predicted structure.

For the AlphaFold predicted structure (Fig. 1C), the MolProbity score was 0.94, comparable to that of the SWISS-MODEL generated structure. Ramachandran analysis (Fig. 1D) revealed that 96.94% of residues were within favored regions, with only 0.26% outliers observed. There were no C-beta deviations detected. Moreover, there were no bad bonds among the 3030 bonds analyzed. This structure was further refined using GalaxyRefine (to Ramachandran favored residues > 97%) and selected for subsequent analysis, as its quality was better than that of the SWISS-MODEL predicted structure. Pro-motif showed the presence of two sheets, six beta-alpha-beta units, one beta-hairpin, two beta bulges, 10 strands, 14 helices, 15 helix-helix interactions, 26 beta turns, and one gamma turn in this structure. Overall, 137 residues (38.7%) formed alpha-helices, 53 residues (15.0%) formed strands, and nine residues (2.5%) formed 3₁₀ helices. The parallel beta-sheet had -1X -1X -1X -1X -1X -1X -1X topology (Y-barrel) and comprised eight strands, while the anti-parallel sheet formed an N-barrel and had two strands.

Docking

Phosphoenolpyruvate was used as a control (Supplementary Fig. 1) and docked (Supplementary Fig. 2) with an affinity value of −5 kcal/mol. In total, eight residues interacted with phosphoenolpyruvate, including one basic (Arg76) and two acidic (Glu34, Glu38) residues. Phosphoenolpyruvate is known to bind DAHP synthase, and bacteria commonly use it in nutrient signaling as it can donate a phosphate group58. Its interaction with the carbohydrate phosphotransferase system represents the mechanism through which many bacteria import sugars and their derivatives59. The compounds within the aroG inhibitor library exhibited higher affinity for DAHP synthase than the control (Table 2, Fig. 2). Notably, ZINC72988191 demonstrated a high level of interaction, with a total of 15 interactions, inclusive of several covalent bonds. Among these interactions, two were attributed to basic residues (Arg88, Lys182) and one to an acidic residue (Glu139). ZINC5113880 exhibited 12 and ZINC73198807 exhibited 16 interactions, with each forming three hydrogen bonds. In both cases, the same basic residues (Arg88, Lys182) and acidic residue (Glu139) were involved. ZINC64746551 displayed a total of 11 interactions, involving two basic residues (Arg88, Lys182) and one acidic residue (Glu139). Lastly, ZINC98088375 demonstrated 15 interactions, with two basic residues (Arg88, Lys182) and one acidic residue (Glu139). The variations in the number and type of interactions between different compounds and the target enzyme highlight the structural diversity and specificity of the inhibitor molecules.

Table 2 Binding affinities, molecular formula, weight, and SMILES of five top-scoring ligands.
Figure 2

2D visual of (A) Control and DAHP synthase interaction, (B) ZINC72988191 and DAHP synthase interaction, (C) ZINC5113880 and DAHP synthase interaction, (D) ZINC64746551 and DAHP synthase interaction, (E) ZINC73198807 and DAHP synthase interaction, (F) ZINC98088375 and DAHP synthase interaction. Ligand is depicted by Unk1(B) in each plot. Residues in red and atoms in black show hydrophobic contact.

ADMET profiling

ZINC64746551 had the highest topological polar surface area (Table 3), followed by ZINC72988191 and ZINC5113880. ZINC98088375 had the lowest surface area. All of the compounds were soluble and had zero Lipinski violations, indicating that they comply with Lipinski's Rule of Five, which predicts drug-likeness based on molecular properties such as molecular weight, lipophilicity, and hydrogen bonding capacity60. ZINC98088375 had the highest bioavailability score (0.85), suggesting that it may have better oral bioavailability than the other compounds, whose bioavailability scores were lower, ranging from 0.11 to 0.56. Gastrointestinal absorption was high for the control and ZINC98088375 and low for the others. All the molecules were predicted to be non-substrates of P-glycoprotein (Pgp) and were not predicted to inhibit the cytochrome P450 (CYP) enzymes, including CYP1A2, CYP2C19, CYP2C9, CYP2D6, and CYP3A4. The boiled egg plot (Supplementary Fig. 3) showed that no molecule would be effluxed from the central nervous system by P-glycoprotein. Phosphoenolpyruvate had the lowest log Kp value (-11.33 cm/s). Only ZINC98088375 was predicted to permeate the blood–brain barrier, and it had the highest log Kp value (-6.42 cm/s), suggesting relatively higher membrane permeability compared to the others. None of the compounds showed AMES toxicity or hERG-I/II inhibition (Supplementary Table 1), hepatotoxicity, or skin sensitization. Each compound had a different maximum tolerated dosage value, with ZINC73198807 exhibiting the highest value. It also exhibited the highest LD50 value for acute oral rat toxicity, indicating the lowest overall toxicity among the prioritized compounds. ZINC72988191 had the highest toxicity toward T. pyriformis and minnow.

Table 3 ADME parameters of the studied compounds.

The PBPK modeling results suggest that phosphoenolpyruvate absorption might be slightly affected by health conditions, with a minor decrease in steatosis and cirrhosis (Table 4). The cirrhosis cohort appeared to have a higher liver metabolism rate, leading to lower bioavailability compared to other groups. However, the overall drug exposure seemed to be relatively consistent across different health states. ZINC98088375 had the highest absorption percentage (94.05%), indicating the highest fraction of the administered dose reaching the systemic circulation. ZINC72988191, ZINC64746551, and ZINC73198807 had moderate absorption percentages ranging from 66.994 to 76.803%, while ZINC5113880 exhibited the lowest absorption (39.661%). ZINC98088375 also reached its maximum concentration relatively quickly (3.818 h), whereas the other compounds had longer Tmax values ranging from 5.262 to 8.614 h.

Table 4 Mean values of PBPK simulated endpoints for prioritized compounds and control for 10 h.

In steatosis, the absorption percentage decreased for ZINC73198807 and ZINC5113880, while it increased for the other compounds. In cirrhosis, absorption remained almost the same as in healthy subjects for ZINC73198807 and decreased for ZINC98088375; for the rest of the compounds, it increased. Steatosis, cirrhosis, and renal impairment showed varied Cmax and Tmax values compared to the healthy condition, suggesting slower or faster absorption. In pregnant women, the impact on absorption varied depending on the compound. Hence, we can infer that gastrointestinal and kidney function can be impaired in some of these health conditions, potentially affecting drug absorption and elimination.

MD simulation

ZINC64746551 exhibited the lowest overall RMSD throughout the simulation, measuring less than 2 Å at the concluding 100 ns mark (Fig. 3A). ZINC5113880 displayed a comparable trend, with RMSD values hovering around 2 Å, albeit with slight fluctuations between 20 and 40 ns. The control, phosphoenolpyruvate, also showed fluctuations in RMSD but remained below 2.5 Å at the 100 ns closing interval. Conversely, ZINC72988191 exhibited fluctuations in RMSD, ultimately closing at approximately 3 Å. ZINC98088375 displayed a consistent increase in RMSD after the initial 20 ns, eventually closing at around 3.3 Å, whereas ZINC73198807 showed the highest closing RMSD, approximately 4 Å at the end of the 100 ns simulation period. The distance of selected residues Lys78, Leu84, and Ile85 to the ligand was also studied throughout the simulation. On average, it was 18.4 Å for phosphoenolpyruvate, ~18 Å for ZINC72988191, 19.1 Å for ZINC5113880, 18 Å for ZINC64746551, 19.3 Å for ZINC73198807, and 9.9 Å for ZINC98088375 (Supplementary Fig. 4). RMSF was between 1.8 and 2.7 Å for all compounds around residue Thr97 (loop region) and between 1.4 and 7.6 Å at Gly212 (loop region) (Fig. 3B). Rg remained between 19 and 20 Å for all complexes (Supplementary Fig. 5).

Figure 3

(A) RMSD plot of studied compounds over 100 ns. (B) RMSF plot of studied compounds over 100 ns.

The MM/GBSA values for the complexes (Table 5) are indicative of favorable binding interactions between the ligands and DAHP synthase. DAHP synthase-ZINC73198807 depicted the lowest overall total energy while the lowest ΔG was observed for DAHP synthase-ZINC5113880. This implies that ZINC73198807 formed a seemingly more stable complex while ZINC5113880 might have more favorable binding interactions leading to a lower ΔG. This is corroborated by RMSD, where ZINC5113880 depicted less overall fluctuation or conformational changes.

Table 5 MM/GBSA energy values of the complexes.

Discussion

G. vaginalis is a predominant anaerobic bacterium that can lead to BV when it proliferates excessively61. BV is the primary cause of abnormal vaginal discharge or vaginitis. Intrauterine G. vaginalis infection can have adverse effects on pregnancy outcomes, including preterm birth, fetal growth restriction, and neonatal pneumonia62,63. Timely diagnosis and appropriate management are necessary to prevent adverse outcomes. Several drugs, such as metronidazole (Flagyl), clindamycin (Cleocin), and metronidazole vaginal gel (MetroGel-Vaginal), are commonly used against G. vaginalis due to their effectiveness in treating infections64,65. Secnidazole (Solosec) has also shown efficacy as a single-dose oral treatment for bacterial vaginosis66. However, G. vaginalis isolates have demonstrated resistance to penicillin, ampicillin, tetracycline, and gentamycin67,68,69. This highlights the need to explore and identify new drugs for effective treatment. Here, we used an in silico approach to identify drug targets and screen inhibitors against the selected target in this bacterium. Out of 11 targets, aroG (DAHP synthase) was finalized because it plays a critical role in the shikimate pathway70. This pathway is essential for the production of several important molecules, including aromatic amino acids (phenylalanine, tyrosine, tryptophan) as well as many secondary metabolites derived from these amino acids (e.g., phenylpropanoids, glucosinolates, hormones)71. Disruption of the shikimate pathway limits the production of these essential molecules. Since DAHP synthase catalyzes the first reaction in the shikimate pathway, inhibiting this enzyme can have a significant impact on the output of the entire pathway because it controls the amount of carbon entering the pathway72,73. The DAHP synthase enzyme is a homodimer, with each subunit folding into an (α/β)8 barrel structure74,75. Our predicted structure had similar folds, comprising a barrel topology.
DAHP synthase possesses characteristics that make it a potentially viable drug target, including its size, stability, and predicted globular structure with segregated hydrophobic regions. It also showed binding residues and was non-toxic, non-allergenic, and non-antigenic. Antigenicity can be useful for some antibody-based therapies but is not an essential characteristic for a protein to be a viable drug target in small-molecule screening, as done in this study.

We tested the binding of phosphoenolpyruvate, as a control, to DAHP synthase, and it formed eight interactions of various types, indicating good binding. Balachandran et al.76 have previously shown that DAHP oxime, a phosphate group mimic, also binds DAHP synthase but, even in competitive binding with DAHP synthase, does not attach to the metal-binding site. Oliveira et al.77 have previously investigated the docking interactions between 28 phosphoenolpyruvate derivatives and DAHP synthase from Corynebacterium glutamicum. Their study consistently identified specific binding modes for multiple ligands, suggesting that these compounds possess noteworthy pharmacophoric properties associated with multi-targeting. However, we obtained some compounds from the aroG inhibitor library with even better affinities. These included ZINC72988191, ZINC5113880, ZINC64746551, ZINC73198807, and ZINC98088375. All of these were predicted to have high solubility, which is a desirable property for drugs, but not all had good gastrointestinal absorption or the ability to cross the blood–brain barrier. The log Kp (skin permeation) values of all molecules were negative, ranging from −6.42 to −11.33 cm/s. Lipinski's rule of five, used in drug discovery to evaluate a compound's potential for oral bioavailability, requires a log P no greater than 5, and none of the molecules violated the rule, indicating that they might have favorable pharmacokinetic properties. Additionally, blood–brain barrier permeability does not matter here, as the central nervous system is not the target.

Good solubility ensures efficient absorption and distribution in the body78. ZINC98088375 had high gastrointestinal absorption, which is preferred for oral medications, but for drugs targeting the vaginal compartment, gastrointestinal absorption may not be necessary. Usually, infections with a high G. vaginalis load are localized in the vagina, and topical application (e.g., ointments, creams) is a more likely route of administration. Therefore, the high predicted gastrointestinal absorption of ZINC98088375 is not a disadvantage in this case. Although ZINC72988191 demonstrated high solubility, its low gastrointestinal absorption and bioavailability score may pose challenges; good solubility is nevertheless desirable for topical drugs, as it ensures proper distribution within the formulation and potential for sustained release. Its toxicity values were also higher, so structural modification may be required. However, its excellent synthetic accessibility suggests a potential for optimization. ZINC5113880 and ZINC64746551 showed similar properties, and their good synthetic accessibility also offers an opportunity for structural modifications to improve their properties. ZINC98088375 appears particularly promising due to its favorable pharmacokinetic profile, while the others may require optimization to enhance their properties for drug development. Synthetic accessibility ranged from 3.04 to 3.89, indicating the feasibility of manufacturing these molecules on a commercial scale, which is a significant factor in drug development.

Additionally, understanding the energetics of ligand-receptor interactions is crucial for identifying and optimizing potential drug candidates79. The comparison of the lowest energy values of ZINC73198807 and ZINC5113880, in terms of MMGBSA energy and ΔG, is significant in this context. ZINC73198807 formed a more stable complex with DAHP synthase, which suggests a strong physical interaction between the ligand and the protein. This is desirable because it can prevent dissociation and lead to a longer duration of action for the drug. ZINC5113880, in contrast, has a lower ΔG, suggesting thermodynamically more favorable binding. If a long-lasting effect is crucial, ZINC73198807 could be prioritized for further development; if rapid and specific binding is more important, ZINC5113880 might be a better starting point. Medicinal chemists can also use this information to further optimize the ligand structure, for example by modifying ZINC5113880 to improve the stability of its complex while maintaining good affinity, or vice versa for ZINC73198807. MMGBSA calculations provide valuable insights, but they are approximations. In later stages of drug design, experimental techniques such as surface plasmon resonance or isothermal titration calorimetry would be used to validate the in silico predictions80,81.
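
For orientation, the quantities discussed above can be written in their standard textbook form. This is a general sketch of the MM-GBSA end-point approximation and of the link between binding free energy and the dissociation constant; the exact terms, force field and solvation model used by the software in this study are not specified here, so the decomposition below should be read as an assumption about the usual formulation rather than as the precise protocol.

```latex
\begin{align}
\Delta G_{\mathrm{bind}} &= G_{\mathrm{complex}} - \left(G_{\mathrm{receptor}} + G_{\mathrm{ligand}}\right) \\
G &= E_{\mathrm{MM}} + G_{\mathrm{GB}} + G_{\mathrm{SA}} - TS \\
\Delta G^{\circ} &= RT \ln K_d
\end{align}
```

Here E_MM is the molecular mechanics energy, G_GB and G_SA are the polar (generalized Born) and non-polar (surface area) solvation terms, TS is the entropic contribution (often omitted in practice), R is the gas constant, T the absolute temperature and K_d the dissociation constant relative to a 1 M standard state. A more negative ΔG therefore corresponds to a smaller K_d, that is, tighter binding.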

PBPK modeling provides valuable insights into how drug absorption and pharmacokinetics may be influenced by health conditions82. The shortlisted compounds exhibited varying absorption percentages and pharmacokinetic profiles across different health conditions. The kidneys are the primary organs for eliminating water-soluble drugs and their metabolites from the body, while the liver plays a crucial role in metabolizing (breaking down) drugs before they are eliminated. Higher absorption was observed under renal impairment for ZINC64746551, ZINC72988191, ZINC73198807, and ZINC98088375. The higher concentration of other drugs in the bloodstream due to steatosis or cirrhosis can potentially cause toxicity. The impact of health conditions on drug absorption and elimination varies depending on the specific drug and the severity of the condition. Some medications require dose adjustments in patients with compromised gastrointestinal, liver, or kidney function83,84,85. This highlights the need for individualized patient care and tailored treatment approaches based on the specific characteristics of each drug and patient population. The actual in vivo results may also differ from the computational simulations, so experimental validation is needed. Although computational predictions are important for generating baseline information, they cannot be translated directly into practice. A limitation of our study is that it relied only on in silico simulations, which lack the complexity of a living organism. In vivo testing is also necessary to confirm the efficacy and safety suggested by the ADMET predictions. Since the target site is the vagina, local side effects and disruption of the vaginal microbiota are essential considerations during drug development. In vitro studies using vaginal epithelial cell lines can help assess potential cytotoxicity.
Hence, further experimental validation and medicinal chemistry efforts are required to confirm the potential utility of these compounds against G. vaginalis infection. Microbiome studies should be conducted to evaluate the impact of these compounds on the delicate vaginal microbiome. The in silico screen also did not comprehensively assess the off-target effects of the identified inhibitors, so further evaluation is required to ensure that they specifically target DAHP synthase and to minimize unintended consequences. In addition, the study concentrated on DAHP synthase and did not examine potential interactions with other metabolic pathways; a more holistic approach might consider these interactions.

Conclusion

This study identified 11 potential drug targets through a subtractive proteomics approach. DAHP synthase, a critical enzyme in amino acid biosynthesis, emerged as a promising target for further investigation. Docking simulations with an aroG inhibitor library from the ZINC database revealed that several compounds possessed favorable binding affinities, exceeding even that of the control, phosphoenolpyruvate. These prioritized inhibitors displayed desirable drug-like properties, exhibiting high solubility and a predicted safety profile free of AMES mutagenicity, hepatotoxicity, and skin sensitization concerns. PBPK analysis showed that different health conditions, such as steatosis, cirrhosis, renal impairment, and pregnancy, can impact drug absorption rates, so patients with these conditions may experience altered pharmacokinetics compared to healthy individuals. These findings pave the way for further development of these candidate inhibitors. In vitro and in vivo studies are warranted to assess their efficacy against the target organism and validate their safety profile in a biological context. Ultimately, this research holds promise for the discovery of novel therapeutic agents against G. vaginalis infections by exploiting the DAHP synthase pathway.