Introduction

The coronavirus disease (COVID-19) is an acute respiratory tract disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and was first reported in December 2019 in Wuhan, China1. The disease was declared a pandemic by the World Health Organization (WHO) on March 11, 20202. Since then, it has spread to 218 countries and has infected more than 25 million people and claimed the lives of 852,000 people until September 2, 2020. The common symptoms observed in COVID-19 patients include fever, cough, fatigue, shortness of breath, and loss of smell3.

Epidemiological analyses have shown that the SARS-CoV-2 has a lower fatality rate (5%) but a higher transmissibility rate (2–2.5%) than those of the previously known coronaviruses Middle East respiratory syndrome (MERS) coronavirus (34.4% fatality, < 1% transmissibility) and severe acute respiratory syndrome (SARS) coronavirus (9.5% fatality, 1.7–1.9% transmissibility)4,5. Serial viral load analyses in COVID-19 patients using reverse transcriptase quantitative polymerase chain reaction (RT-qPCR) indicated that peak viral load was observed during the first week of symptom onset, with a median viral shedding period of 20 days6,7. Antibody production starts approximately 10 days after symptom onset. Cohort studies of COVID-19 patients associated old age, multiple organ dysfunction, and high blood coagulation activity on admission with increased odds of death. Sepsis, respiratory failure, acute respiratory distress syndrome (ARDS), heart failure, and septic shock were the commonly observed complications among the cohorts7,8. Chakraborty et al. have provided a detailed review regarding the diagnostic and proposed therapeutic options for COVID-19 treatment9,10,11,12.

Following the outbreak of the COVID-19 pandemic, several drug candidates from the repository of existing drugs have been tested for activity against SARS-CoV-213,14,15. The Food and Drug Administration (FDA) has also created a special emergency program, the Coronavirus Treatment Accelerated Program (CTAP), that has reviewed 270 trials and is currently monitoring more than 570 drug development programs in the planning stage16. A review of the currently available literature shows that several existing antiviral drugs that target the viral replicating mechanism are under investigation for the treatment of COVID-19. The list of antiviral drugs being tested for COVID-19 includes remdesivir, hydroxychloroquine, chloroquine, lopinavir, darunavir, baloxavir, imatinib, and favipiravir17. Immunomodulating drugs that reduce inflammatory responses such as corticosteroids, tocilizumab, ruxolitinib, infliximab, acalabrutinib, and azithromycin are also under clinical investigation18,19,20,21. Various adjunctive drugs such as vitamins C and D and antithrombotics are also being considered for COVID-19 treatment22,23,24.

The ritonavir–lopinavir drug combination (Kaletra) has been used for the treatment of hospitalized patients in China and its benefits have been noted by the WHO25. Phase 3 clinical trials are underway to evaluate the performance and safety of the influenza drug favipiravir26,27. Remdesivir, an RNA-dependent RNA polymerase inhibitor, has been identified as a potential therapeutic agent for COVID-19 based on in vitro studies of SARS-CoV-2 clinical isolates28,29. The FDA issued an Emergency Use Authorization (EUA) for the emergency use of the drug following promising results from a placebo-controlled randomized clinical trial of remdesivir for COVID-19 treatment (https://www.fda.gov/). In a clinical study, remdesivir showed effectiveness in reducing the recovery time in COVID-19 patients. However, the drug did not contribute to significant improvement in survival rates, and the efficiency of the drug in reducing viral load in patients remained unclear. The FDA has also authorized the emergency use of the anti-malarial drugs chloroquine and hydroxychloroquine for COVID-19 treatment30,31. However, the clinical efficacy of these drugs remains inconclusive.

Viruses that cause diseases in humans are known to encode one or more proteases that play important roles in the viral life cycle. Proteases are ideal drug targets for viral diseases as they are responsible for cleaving the viral polyprotein, thus continuing the viral replication process32,33. Protease inhibitors have been used in combination drug therapy in diseases where the virus developed resistance by mutation. This strategy of using combination therapy to combat drug resistance has been successfully used in the treatment of viral diseases such as acquired immunodeficiency syndrome, in which protease inhibitors were used in combination with nucleoside reverse transcriptase inhibitors34.

The SARS-CoV-2 replicase enzyme encodes two polyproteins, pp1a and pp1ab, that produce all functional polypeptide units responsible for replication and transcription. Polypeptides are released by the catalytic cleavage activity of 3CLpro at various subsites of the polyproteins. This cleavage process is known to be conserved in 3CLpro for all coronaviruses35,36. Due to the important role of 3LCpro in the viral replication process and the absence of a close homolog in humans, this protease has been regarded as a promising therapeutic target for COVID-19 treatment37. However, despite its potential, the quest for 3CLpro inhibitors feasible for therapeutic use against COVID-19 has been unsuccessful so far.

Computer-aided drug discovery (CADD) methodologies have emerged as powerful tools in the drug discovery process and have been used over the last decade to identify protein inhibitors and to study protein-drug interactions and protein–protein interactions15,38,39,40. Since the development of a candidate drug into an approved drug is a long and costly process, a combination of computational methodologies such as virtual screening, docking, molecular dynamics (MD) simulation, and binding free energy evaluation, serves as a promising alternative for identifying potential drug candidates from compound libraries41. Cava et al. studied the mechanism of the angiotensin-converting enzyme 2 (ACE2) and its co-expressed genes using gene expression profiles in silico and suggested several interesting potential drug candidates for COVID-1942. Wang et al. performed virtual screening of the approved drugs and of those that are in clinical trials and identified several existing drug candidates that showed high binding affinity against 3CLpro43. Zhang et al. used in silico screening to identify potential SARS-CoV-2 inhibitors from a repository of traditional Chinese medicines44. Liang et al. performed MD simulation to demonstrate the binding stability of an α-ketoamide inhibitor inside the SARS-CoV-2 main protease45.

In this rapidly evolving pandemic, repurposing existing drugs and evaluating commercially available inhibitors against the druggable targets of SARS-CoV-2 should be an effective strategy to accelerate the drug discovery process. Consequently, taking advantage of the availability of the X-ray crystal structure of 3CLpro in complex with the inhibitor N3 (PDB code 6LU7)36, we performed a docking-based virtual screening of the protease inhibitor database MEROPS46 (https://www.ebi.ac.uk/merops/) to identify potential 3CLpro inhibitors.

Molecular docking and dynamic simulations were carried out to study the binding interactions of the inhibitor compounds with 3CLpro47. Binding energy calculations were performed using the molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) method to evaluate the binding affinity of the compounds and to identify residues important for binding with 3CLpro47,48. The results of the modeling study were carefully analyzed to identify commercially available potential 3CLpro inhibitors.

Results

The X-ray structures of the irreversible inhibitor N336 and the α-ketoamide inhibitor 13b49 in complex with 3CLpro were retrieved from the Research Collaboratory for Structural Bioinformatics (RCSB) database. In the cell-based study by Jin et al.36, N3 showed inhibitory activity against SARS-CoV-2 with a half-maximal effective concentration (EC50) value of 16.77 μM. However, N3 covalently binds to 3CLpro as an irreversible inhibitor, and its half-maximal inhibitory concentration (IC50) value could not be determined. The α-ketoamide inhibitor 13b showed an IC50 value of 0.67 μM for purified recombinant SARS-CoV-2 main protease 3CLpro and also showed inhibitory activity against COVID-19 with an EC50 value of 4 to 5 μM in human Calu-3 cells infected with SARS-CoV-249. The moderate inhibitory activity values of the existing inhibitors necessitate the development of high-affinity 3CLpro inhibitors.

The 3CLpro protease consists of three domains: domain 1 (residues 3–99), domain 2 (residues 100–182), and domain 3 (residues 199–307), as shown in Fig. S1 (Supplementary Information)49. Domains 1 and 2 comprise six-stranded antiparallel β-barrels with the substrate binding site at the intersection of the two domains. As shown in Fig. 1(a), the binding site is made up of subsites S1, S2, S3, S4, and S1′, which are represented based on the binding position of the substrate polyprotein36. Domains 2 and 3 are connected by a hinge region (residues 182–198), which contributes to the formation of the S3 and S4 subsites. Domain 3 consists of five α-helices arranged in a globular cluster and regulates the dimerization of 3CLpro. The tight dimerization of 3CLpro is necessary for its catalytic activity, as it leads to crucial conformational changes at the S1 subsite and subsequent binding of the substrate.

Virtual screening

The protease inhibitor dataset consisting of 2700 compounds was retrieved from the MEROPS database46. Using an in-house script, we retrieved the PubChem IDs and simplified molecular-input line-entry system (SMILES) structures of the compounds from the PubChem website. 3D structures were generated using the concord module in Sybyl-X 2.150 and the Open Babel package51. Virtual screening of the protease inhibitor dataset was performed with the Surflex-Dock52 program in Sybyl-X 2.1 and autodock vina. The Surflex-Dock program in Sybyl-X 2.1 uses a scoring function that includes hydrophobic, polar, repulsive, entropic, and solvation energy terms, whereas the autodock vina uses a scoring function based on steric, hydrophobic, and hydrogen bonding energy terms52,53. The protein file was prepared by stripping the water molecules and other heteroatoms present in it and then converting the file to pdbqt file format. The methods and parameters used for virtual screening were validated by redocking the crystal ligands N3 (PDB ID 6LU7) and 13b (PDB ID 6Y2F) into the receptor.

The total binding score from Surflex-Dock and the binding energy from autodock vina were collected and used to rank the compounds. The PubChem IDs of the 100 compounds that showed high total binding scores (Surflex-Dock) and binding energies (autodock vina) are shown in Supplementary Information Table S1. Based on the total score and binding energy, 32 compounds were selected and further studied using molecular docking, MD simulation, and free energy calculation methods. The Surflex-Dock binding scores and the autodock vina binding energies for the 32 compounds, including the reference compounds N3 and 13b, are presented in Table 1.

Molecular docking

Molecular docking of the selected 32 compounds was performed to study the binding interactions and to provide initial coordinates of the protein–ligand complexes for subsequent MD simulation studies. The X-ray crystal structure of 3CLpro (PDB ID 6LU7) provided by Jin et al.36 was used as the receptor for this study. The docking protocol was validated by redocking the crystal ligands N3 (6LU7) and 13b (6Y2F) into the receptor. The docking showed that N3 formed H-bond interactions with residues His41, Asn142, Glu166, and Gln189 of 3CLpro. Compound 13b showed interactions with Asn142, Gly143, Ser144, His163, and Glu166 with the binding site residues of 3CLpro. The binding interactions of the inhibitors with 3CLpro are shown in Supplementary Information Fig. S2. Both the X-ray structure and docked structure overlapped within a similar space inside the receptor. The docked pose of N3 overlapped with the pose in the X-ray crystal structure (PDB 6LU7) at a root mean square deviation (RMSD) value of 2.6 Å, whereas the docked pose of 13b and X-ray structure (PDB 6Y2F) showed an RMSD value of 1.8 Å. The overlaps between the autodock docked pose and the X-ray structure for both compounds N3 and 13b are shown in Fig. S5 (Supplementary Information).

The docking protocol used in docking the 3CLpro inhibitors was used to dock the selected 32 compounds. The resultant binding energy values of the 32 compounds are presented in Table 1. Binding conformations of the compounds were carefully selected based on the binding energy values and also based on important non-bonded interactions observed with 3CLpro. The protein–ligand complexes from the docking study were used as initial coordinates in the MD simulations.

Molecular dynamics simulation

GROMACS 201954 was used to perform classical MD simulations of the selected 32 protein–ligand complexes to study the dynamic binding interactions of the compounds with 3CLpro.

For a comparative study, we also performed MD simulations of the N3-3CLpro complex (PDB ID 6LU7) and the 13b-3CLpro complex (PDB ID 6Y2F). The observed H-bond interactions and hydrophobic interactions are shown in Figs. 1 and 2, respectively. The X-ray structure of N3-3CLpro showed H-bond interactions with Phe140, Gly143, His160, Glu166, Glu189, and Thr19036. The ligand N3 also formed a covalent bond with Cys145. However, this covalent bond with Cys145 was not observed in the MD simulation result because the standard force field (AMBER99SB) cannot account for the formation of covalent bonds. The inhibitor N3 formed H-bond interactions with Phe140, Gly143, Cys145, His163, and His164 at the S1 subsite of 3CLpro. H-bond interactions were also observed between N3 and the hinge residue Glu192 near the S4 subsite. Isopropyl moieties of N3 were seen at the adjacent hydrophobic subsites S2 and S3. The benzene moiety of N3 was observed near the S1′ subsite, as seen in the crystal structure (6LU7). The binding interactions of N3 with 3CLpro are shown in Fig. 1(b). Similar binding patterns were also observed in the α-ketoamide compound 13b, where the moiety between the benzene ring and the pyridone ring formed H-bond interactions with His41, Asn142, Cys145, and His164 at the S1 subsite. The benzene ring of 13b also formed hydrophobic interactions with residues at the S1′ subsite. The pyridone ring formed H-bond interactions with Glu166 of β11 and the cyclopropyl moiety extended into the small hydrophobic subsite S2. The tert-butyloxycarbonyl protecting (Boc) group of 13b did not fully extend into the S4 subsite to form interactions with the hinge residues, as observed in the N3-3CLpro complex, and instead formed hydrophobic interactions with Leu167 and Pro168 at β11. The binding interactions observed in the MD simulation of 13b-3CLpro are shown in Fig. 1(c). The least-square fit RMSD of the ligands N3 and α-ketoamide 13b during the 50 ns simulations are shown in Fig. 5(a) and (b). When compared with the respective crystal ligand poses, compounds N3 (Fig. 1a) and 13b (Fig. 1d) showed RMSD values of 1.5 Å and 2 Å, respectively. Similarly, the dynamic binding interactions of the 32 compounds with 3CLpro were also studied. The trajectories from the MD study were used to evaluate the free energy of binding for the selected 32 compounds. The hydrophobic interactions observed from the MD simulations of N3 and 13b are shown in Fig. 2.

Calculation of binding free energy

MM-PBSA based binding energy (BE) calculations were performed for the selected 32 protein–ligand complexes, followed by evaluation of the energy contribution of the individual residues. For a comparative study, we also calculated the BE and the BE distribution for both, N3-3CLpro and 13b-3CLpro complexes. Compounds N3 and 13b showed BE values of − 150 kJ/mol and − 99 kJ/mol, respectively. Calculation of the BE distribution identified residues that contributed highly to the total BE, as shown in Fig. 3. The binding site residue Met165 from the S2 subsite showed the highest BE contribution, that may be attributed to the hydrophobic interaction observed with the compounds N3 and 13b. Pro168 at β11 also showed a high BE contribution, that may be attributed to the hydrophobic interaction with the methylisoxazole of N3 and the Boc group of 13b.

The residues from the S1 subsite, namely Phe140, Asn142, Gly143, Ser144, and Cys145, and the residues from β11, namely His163, His164, Met165, Leu167, and Pro168, were involved in both H-bond interaction and hydrophobic interactions in N3-3CLpro as well as 13b-3CLpro complexes. Consequently, these residues showed relatively high BE contributions (Fig. 3). Trajectory analyses also showed that the binding interactions with S1, S2, and β11 were stable throughout the simulations. However, interactions at the S1′ and S4 subsites were transient, resulting in flexible movement as indicated by the flipping of the benzene ring at the S1′ subsite and the movement of Boc and methylisoxazole at the S4 subsite, as shown in Fig. S4 (Supplementary Information).

Following the calculation of the BE for the 32 compounds, 16 compounds showed total BE values higher than the BE of compound 13b (− 99 kJ/mol), suggesting potential inhibitory activity for 3CLpro. Compounds 53361968 (-151 kJ/mol) and 451415 (-150 kJ/mol) showed higher BE values than the potent inhibitor N3. We also observed that 12 compounds showed BE values in the range of − 98 kJ/mol and − 63 kJ/mol, suggesting a moderate binding affinity with 3CLpro. Compounds 21881944, 4322, 100997107, and 3451 showed positive BE values, possibly due to non-converging simulations. The total BE values of the compounds are presented in Table 1. Based on the MM-PBSA based BE evaluations, the residue energy contributions of 10 protein–ligand complexes with high binding affinity were analyzed, as shown in Table 2. Analysis of the energy decomposition results for the selected 10 compounds showed that residues Thr25, Leu27, His41, Asp48, Met49, Leu50, Leu141, Cys145, His164, Met167, Pro168, Asp187, Gln189, and Ala191 play important roles in the binding of the compounds with 3CLpro. The interactions with these residues were dominated by electrostatic and hydrophobic interactions (Table 3).

Discussion

We selected 10 compounds that showed high potential for 3CLpro inhibition based on the total binding free energy to analyze the structural features critical for binding with 3CLpro. The structures of the selected compounds are presented in Table 4. The binding interactions with 3CLpro and the RMSD values for the 10 compounds are shown in Figs. 4 and 5, respectively.

Compound 441243 formed H-bond interactions with Gln189 and His41, while forming multiple interactions at the S1 subsite with Asn142, Ser144, and Gly143. Compounds 451415 (− 150 kJ/mol) and 53361968 (− 151 kJ/mol), which showed relatively high binding energy values (Table 3), had a relatively less number of H-bond interactions. However, further analysis showed that these compounds had relatively high hydrophobic energy contributions, resulting in higher total binding energy values. The donor nitrogen atoms of compound 446837 formed two H-bond interactions: with His41 and Glu166. Compound 46178275 showed only one stable H-bond interaction with Glu166. However, the oxygen and donor nitrogen atoms of 46178275 near the hinge region could form transient interactions with Gln189 and Thr190. Compound 15942730 showed the highest number of H-bonds, forming multiple interactions with His41, Glu166, Gln189, and Gln192. In the binding energy analysis of the 15942730-3CLpro complex (Table 3), the contribution of the electrostatic component to the total binding energy was − 91 kJ/mol, which was higher than that of the other selected compounds. Compounds 9828551 and 644196 formed several H-bond interactions with residues from β11 and S1. Consequently, these two compounds showed high electrostatic energy terms in the binding energy calculations. Compound 134815261 formed H-bond interactions with Glu166 and Gly143. Compound 13231950 formed H-bond interactions with His41, Asn142, His164, and Gln189. The analyses suggested that compounds showing higher binding affinities with 3CLpro were able to form H-bond interactions with residues from multiple subsites and also showed higher number of hydrophobic interactions.

From the analyses of the binding interactions, we observed that the interactions of the compounds with the S1 subsite residues such as His41, Asn142, Gly143, Ser144, and the Glu166 residue of β11 were crucial for stable interaction with 3CLpro. These interactions with the S1 and β11 residues were also observed in experimental studies36,49. Interactions of the compounds at the S2 subsite were predominantly hydrophobic. Since the S2 subsite is a small hydrophobic pocket, compounds with substituents such as isopropyl and cyclopropyl, which can fit into the hydrophobic pocket, could be promising 3CLpro inhibitors, as in the cases of inhibitor 13b (Fig. 2b), compound 451415/aclarubicin, and 53361968/TMC-31091155 (Supplementary Information Fig. S3). It was also observed that compounds with substituents that extend into the S4 subsite tend to show higher binding energies. The compounds 451415/aclarubicin56 and 53361968, which formed H-bonds (Fig. 4b and d) and hydrophobic interactions (Supplementary Information Fig. S3) at the S4 subsite showed relatively high total BE of − 150 kJ/mol and − 151 kJ/mol, respectively, suggesting the importance of these interactions in 3CLpro inhibition. This observation was also made in the experimental study of inhibitor 13b, wherein removing substituents that extended into the S4 subsite reduced the activity value49. Having substituents that extend into the S4 subsite induced conformational changes at the hinge region between subunits 1 and 2, as noted by Zhang et al.49. However, the exact mechanism behind the conformational change leading to increased affinity for 3CLpro remains unclear. From interaction studies, it was observed that having substituents that form hydrophobic interactions at the S1′ subsite was important for binding with 3CLpro. These interactions at S1′ were dominated by hydrophobic interactions, as observed in the interactions of N3 and 13b (Fig. 2) and also in the cases of compounds 446837/KNI-76457 (− 115 kJ/mol), 5336196855 (− 150 kJ/mol), and 15942730/chemostatin58 (− 129 kJ/mol). However, in the case of compound 451415, that lacks an extended hydrophobic benzyl substituent, H-bond interaction was observed with Thr26 at the S1′ subsite (Fig. 4b). From these observations, we speculate that having substituents that can form hydrophobic and H-bond interactions with S1′ residues may increase the binding affinity since having both hydrophobic and H-bond interactions with 3CLpro closely emulates the substrate-binding pattern59. Binding energy decomposition for individual residues identified His41, Met49, Met165, and Glu189 as key locations. These residues were also identified as important hotspot residues by Wang et al.43 in a recent study. Additionally, analysis of BE decomposition also revealed that the residues Thr25, Leu27, Asp48, Leu50, Leu141, Cys145, His164, Leu167, Pro168, Asp187, and Ala191 were significant for the binding of the inhibitors with 3CLpro.

The absorption, distribution, metabolism, and excretion (ADMET) properties of the compounds were also evaluated using the pkCSM server60 and the results are presented in Table S2 (Supplementary Information). In the ADMET analyses, compounds that showed an intestinal absorption value of less than 30% were considered to have poor absorption rates. Except for compound 15942730, all the selected compounds showed reasonable intestinal absorption rates. Steady-state volume of distribution (VDss) represents the degree to which the compounds are distributed in the body rather than the plasma, and compounds with log (VDss) values greater than − 0.15 are considered to have a reasonable distribution rate. All the compounds in Supplementary Information Table S2 except 46178275 and 15942730 showed VDss values greater than − 0.15, indicating that the compounds have satisfactory distribution rates. Analyses of the metabolism results suggest that the compounds are poor cytochrome P450 inhibitors. Compounds with positive results for the CYP3A4 substrate test suggest that they can be metabolized by cytochrome P450. The selected compounds also showed a reasonable total clearance rate from the body, except compound 446837. The negative Ames toxicity test results suggest that the compounds have poor mutagenic potential.

Being in the middle of the COVID-19 pandemic, availability of these compounds is crucial for in vivo and in vitro experimental studies. Hence, we checked the availability of the selected compounds in commercial libraries by referencing vendor data through the PubChem website and the ZINC database61. The details regarding the ZINC ID and the distributor (vendor) of the compounds are provided in Table 5. Compound 441243/saquinavir62 is an antiretroviral protease inhibitor approved by the FDA for the treatment of human immunodeficiency virus (HIV) infection. Recently, several computational studies have also reported the encouraging binding ability of saquinavir with 3CLpro25,63. The high binding affinity of saquinavir observed in our study, as well as other by independent research groups, indicates its potential as a 3CLpro inhibitor. The registry of clinical trials maintained by the United States National Library of Medicine under the National Institute of Health (NIH) showed that compound 451415/aclarubicin is an anthracycline drug and has been under evaluation (phase 2 clinical trial) for combination therapy against acute myeloid leukemia (AML). Compound 53361968, which showed the highest BE value (− 151 kJ/mol), is an investigational protease inhibitor that is currently being studied for HIV-1 infection treatment64. Compound 46178275/faldaprevir is a hepatitis C virus protease inhibitor currently being studied for the treatment of hepatitis C65. Additionally, compounds 132531950 (− 114 kJ/mol), 102285029 (− 111 kJ/mol), 11962092 (− 108 kJ/mol), 446918 (− 108 kJ/mol), 92727 (− 104 kJ/mol), and 45358152 (− 102 kJ/mol) also showed BE values greater than that of the inhibitor in the X-ray structure (α-ketoamide with a BE value of − 99 kJ/mol), suggesting that these compounds may have higher inhibitory activity against 3CLpro.

Conclusion

In this ongoing COVID-19 pandemic, CADD methodologies can be used effectively to accelerate the process of developing therapeutic agents for the treatment of this disease. In this study, we used docking-based virtual screening to search the protease inhibitor database (MEROPS) to identify potential inhibitors of the SARS-CoV-2 main protease 3CLpro. Molecular docking and dynamics simulations were carried out to study the binding interactions. Binding free energy calculations were performed to identify potential 3CLpro inhibitors. The study identified saquinavir, which is an approved drug for HIV-1 treatment, and several other investigational drugs, such as aclarubicin, TMC-310911, and faldaprevir. We also assessed the commercial availability of the compounds, which could be useful for experimental researchers. Analysis of the binding interactions revealed that electrostatic interactions with residues from the S1 subsite and the β-strand (β11) were important for the inhibition of 3CLpro. Compounds possessing substituents that extend into the S4 subsite induced conformational changes at the hinge between subunit 1 and subunit 2 and showed higher binding affinity. Compounds with high binding energies showed either hydrophobic or electrostatic interactions at the S1′ subsite. These structural features may be harnessed to design potent 3CLpro inhibitors. Using CADD methods, we identified 15 compounds with a binding affinity greater than that of the inhibitor inside 3CLpro in the X-ray structure (α-ketoamide). We suggest further experimental investigation of these compounds.

Methods

Data preparation

The X-ray crystal structure of 3CLpro in complex with the inhibitor N3 (PDB ID 6LU7) prepared by Jin et al. was used as the receptor for our study36. The heteroatoms and water molecules were removed from the protein file for further study.

A total of 2700 protease inhibitors were collected from the MEROPS database. MEROPS is a database of proteases and their inhibitors46. The two-dimensional (2D) structures provided in the simplified molecular-input line-entry system (SMILES) format and the PubChem IDs of the compounds were collected from the PubChem website (https://pubchem.ncbi.nlm.nih.gov/) using an in-house script. The 2D structures were converted to three-dimensional (3D) structures using the concord module in Sybyl-X 2.150.

Virtual screening

Docking-based virtual screening was performed using the Surflex-Dock module52 in Sybyl-X 2.1 and the autodock vina53 program to identify potential 3CLpro inhibitors. Since Surflex-Dock and autodock vina use different approaches in scoring the binding affinity of the compounds, using both methods increased the credibility of the virtual screening results.

Surflex-Dock

The protein structure was prepared by adding hydrogen atoms and assigning Amber 7FF99 atom types, followed by brief energy minimization. During ligand preparation, a general cleanup process was carried out by filling valences and removing duplicates and compounds that are not drug-like. A computational representation of the binding site, called the protomol, was generated based on the crystal ligand coordinate, as shown in Fig. S1 (Supplementary Information). The protomol was used to direct the initial placement of the ligands during docking. Virtual screening was carried out via the Surflex-Dock program using a molecular similarity-based search engine. Binding interactions were evaluated using an empirical scoring function based on hydrophobic, polar, repulsive, entropic, and solvation energy terms.

Autodock vina

3D coordinates of the compounds bearing partial charges were generated and saved in the pdbqt format. The receptor coordinates and grid parameters were generated using autodock tools66. The virtual screening process and the analysis of the results were performed using in-house scripts that incorporated the autodock vina program. The binding energies of the compounds were analyzed and used to rank the compounds.

Molecular docking

Molecular docking was performed to evaluate the binding energy and to provide initial coordinates and topology parameters for the MD simulations. The docking procedure was validated by extracting the irreversible inhibitor N336 (PDB ID 6LU7) and the α-ketoamide inhibitor 13b49 (PDB ID 6Y2F) from the crystal structures and docking them back into the receptor.

During the molecular docking of the compounds, the binding pose of the selected compounds from the virtual screening was used as the input. Polar atoms were added to the protein and Kollman charges were added as partial charges. A grid box with dimensions 60 × 60 × 60 centered at the coordinates X =  − 10, Y = 13, and Z = 70 was used to represent the search area. The Lamarckian genetic algorithm (LGA) was used to perform the docking process, generating 100 conformations for each compound. Based on the binding energy and binding interactions with the receptor, a representative binding pose for the ligands was selected.

Molecular dynamics simulation

Classical MD simulations were carried out on selected compounds, using GROMACS 201954, to evaluate their binding interactions with 3CLpro. The protein–ligand complexes from the docking study were used for the MD simulation study. The ligands were parameterized with the general amber force field (GAFF)67 using the Acpype program68. Protein topology and coordinate files were generated using the Amber99SB force field provided in GROMACS. The protein–ligand complex was contained in a dodecahedron and solvated with TIP3P water. Counter ions were added to neutralize the solvated system followed by quick energy minimization with the steepest descent minimization algorithm. This was followed by a restrained constant number of particles, volume, and temperature (NVT) ensemble equilibration for 500 ps and a constant number of particles, pressure, and temperature (NPT) ensemble for 1 ns equilibration. Thermodynamic properties such as pressure, density, potential energy, and temperature of the systems were monitored to ensure adequate equilibration before the production run. The particle mesh Ewald method was used to calculate the long-range electrostatics. Modified Berendsen thermostat and Parrinello-Rahman barostat were used for temperature and pressure coupling, respectively. Finally, unrestrained 50 ns production simulations were carried out for the systems at 310 K and 1 bar atmospheric pressure. The MD simulation procedure used here has been used in several protein–ligand interaction studies by our group and others39,40.

Calculation of binding free energy

The g_mmpbsa package developed by Kumari et al. was used to calculate the binding free energy or simply, the binding energy (BE) of the protein–ligand complexes. The g_mmpbsa program used subroutines sourced from GROMACS and APBS packages to integrate high-throughput molecular dynamics simulation with binding energy calculations48.

The vacuum potential energy was calculated from the bonded and non-bonded interactions based on the molecular mechanics (MM) force field. The electrostatic and van der Waals (Evdw) energy contributions were calculated based on Coulomb potential and Lennard–Jones potential functions, respectively. During the evaluation of the free energy of solvation, the polar contribution was calculated by solving the Poisson-Boltzmann equation. The non-polar contribution was calculated based on the assumption that the non-electrostatic solvation energy is linearly related to the solvent-accessible surface area (SASA). The non-polar energy term (Gnonpolar) includes both repulsive and attractive forces between the solute and solvent developed due to cavity (Gcavity) formation as well as the van der Waals interaction (GvdW). This can be represented by the equation below40,69.

$${\mathrm{G}}_{\mathrm{nonpolar}}={\mathrm{G}}_{\mathrm{cavity}}+{\mathrm{G}}_{\mathrm{vdW}}$$

During the calculation of the BE, snapshots were generated from the equilibrated region of the MD trajectory. Energy components were evaluated for 51 snapshots extracted every 0.1 ns from the trajectory. The decomposition of the energy term to individual residues was carried out using the MmPbsaDecomp.py script provided with the g-mmpbsa package. The default parameters set by Kumari et al. were used for all the calculations.

The Figs. 1, 4, S1, S2 and S4 were rendered using the pymol v2.1 software (https://pymol.org/2/). The Fig. 2 was generated using the LigPlot+ software (https://www.ebi.ac.uk/thornton-srv/software/LigPlus/).