Rational approach toward COVID-19 main protease inhibitors via molecular docking, molecular dynamics simulation and free energy calculation

In the rapidly evolving coronavirus disease (COVID-19) pandemic, repurposing existing drugs and evaluating commercially available inhibitors against druggable targets of the virus could be an effective strategy to accelerate the drug discovery process. The 3C-Like proteinase (3CLpro) of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been identified as an important drug target due to its role in viral replication. The lack of a potent 3CLpro inhibitor and the availability of the X-ray crystal structure of 3CLpro (PDB-ID 6LU7) motivated us to perform computational studies to identify commercially available potential inhibitors. A combination of modeling studies was performed to identify potential 3CLpro inhibitors from the protease inhibitor database MEROPS (https://www.ebi.ac.uk/merops/index.shtml). Binding energy evaluation identified key residues for inhibitor design. We found 15 potential 3CLpro inhibitors with higher binding affinity than that of an α-ketoamide inhibitor determined via X-ray structure. Among them, saquinavir and three other investigational drugs aclarubicin, TMC-310911, and faldaprevir could be suggested as potential 3CLpro inhibitors. We recommend further experimental investigation of these compounds.


Results
The X-ray structures of the irreversible inhibitor N3 36 and the α-ketoamide inhibitor 13b 49 in complex with 3CL pro were retrieved from the Research Collaboratory for Structural Bioinformatics (RCSB) database. In the cell-based study by Jin et al. 36 , N3 showed inhibitory activity against SARS-CoV-2 with a half-maximal effective concentration (EC 50 ) value of 16.77 μM. However, N3 covalently binds to 3CL pro as an irreversible inhibitor, and its half-maximal inhibitory concentration (IC 50 ) value could not be determined. The α-ketoamide inhibitor 13b showed an IC 50 value of 0.67 μM for purified recombinant SARS-CoV-2 main protease 3CL pro and also showed inhibitory activity against COVID-19 with an EC 50 value of 4 to 5 μM in human Calu-3 cells infected with SARS-CoV-2 49 . The moderate inhibitory activity values of the existing inhibitors necessitate the development of high-affinity 3CL pro inhibitors.
The 3CL pro protease consists of three domains: domain 1 (residues 3-99), domain 2 (residues 100-182), and domain 3 (residues 199-307), as shown in Fig. S1 (Supplementary Information) 49 . Domains 1 and 2 comprise six-stranded antiparallel β-barrels with the substrate binding site at the intersection of the two domains. As shown in Fig. 1(a), the binding site is made up of subsites S1, S2, S3, S4, and S1′, which are represented based on Scientific Reports | (2020) 10:17716 | https://doi.org/10.1038/s41598-020-74468-0 www.nature.com/scientificreports/ the binding position of the substrate polyprotein 36 . Domains 2 and 3 are connected by a hinge region (residues 182-198), which contributes to the formation of the S3 and S4 subsites. Domain 3 consists of five α-helices arranged in a globular cluster and regulates the dimerization of 3CL pro . The tight dimerization of 3CL pro is necessary for its catalytic activity, as it leads to crucial conformational changes at the S1 subsite and subsequent binding of the substrate.
Virtual screening. The protease inhibitor dataset consisting of 2700 compounds was retrieved from the MEROPS database 46 . Using an in-house script, we retrieved the PubChem IDs and simplified molecular-input line-entry system (SMILES) structures of the compounds from the PubChem website. 3D structures were generated using the concord module in Sybyl-X 2.1 50 and the Open Babel package 51 . Virtual screening of the protease inhibitor dataset was performed with the Surflex-Dock 52 program in Sybyl-X 2.1 and autodock vina. The Surflex-Dock program in Sybyl-X 2.1 uses a scoring function that includes hydrophobic, polar, repulsive, entropic, and solvation energy terms, whereas the autodock vina uses a scoring function based on steric, hydrophobic, and hydrogen bonding energy terms 52,53 . The protein file was prepared by stripping the water molecules and other heteroatoms present in it and then converting the file to pdbqt file format. The methods and parameters used for virtual screening were validated by redocking the crystal ligands N3 (PDB ID 6LU7) and 13b (PDB ID 6Y2F) into the receptor. The total binding score from Surflex-Dock and the binding energy from autodock vina were collected and used to rank the compounds. The PubChem IDs of the 100 compounds that showed high total binding scores (Surflex-Dock) and binding energies (autodock vina) are shown in Supplementary Information Table S1. Based on the total score and binding energy, 32 compounds were selected and further studied using molecular docking, MD simulation, and free energy calculation methods. The Surflex-Dock binding scores and the autodock vina binding energies for the 32 compounds, including the reference compounds N3 and 13b, are presented in Table 1.
Molecular docking. Molecular docking of the selected 32 compounds was performed to study the binding interactions and to provide initial coordinates of the protein-ligand complexes for subsequent MD simulation studies. The X-ray crystal structure of 3CL pro (PDB ID 6LU7) provided by Jin et al. 36 was used as the receptor for this study. The docking protocol was validated by redocking the crystal ligands N3 (6LU7) and 13b (6Y2F) into the receptor. The docking showed that N3 formed H-bond interactions with residues His41, Asn142, Glu166, and Gln189 of 3CL pro . Compound 13b showed interactions with Asn142, Gly143, Ser144, His163, and Glu166 with the binding site residues of 3CL pro . The binding interactions of the inhibitors with 3CL pro are shown in Supplementary Information Fig. S2. Both the X-ray structure and docked structure overlapped within a similar space inside the receptor. The docked pose of N3 overlapped with the pose in the X-ray crystal structure (PDB 6LU7) at a root mean square deviation (RMSD) value of 2.6 Å, whereas the docked pose of 13b and X-ray structure (PDB 6Y2F) showed an RMSD value of 1.8 Å. The overlaps between the autodock docked pose and the X-ray structure for both compounds N3 and 13b are shown in Fig. S5 (Supplementary Information).
The docking protocol used in docking the 3CL pro inhibitors was used to dock the selected 32 compounds. The resultant binding energy values of the 32 compounds are presented in Table 1. Binding conformations of the compounds were carefully selected based on the binding energy values and also based on important nonbonded interactions observed with 3CL pro . The protein-ligand complexes from the docking study were used as initial coordinates in the MD simulations. Molecular dynamics simulation. GROMACS 2019 54 was used to perform classical MD simulations of the selected 32 protein-ligand complexes to study the dynamic binding interactions of the compounds with 3CL pro . www.nature.com/scientificreports/ For a comparative study, we also performed MD simulations of the N3-3CL pro complex (PDB ID 6LU7) and the 13b-3CL pro complex (PDB ID 6Y2F). The observed H-bond interactions and hydrophobic interactions are shown in Figs. 1 and 2, respectively. The X-ray structure of N3-3CL pro showed H-bond interactions with Phe140, Gly143, His160, Glu166, Glu189, and Thr190 36 . The ligand N3 also formed a covalent bond with Cys145. However, this covalent bond with Cys145 was not observed in the MD simulation result because the standard force field (AMBER99SB) cannot account for the formation of covalent bonds. The inhibitor N3 formed H-bond interactions with Phe140, Gly143, Cys145, His163, and His164 at the S1 subsite of 3CL pro . H-bond interactions were also observed between N3 and the hinge residue Glu192 near the S4 subsite. Isopropyl moieties of N3 were seen at the adjacent hydrophobic subsites S2 and S3. The benzene moiety of N3 was observed near the S1′ subsite, as seen in the crystal structure (6LU7). The binding interactions of N3 with 3CL pro are shown in Fig. 1(b). Similar binding patterns were also observed in the α-ketoamide compound 13b, where the moiety between the benzene ring and the pyridone ring formed H-bond interactions with His41, Asn142, Cys145, and His164 at the S1 subsite. The benzene ring of 13b also formed hydrophobic interactions with residues at the S1′ subsite. The pyridone ring formed H-bond interactions with Glu166 of β11 and the cyclopropyl moiety extended into the small hydrophobic subsite S2. The tert-butyloxycarbonyl protecting (Boc) group of 13b did not fully extend into the S4 subsite to form interactions with the hinge residues, as observed in the N3-3CL pro complex, and instead formed hydrophobic interactions with Leu167 and Pro168 at β11. The binding interactions observed in the MD simulation of 13b-3CL pro are shown in Fig. 1(c). The least-square fit RMSD of the ligands N3 and α-ketoamide 13b during the 50 ns simulations are shown in Fig. 5(a) and (b). When compared with the respective crystal ligand  Calculation of binding free energy. MM-PBSA based binding energy (BE) calculations were performed for the selected 32 protein-ligand complexes, followed by evaluation of the energy contribution of the individual residues. For a comparative study, we also calculated the BE and the BE distribution for both, N3-3CL pro and 13b-3CL pro complexes. Compounds N3 and 13b showed BE values of − 150 kJ/mol and − 99 kJ/mol, respectively. Calculation of the BE distribution identified residues that contributed highly to the total BE, as shown in Fig. 3. The binding site residue Met165 from the S2 subsite showed the highest BE contribution, that may be attributed to the hydrophobic interaction observed with the compounds N3 and 13b. Pro168 at β11 also showed a high BE contribution, that may be attributed to the hydrophobic interaction with the methylisoxazole of N3 and the Boc group of 13b. The residues from the S1 subsite, namely Phe140, Asn142, Gly143, Ser144, and Cys145, and the residues from β11, namely His163, His164, Met165, Leu167, and Pro168, were involved in both H-bond interaction and hydrophobic interactions in N3-3CL pro as well as 13b-3CL pro complexes. Consequently, these residues showed relatively high BE contributions (Fig. 3). Trajectory analyses also showed that the binding interactions with S1, S2, and β11 were stable throughout the simulations. However, interactions at the S1′ and S4 subsites were transient, resulting in flexible movement as indicated by the flipping of the benzene ring at the S1′ subsite and the movement of Boc and methylisoxazole at the S4 subsite, as shown in Fig. S4 (Supplementary Information).  www.nature.com/scientificreports/ Following the calculation of the BE for the 32 compounds, 16 compounds showed total BE values higher than the BE of compound 13b (− 99 kJ/mol), suggesting potential inhibitory activity for 3CL pro . Compounds 53361968 (-151 kJ/mol) and 451415 (-150 kJ/mol) showed higher BE values than the potent inhibitor N3. We also observed that 12 compounds showed BE values in the range of − 98 kJ/mol and − 63 kJ/mol, suggesting a moderate binding affinity with 3CL pro . Compounds 21881944, 4322, 100997107, and 3451 showed positive BE values, possibly due to non-converging simulations. The total BE values of the compounds are presented in Table 1. Based on the MM-PBSA based BE evaluations, the residue energy contributions of 10 protein-ligand complexes with high binding affinity were analyzed, as shown in Table 2. Analysis of the energy decomposition results for the selected 10 compounds showed that residues Thr25, Leu27, His41, Asp48, Met49, Leu50, Leu141, Cys145, His164, Met167, Pro168, Asp187, Gln189, and Ala191 play important roles in the binding of the compounds with 3CL pro . The interactions with these residues were dominated by electrostatic and hydrophobic interactions (Table 3).

Discussion
We selected 10 compounds that showed high potential for 3CL pro inhibition based on the total binding free energy to analyze the structural features critical for binding with 3CL pro . The structures of the selected compounds are presented in Table 4. The binding interactions with 3CL pro and the RMSD values for the 10 compounds are shown in Figs. 4 and 5, respectively. Compound 441243 formed H-bond interactions with Gln189 and His41, while forming multiple interactions at the S1 subsite with Asn142, Ser144, and Gly143. Compounds 451415 (− 150 kJ/mol) and 53361968 (− 151 kJ/mol), which showed relatively high binding energy values (Table 3), had a relatively less number of H-bond interactions. However, further analysis showed that these compounds had relatively high hydrophobic energy contributions, resulting in higher total binding energy values. The donor nitrogen atoms of compound 446837 formed two H-bond interactions: with His41 and Glu166. Compound 46178275 showed only one stable H-bond interaction with Glu166. However, the oxygen and donor nitrogen atoms of 46178275 near the hinge region could form transient interactions with Gln189 and Thr190. Compound 15942730 showed the highest number of H-bonds, forming multiple interactions with His41, Glu166, Gln189, and Gln192. In the binding energy analysis of the 15942730-3CL pro complex (Table 3), the contribution of the electrostatic component to the total binding energy was − 91 kJ/mol, which was higher than that of the other selected compounds. Compounds 9828551 and 644196 formed several H-bond interactions with residues from β11 and S1. Consequently, these two compounds showed high electrostatic energy terms in the binding energy calculations. Compound 134815261 formed H-bond interactions with Glu166 and Gly143. Compound 13231950 formed H-bond interactions with His41, Asn142, His164, and Gln189. The analyses suggested that compounds showing higher binding affinities with 3CL pro were able to form H-bond interactions with residues from multiple subsites and also showed higher number of hydrophobic interactions.
From the analyses of the binding interactions, we observed that the interactions of the compounds with the S1 subsite residues such as His41, Asn142, Gly143, Ser144, and the Glu166 residue of β11 were crucial for stable interaction with 3CL pro . These interactions with the S1 and β11 residues were also observed in experimental studies 36,49 . Interactions of the compounds at the S2 subsite were predominantly hydrophobic. Since the S2 subsite is a small hydrophobic pocket, compounds with substituents such as isopropyl and cyclopropyl, which can fit into the hydrophobic pocket, could be promising 3CL pro inhibitors, as in the cases of inhibitor 13b (Fig. 2b), compound 451415/aclarubicin, and 53361968/TMC-310911 55 ( Supplementary Information Fig. S3). It was also observed that compounds with substituents that extend into the S4 subsite tend to show higher binding energies. The compounds 451415/aclarubicin 56 and 53361968, which formed H-bonds ( Fig. 4b and d) and hydrophobic interactions ( Supplementary Information Fig. S3) at the S4 subsite showed relatively high total BE of − 150 kJ/mol and − 151 kJ/mol, respectively, suggesting the importance of these interactions in 3CL pro inhibition. This observation was also made in the experimental study of inhibitor 13b, wherein removing substituents that extended into the S4 subsite reduced the activity value 49 . Having substituents that extend into the S4 subsite induced conformational changes at the hinge region between subunits 1 and 2, as noted by Zhang et al. 49 . However, the exact mechanism behind the conformational change leading to increased affinity for 3CL pro remains unclear. From interaction studies, it was observed that having substituents that form hydrophobic interactions at the S1′ subsite was important for binding with 3CL pro . These interactions at S1′ were dominated by hydrophobic interactions, as observed in the interactions of N3 and 13b (Fig. 2) and also in the cases of compounds 446837/KNI-764 57 (− 115 kJ/mol), 53361968 55 (− 150 kJ/mol), and 15942730/chemostatin 58 (− 129 kJ/mol). However, in the case of compound 451415, that lacks an extended hydrophobic benzyl substituent, H-bond interaction was observed with Thr26 at the S1′ subsite (Fig. 4b). From these observations, we speculate that having substituents that can form hydrophobic and H-bond interactions with S1′ residues may increase the binding affinity since having both hydrophobic and H-bond interactions with 3CL pro closely emulates the substrate-binding pattern 59 . Binding energy decomposition for individual residues identified His41, Met49, Met165, and Glu189 as key locations. These residues were also identified as important hotspot residues by Wang et al. 43 in a recent study. Additionally, analysis of BE decomposition also revealed that the residues Thr25, Leu27, Asp48, Leu50, Leu141, Cys145, His164, Leu167, Pro168, Asp187, and Ala191 were significant for the binding of the inhibitors with 3CL pro .
The absorption, distribution, metabolism, and excretion (ADMET) properties of the compounds were also evaluated using the pkCSM server 60 and the results are presented in Table S2 (Supplementary Information). In the ADMET analyses, compounds that showed an intestinal absorption value of less than 30% were considered to have poor absorption rates. Except for compound 15942730, all the selected compounds showed reasonable intestinal absorption rates. Steady-state volume of distribution (VDss) represents the degree to which the compounds are distributed in the body rather than the plasma, and compounds with log (VDss) values greater than − 0.15 are considered to have a reasonable distribution rate. All the compounds in Supplementary Information Table S2 except 46178275 and 15942730 showed VDss values greater than − 0.15, indicating that the compounds have satisfactory distribution rates. Analyses of the metabolism results suggest that the compounds are poor cytochrome P450 inhibitors. Compounds with positive results for the CYP3A4 substrate test suggest that they can be metabolized by cytochrome P450. The selected compounds also showed a reasonable total clearance rate from the body, except compound 446837. The negative Ames toxicity test results suggest that the compounds have poor mutagenic potential.
Being in the middle of the COVID-19 pandemic, availability of these compounds is crucial for in vivo and in vitro experimental studies. Hence, we checked the availability of the selected compounds in commercial libraries by referencing vendor data through the PubChem website and the ZINC database 61 . The details regarding the ZINC ID and the distributor (vendor) of the compounds are provided in Table 5    showed BE values greater than that of the inhibitor in the X-ray structure (α-ketoamide with a BE value of − 99 kJ/mol), suggesting that these compounds may have higher inhibitory activity against 3CL pro .

Conclusion
In this ongoing COVID-19 pandemic, CADD methodologies can be used effectively to accelerate the process of developing therapeutic agents for the treatment of this disease. In this study, we used docking-based virtual screening to search the protease inhibitor database (MEROPS) to identify potential inhibitors of the SARS-CoV-2 main protease 3CL pro . Molecular docking and dynamics simulations were carried out to study the binding interactions. Binding free energy calculations were performed to identify potential 3CL pro inhibitors. The study identified saquinavir, which is an approved drug for HIV-1 treatment, and several other investigational drugs, such as aclarubicin, TMC-310911, and faldaprevir. We also assessed the commercial availability of the compounds, which could be useful for experimental researchers. Analysis of the binding interactions revealed that electrostatic interactions with residues from the S1 subsite and the β-strand (β11) were important for the inhibition of 3CL pro . Compounds possessing substituents that extend into the S4 subsite induced conformational changes at the hinge between subunit 1 and subunit 2 and showed higher binding affinity. Compounds with high binding energies showed either hydrophobic or electrostatic interactions at the S1′ subsite. These structural features may be harnessed to design potent 3CL pro inhibitors. Using CADD methods, we identified 15 compounds with a binding affinity greater than that of the inhibitor inside 3CL pro in the X-ray structure (α-ketoamide). We suggest further experimental investigation of these compounds.

Methods
Data preparation. The X-ray crystal structure of 3CL pro in complex with the inhibitor N3 (PDB ID 6LU7) prepared by Jin et al. was used as the receptor for our study 36 . The heteroatoms and water molecules were removed from the protein file for further study. A total of 2700 protease inhibitors were collected from the MEROPS database. MEROPS is a database of proteases and their inhibitors 46 . The two-dimensional (2D) structures provided in the simplified molecular-input line-entry system (SMILES) format and the PubChem IDs of the compounds were collected from the PubChem website (https ://pubch em.ncbi.nlm.nih.gov/) using an in-house script. The 2D structures were converted to three-dimensional (3D) structures using the concord module in Sybyl-X 2.1 50 .
Virtual screening. Docking-based virtual screening was performed using the Surflex-Dock module 52 in Sybyl-X 2.1 and the autodock vina 53 program to identify potential 3CL pro inhibitors. Since Surflex-Dock and autodock vina use different approaches in scoring the binding affinity of the compounds, using both methods increased the credibility of the virtual screening results.
Surflex-Dock. The protein structure was prepared by adding hydrogen atoms and assigning Amber 7FF99 atom types, followed by brief energy minimization. During ligand preparation, a general cleanup process was carried out by filling valences and removing duplicates and compounds that are not drug-like. A computational representation of the binding site, called the protomol, was generated based on the crystal ligand coordinate, www.nature.com/scientificreports/ as shown in Fig. S1 (Supplementary Information). The protomol was used to direct the initial placement of the ligands during docking. Virtual screening was carried out via the Surflex-Dock program using a molecular similarity-based search engine. Binding interactions were evaluated using an empirical scoring function based on hydrophobic, polar, repulsive, entropic, and solvation energy terms.
Autodock vina. 3D coordinates of the compounds bearing partial charges were generated and saved in the pdbqt format. The receptor coordinates and grid parameters were generated using autodock tools 66 . The virtual screening process and the analysis of the results were performed using in-house scripts that incorporated the autodock vina program. The binding energies of the compounds were analyzed and used to rank the compounds.
Molecular docking. Molecular docking was performed to evaluate the binding energy and to provide initial coordinates and topology parameters for the MD simulations. The docking procedure was validated by extracting the irreversible inhibitor N3 36 (PDB ID 6LU7) and the α-ketoamide inhibitor 13b 49 (PDB ID 6Y2F) from the crystal structures and docking them back into the receptor.
During the molecular docking of the compounds, the binding pose of the selected compounds from the virtual screening was used as the input. Polar atoms were added to the protein and Kollman charges were added as partial charges. A grid box with dimensions 60 × 60 × 60 centered at the coordinates X = − 10, Y = 13, and Z = 70 was used to represent the search area. The Lamarckian genetic algorithm (LGA) was used to perform the docking process, generating 100 conformations for each compound. Based on the binding energy and binding interactions with the receptor, a representative binding pose for the ligands was selected.

Molecular dynamics simulation.
Classical MD simulations were carried out on selected compounds, using GROMACS 2019 54 , to evaluate their binding interactions with 3CL pro . The protein-ligand complexes from the docking study were used for the MD simulation study. The ligands were parameterized with the general amber force field (GAFF) 67 using the Acpype program 68 . Protein topology and coordinate files were generated using the Amber99SB force field provided in GROMACS. The protein-ligand complex was contained in a dodecahedron and solvated with TIP3P water. Counter ions were added to neutralize the solvated system followed by quick energy minimization with the steepest descent minimization algorithm. This was followed by a restrained constant number of particles, volume, and temperature (NVT) ensemble equilibration for 500 ps and a constant number of particles, pressure, and temperature (NPT) ensemble for 1 ns equilibration. Thermodynamic properties such as pressure, density, potential energy, and temperature of the systems were monitored to ensure adequate equilibration before the production run. The particle mesh Ewald method was used to calculate the long-range electrostatics. Modified Berendsen thermostat and Parrinello-Rahman barostat were used for temperature and pressure coupling, respectively. Finally, unrestrained 50 ns production simulations were carried out for the systems at 310 K and 1 bar atmospheric pressure. The MD simulation procedure used here has been used in several protein-ligand interaction studies by our group and others 39,40 . Calculation of binding free energy. The g_mmpbsa package developed by Kumari et al. was used to calculate the binding free energy or simply, the binding energy (BE) of the protein-ligand complexes. The g_ mmpbsa program used subroutines sourced from GROMACS and APBS packages to integrate high-throughput molecular dynamics simulation with binding energy calculations 48 .
The vacuum potential energy was calculated from the bonded and non-bonded interactions based on the molecular mechanics (MM) force field. The electrostatic and van der Waals (E vdw ) energy contributions were calculated based on Coulomb potential and Lennard-Jones potential functions, respectively. During the evaluation of the free energy of solvation, the polar contribution was calculated by solving the Poisson-Boltzmann equation. The non-polar contribution was calculated based on the assumption that the non-electrostatic solvation energy is linearly related to the solvent-accessible surface area (SASA). The non-polar energy term (G nonpolar ) includes both repulsive and attractive forces between the solute and solvent developed due to cavity (G cavity ) formation as well as the van der Waals interaction (G vdW ). This can be represented by the equation below 40,69 .
During the calculation of the BE, snapshots were generated from the equilibrated region of the MD trajectory. Energy components were evaluated for 51 snapshots extracted every 0.1 ns from the trajectory. The decomposition of the energy term to individual residues was carried out using the MmPbsaDecomp.py script provided with the g-mmpbsa package. The default parameters set by Kumari et al. were used for all the calculations.

Data availability
All relevant data are contained within the manuscript and the supplementary material. Additional raw data will be available upon request.