Anthraquinolone and quinolizine derivatives as an alley of future treatment for COVID-19: an in silico machine learning hypothesis

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), emerged in Wuhan, China, in December 2019 and has spread rapidly worldwide owing to its high transmissibility and pathogenicity. Since the outbreak began, many efforts have been made to find ways to fight this pandemic, and more than 300 clinical trials are ongoing to investigate potential therapeutic options for preventing or treating COVID-19. The SARS-CoV-2 main protease (Mpro) plays a critical role in pathogenesis, being primarily involved in polyprotein processing and virus maturation, which makes it an attractive and promising antiviral target. In this study, we therefore focused on SARS-CoV-2 Mpro, used machine learning algorithms and virtually screened small-molecule derivatives of anthraquinolone and quinolizine from PubChem that may act as potential inhibitors. Prioritisation of cavity atoms obtained through pharmacophore mapping, together with other physicochemical descriptors of the derivatives, helped map the chemical features important for ligand binding and complemented the molecular docking studies. The outcomes of these studies were further supported by simulation trajectories, which suggest that anthraquinolone and quinolizine derivatives are potential small molecules worth testing experimentally for treating COVID-19 patients.

www.nature.com/scientificreports/

The replicase gene of SARS-CoV-2 encodes two polyproteins, pp1a and pp1ab, which are required for viral replication and transcription. ORFs 1a and 1b are translated into these two overlapping polyproteins, which are identical at the N-terminus, but pp1ab carries a C-terminal extension produced by a ribosomal frameshift. These polyproteins are the precursors of the proteins of the replication-transcription complex. Because functional polypeptides, including the replicase and polymerase, must be released from the polyproteins, this proteolytic processing is vital. It is executed by a chymotrypsin-fold protease, the main protease (Mpro) 5,6 .
This chymotrypsin-like protease shares some similarities with the 3C proteases of picornaviruses 3 . Because of its central role in polyprotein processing and virus maturation, in particular the cleavage of the polyproteins into functional polypeptides, Mpro has been recommended as an enticing target for antiviral drug design against COVID-19 7 .
Studies have reported that the Mpro sequences and structures of all coronaviruses are highly conserved 8 . This conservation, together with its functional importance, makes Mpro an attractive target 8,9 . Extensive screening of various chemical libraries has identified several small molecules as potent SARS coronavirus protease inhibitors [10][11][12] .
Many studies have examined the efficacy of the antimalarial agents chloroquine and hydroxychloroquine for treating COVID-19 [13][14][15][16] . In the current study, we used anthraquinolone (AQ) derivatives, which are known to exhibit antimalarial properties 17 , and quinolizine (QZ) derivatives, which have been used as repurposed drugs for treating COVID-19 18 . The potential of AQ and QZ derivatives as inhibitors of SARS-CoV-2 Mpro was investigated using state-of-the-art computational methods. The outline of the study is shown in Fig. 1.

Materials and methods
Machine learning-based virtual screening. In this study, we virtually screened AQ and QZ derivatives to identify potential hits. The AQ and QZ derivatives were retrieved from PubChem 19 , a chemical compound repository containing more than 10 million compound records. The library consisted of around 28,000 QZ-related derivatives and 100 AQ-related derivatives. We used R version 3.5.3 (The R Foundation for Statistical Computing, http://www.r-project.org/foundation) with packages from the Comprehensive R Archive Network (CRAN) to parse the available chemical data and to implement the machine learning algorithms used to associate these datasets. ChemmineR (Bioconductor) and rcdk (CRAN) were used to process the SDF sets, and the chemical structures were assessed with respect to stereochemistry, common functional groups, torsions and other salient parameters (R code provided as supplementary material). Significant cut-offs were then applied to each structural dataset, and all compounds were screened and filtered based on similarity and Lipinski's rule of five 20,21 , a rule of thumb for judging whether a compound with pharmacological or biological activity has properties that would likely make it an orally active drug in humans. For screening, 3D structures of these derivatives were downloaded in SDF format from PubChem and converted to PDB files with Open Babel 22 , which is mainly used for interconverting chemical file formats.

Molecular docking. Molecular docking is a frequently used method in structure-based drug design.
It can be used to elucidate the interactions between a protein and a small molecule. Docking searches for a ligand binding pose that fits the protein binding site both energetically and geometrically, predicts the intermolecular interactions formed between the protein and the small molecule, and suggests binding poses that may be responsible for inhibition of the protein.
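Before docking, the library was reduced with the similarity and Lipinski's rule-of-five filters described in the virtual screening step. The authors performed this filtering in R (code in the supplementary material); the Python sketch below only illustrates the logic. It assumes the descriptors (molecular weight, logP, hydrogen-bond donors and acceptors) have already been computed and that fingerprints are given as sets of on-bit indices. The names `Compound`, `lipinski_violations`, `tanimoto` and `screen`, the 0.7 similarity threshold and the toy records are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Compound:
    """Hypothetical record; descriptors are assumed to be precomputed
    (e.g. with rcdk or ChemmineR) and the fingerprint given as on-bit indices."""
    cid: str
    mol_weight: float   # Da
    logp: float         # octanol-water partition coefficient
    h_donors: int
    h_acceptors: int
    fingerprint: FrozenSet[int]


def lipinski_violations(c: Compound) -> int:
    """Count how many of Lipinski's four rule-of-five criteria are violated."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two binary fingerprints (shared bits / all bits set)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def screen(compounds: Iterable[Compound], query: FrozenSet[int],
           min_similarity: float = 0.7, max_violations: int = 1) -> List[Compound]:
    """Keep compounds similar to the query that also pass the rule of five."""
    return [
        c for c in compounds
        if tanimoto(c.fingerprint, query) >= min_similarity
        and lipinski_violations(c) <= max_violations
    ]


# Toy, hypothetical records (not data from this study)
query = frozenset({1, 2, 3, 4})
library = [
    Compound("A", 350.4, 2.1, 2, 5, frozenset({1, 2, 3, 4})),
    Compound("B", 620.0, 6.3, 3, 8, frozenset({1, 2, 3})),  # two violations
    Compound("C", 300.0, 1.0, 1, 4, frozenset({7, 8})),     # dissimilar
]
print([c.cid for c in screen(library, query)])  # ['A']
```

Allowing at most one violation follows the common statement of the rule; a stricter screen would set `max_violations=0`.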
We used AutoDock Vina 1.5.6 24 , a molecular docking tool that provides an accessible interface for preparing ligands and targets, including the easy addition of polar hydrogen atoms and Gasteiger charges. The crystal structure of the SARS-CoV-2 main protease (PDB ID: 6XA4) was downloaded in PDB format from the Protein Data Bank (RCSB PDB, http://www.rcsb.org/) 25 .

Pharmacophore modeling. This technique works directly with the 3D structure of a protein-ligand complex. It identifies the interaction points between protein and ligand by pinpointing the appropriate ligand-binding site of the protein 27 . To identify the pharmacophoric features of the top AQ and QZ hits (Fig. 3), we used LigandScout 4.4.5 (build 20200714), a computational tool that generates structure-based pharmacophore models and describes protein-ligand interactions in terms of discrete pharmacophoric features such as hydrophobic regions, hydrophilic regions, hydrogen bond donors and hydrogen bond acceptors 28 .
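Pharmacophoric interaction points such as hydrogen bonds rest on geometric criteria. As a hedged illustration only, and not LigandScout's internal definition, the sketch below tests a donor-H···acceptor triple against a donor-acceptor distance cutoff of 3.5 Å and a D-H···A angle of at least 120°. Both cutoffs and the function names are assumptions chosen for illustration.

```python
import math
from typing import Sequence, Tuple

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> Tuple[float, ...]:
    """Component-wise difference a - b."""
    return tuple(x - y for x, y in zip(a, b))


def _norm(v: Vec) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def dha_angle(donor: Vec, hydrogen: Vec, acceptor: Vec) -> float:
    """Donor-H...acceptor angle in degrees, measured at the hydrogen."""
    u = _sub(donor, hydrogen)
    v = _sub(acceptor, hydrogen)
    cos_t = sum(a * b for a, b in zip(u, v)) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding error


def is_hbond(donor: Vec, hydrogen: Vec, acceptor: Vec,
             max_da: float = 3.5, min_angle: float = 120.0) -> bool:
    """Geometric H-bond test with assumed distance (Å) and angle (degrees) cutoffs."""
    return (_norm(_sub(donor, acceptor)) <= max_da
            and dha_angle(donor, hydrogen, acceptor) >= min_angle)


# Hypothetical coordinates in Å
print(is_hbond((0, 0, 0), (1, 0, 0), (2.9, 0, 0)))    # True: linear, 2.9 Å apart
print(is_hbond((0, 0, 0), (1, 0, 0), (4.0, 0, 0)))    # False: too far
print(is_hbond((0, 0, 0), (1, 0, 0), (1.0, 1.9, 0)))  # False: 90° angle
```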

Molecular Dynamics Simulation. Molecular dynamics simulation (MDS) is a computationally intensive method that mimics the physiological environment of a protein in order to observe its dynamic behaviour. It is based on Newton's second law of motion, which relates the force acting on each atom to its mass and acceleration (F = ma); the forces are derived from the interaction potentials of the system. An MDS workflow begins with preparation of the initial system, followed by energy minimization under the chosen interaction potentials, equilibration of the system (NVT/NPT ensembles) and, finally, the production MD run.
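Numerically, an MD engine advances every atom by integrating Newton's second law, a = F/m, in small time steps. The sketch below uses velocity Verlet, a widely used integrator, on a one-dimensional harmonic spring in reduced units. It is a toy illustration and not Desmond's integrator; the spring constant, mass, step size and the function name `velocity_verlet` are hypothetical.

```python
def velocity_verlet(x, v, mass, force, dt, n_steps):
    """Integrate Newton's second law (a = F/m) with the velocity Verlet scheme.

    Returns lists of positions and velocities, including the initial state.
    """
    a = force(x) / mass
    xs, vs = [x], [v]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / mass              # force evaluated at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity update with averaged acceleration
        a = a_new
        xs.append(x)
        vs.append(v)
    return xs, vs


# Toy system (hypothetical, reduced units): harmonic spring with F = -k x
k, m = 1.0, 1.0
xs, vs = velocity_verlet(x=1.0, v=0.0, mass=m, force=lambda x: -k * x,
                         dt=0.01, n_steps=1000)

# Total (kinetic + potential) energy should stay nearly constant
energy = [0.5 * m * v * v + 0.5 * k * x * x for x, v in zip(xs, vs)]
print(max(energy) - min(energy))
```

Velocity Verlet is popular because it is time-reversible and keeps the total energy bounded over long runs, which is why the energy spread printed above stays small.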
We used MD simulations to assess the stability of the AQ/QZ derivatives bound to Mpro, and the interactions maintained between them, under physiological conditions of 300 K, 1 bar and pH 7. The system was solvated with TIP3P water in an orthorhombic box. Simulations used the NPT ensemble, with the Nosé-Hoover chain thermostat and the Martyna-Tobias-Klein barostat, and were run for 50 ns. On completion of the MD simulations, RMSD (root mean square deviation) and RMSF (root mean square fluctuation) plots were generated. All simulations were performed using Desmond 3.2 (D. E. Shaw Research) with Maestro v11.6 29 .

Results and discussion
Developing broad-spectrum inhibitors of Mpro is a distinctive strategy against SARS-CoV-2 infection, though it depends entirely on the availability of a conserved target. When screening for the target of a potential inhibitor, the structural proteins E, M, N and S were excluded because they vary considerably among different CoVs. Consequently, the RNA-dependent RNA polymerase, the RNA helicase and Mpro, along with some other non-structural proteins, are attractive drug targets. The pivotal roles played by SARS-CoV-2 Mpro in directing      Table 1.

Molecular Dynamics Simulation of docked complexes. Root mean square deviation (RMSD) is a quantitative measure of the similarity of atomic coordinates between superimposed structures. It measures how much the protein conformation changes over the course of the complete production run (50 ns). Root mean square fluctuation (RMSF) measures the flexibility (fluctuation) of individual residues, in contrast to RMSD, which measures the positional differences of the entire structure. Plotting RMSF against residue number shows which amino acids contribute most to the motion of the molecule. The docked complexes of AQ11 and AQ16 were stable after 5 ns of simulation time, with protein-ligand interactions maintained throughout the simulation. The AQ11-Mpro and AQ16-Mpro complexes were stable, with RMSDs of 1.8 Å and 2.1 Å, respectively (Fig. 8). A similarly stable conformation was observed for the QZ121-Mpro complex, with an RMSD of 2.1 Å (Fig. 9).
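In formula form, RMSD = sqrt((1/N) Σ_i |x_i − y_i|²) over N atoms after optimal superposition of the two structures, and RMSF_i = sqrt(⟨|x_i(t) − ⟨x_i⟩|²⟩_t), the time-averaged deviation of atom i from its mean position. The NumPy sketch below implements both, assuming coordinates are (N, 3) arrays in Å and that trajectory frames are already aligned before computing RMSF. The study used Desmond's analysis tools; the superposition here uses the Kabsch algorithm, and the function names are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid-body
    superposition of P onto Q (Kabsch algorithm)."""
    P = P - P.mean(axis=0)                    # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                               # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # avoid an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # optimal rotation
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def rmsf(traj: np.ndarray) -> np.ndarray:
    """Per-atom RMSF for a (T, N, 3) trajectory already aligned to a reference."""
    mean_pos = traj.mean(axis=0)              # time-averaged position of each atom
    return np.sqrt(((traj - mean_pos) ** 2).sum(axis=2).mean(axis=0))


# Hypothetical coordinates, not data from this study
rng = np.random.default_rng(0)
ref = rng.normal(size=(20, 3))
theta = np.deg2rad(30.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
moved = ref @ Rz.T + np.array([5.0, -2.0, 1.0])  # rotated and translated copy
print(kabsch_rmsd(moved, ref))  # ~0: identical up to rigid-body motion

traj = np.stack([ref, ref.copy()])
traj[0, 0, 0] += 1.0   # atom 0 displaced +1 Å along x in frame 0
traj[1, 0, 0] -= 1.0   # and -1 Å in frame 1
print(rmsf(traj)[:2])  # atom 0 fluctuates by 1 Å, atom 1 not at all
```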

Discussion
As SARS-CoV-2 continues to spread and, in the absence of effective drugs, to take a worsening toll on world health, there is an urgent need to find potential drugs against it. In our study, we chose SARS-CoV-2 Mpro as a target because of its vital role in viral replication and transcription. We retrieved derivatives of anthraquinolone, which are known for antimalarial properties, and of quinolizine, which have been used as repurposed drugs. These derivatives were then screened, and the promising ones were subjected to molecular docking analysis to check its

To augment the study, we also compared our potential hits with known reference compounds, Ledipasvir, Irbesartan and Venetoclax, which are FDA-approved drugs. These drugs have been docked against Mpro in DockCoV2, a drug database for SARS-CoV-2 that reports the binding affinity of each interaction, and they showed the highest binding affinities among the drugs in the database 30 . The comparison suggested that the binding affinities of our top hits were close to those of these reference compounds, for which Ledipasvir, Irbesartan and Venetoclax had binding affinities of −10.2 kcal/mol, −9.8 kcal/mol and −9.7 kcal/mol, respectively. In terms of Lipinski's rule of five, our top hits satisfy the molecular weight criterion (less than 500 Da), whereas Ledipasvir and Venetoclax exceed it. Given the cost of clinical trials, this strategy helps prioritise the most potent candidates before further testing. We anticipate that these in silico findings can be substantiated by in vitro and in vivo studies toward developing inhibitors of SARS-CoV-2 Mpro.
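Docking scores in kcal/mol can be related to an approximate dissociation constant through ΔG = RT ln Kd. Because Vina scores are empirical estimates rather than measured free energies, the sketch below gives only an order-of-magnitude illustration using the reference-compound scores quoted above. It assumes T = 298.15 K and a 1 M standard state; the function name `kd_from_dg` is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_from_dg(dg_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """Dissociation constant (mol/L, 1 M standard state) implied by a binding
    free energy via dG = RT ln(Kd). More negative dG gives a smaller Kd."""
    return math.exp(dg_kcal_per_mol / (R_KCAL * temperature_k))


# Reference-compound docking scores quoted in the text (kcal/mol)
for name, dg in [("Ledipasvir", -10.2), ("Irbesartan", -9.8), ("Venetoclax", -9.7)]:
    print(f"{name}: dG = {dg} kcal/mol -> Kd ~ {kd_from_dg(dg) * 1e9:.0f} nM")
```

Because Kd depends exponentially on ΔG, a 0.5 kcal/mol difference in score corresponds to roughly a twofold difference in the implied Kd at room temperature, which is within the typical error of docking scores.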