Drug repositioning by merging active subnetworks validated in cancer and COVID-19

Computational drug repositioning aims at ranking and selecting existing drugs for novel diseases or novel use in old diseases. In silico drug screening has the potential for speeding up considerably the shortlisting of promising candidates in response to outbreaks of diseases such as COVID-19 for which no satisfactory cure has yet been found. We describe DrugMerge as a methodology for preclinical computational drug repositioning based on merging multiple drug rankings obtained with an ensemble of disease active subnetworks. DrugMerge uses differential transcriptomic data on drugs and diseases in the context of a large gene co-expression network. Experiments with four benchmark diseases demonstrate that our method detects in first position drugs in clinical use for the specified disease, in all four cases. Application of DrugMerge to COVID-19 found rankings with many drugs currently in clinical trials for COVID-19 in top positions, thus showing that DrugMerge can mimic human expert judgment.

thigh muscle, and primary hepatocytes. In total, it includes 7876 terms and 656 unique drugs. The Connectivity Map (CMAP) [4], [5] is a collection of gene expression data from five different human cell lines perturbed with many chemicals and genetic reagents. In total, CMAP covers 1309 compounds, yielding 6100 profiles. The Connectivity Map project later released a more recent version of CMAP, called LINCS-L1000, as part of NIH's Library of Integrated Network-Based Cellular Signatures (LINCS) program. It comprises ∼5000 genetic perturbations and ∼15000 perturbations induced by chemical compounds [6] across 98 different cell lines. The two CMAP versions also differ in profiling platform: the LINCS-L1000 project replaced the Affymetrix GeneChips used by the original CMAP with Luminex bead arrays [7], which were developed to facilitate rapid, flexible, and high-throughput gene expression profiling at lower cost. Using the enrichR package, we obtained 33132 terms and 4117 compounds from the L1000 database.

Drug name and aliases
We handle drug aliases using the MeSH 1 database. Moreover, we mapped commercial drug names to their principal active ingredient. Ambiguous cases were checked by hand against the pharmacological literature and DrugBank records (https://go.drugbank.com/).

CMAP algorithm
To compare the performance of DrugMerge on the four benchmark diseases (asthma, rheumatoid arthritis, prostate cancer, and colorectal cancer), we used the PharmacoGx [8] R package, which performs the Connectivity Map (CMAP) analysis [4]. For each disease, we used the differentially expressed genes (DEGs) as input to identify drugs with therapeutic potential. In particular, we used the downloadPSet function to download the CMAP database, and the drugPerturbationSig function to identify differential gene expression induced by drug treatment. Finally, the connectivityScore function compares drug signatures against disease signatures (DEGs), assigning a connectivity score and a p-value to each drug. The connectivity score measures the correlation between the drug and disease signatures and ranges from −1 to 1. Since we are looking for disease treatments or drug repurposing candidates, the CMAP drugs should be anti-correlated with the disease signatures. For this reason, we sorted the final CMAP drugs by increasing connectivity score (from negative to positive values). We also filtered the CMAP drugs by p-value < 0.05, and we used both lists, with and without p-value filtering. Starting from these lists, we calculated the RHR and precision@20 with the associated z-scores and p-values. We then selected the results with the best RHR values and compared them with the values obtained by DrugMerge, as reported in Figure 4 of the main text and Table S6.

Table S1 shows that DrugMerge on the GEO Drug dataset finds one clinically relevant drug (prednisolone) in first position, both when all drugs are used and when only FDA-approved drugs are used. The precision@20 is not statistically significant, but the RHR of this single hit is highly significant. A closer look at the GEO data and at the TTD records reveals that prednisolone is the only drug from TTD present in the GEO dataset.
The initial list of differentially expressed genes is measured from a cohort of patients with acute asthma with respect to healthy subjects, and interestingly the hit drug prednisolone is used specifically for acute asthma 2. The DrugMerge algorithm does not find any hit in the L1000 drug dataset. An examination of the L1000 data shows that only 5 asthma-related drugs as reported by TTD are in the L1000 data (budesonide, fluticasone propionate, mometasone furoate, beclomethasone, and colforsin), which are used to treat chronic asthma or mild/moderate persistent asthma. Similar considerations hold for the TTD drugs present in the CMAP Drug data. Thus DrugMerge appears to be able to differentiate the acute and chronic forms of asthma and to rank higher the drugs that fit the subclass of asthma patients assessed in the data set (acute asthma patients).

1 https://www.nlm.nih.gov/mesh/
2 https://www.mayoclinic.org/diseases-conditions/asthma/in-depth/asthma-medications/art-20045557
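
The ordering and filtering step of the CMAP baseline described under "CMAP algorithm" can be illustrated with a minimal Python sketch. It is not the PharmacoGx implementation: the names DrugScore and rank_candidates and the toy scores are hypothetical, and the sketch only assumes that each drug already carries a connectivity score in [−1, 1] and a p-value.

```python
from typing import List, NamedTuple, Optional


class DrugScore(NamedTuple):
    """Hypothetical container for one drug's CMAP-style result."""
    drug: str
    connectivity: float  # assumed to lie in [-1, 1]
    p_value: float


def rank_candidates(scores: List[DrugScore],
                    alpha: Optional[float] = None) -> List[str]:
    """Sort drugs by increasing connectivity score (most anti-correlated first).

    If alpha is given, keep only drugs with p-value < alpha before sorting.
    """
    kept = [s for s in scores if alpha is None or s.p_value < alpha]
    return [s.drug for s in sorted(kept, key=lambda s: s.connectivity)]


# Toy, made-up values used only to illustrate the two lists.
scores = [
    DrugScore("drug_a", 0.40, 0.01),
    DrugScore("drug_b", -0.70, 0.03),
    DrugScore("drug_c", -0.20, 0.20),
    DrugScore("drug_d", -0.55, 0.04),
]

unfiltered = rank_candidates(scores)            # all drugs
filtered = rank_candidates(scores, alpha=0.05)  # only p-value < 0.05
```

In this toy case drug_c is dropped by the p-value filter, while drug_a survives the filter but stays at the bottom of the list because its score is positive.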

DrugMerge results on Asthma
Among the top positions in the ranking reported in Supplementary File S7, we find drugs with known effects in asthma patients or animal models of asthma, one of which reached the clinical trial stage: deferasirox [9], rosiglitazone [10], valproic acid 3, dexamethasone [11], celecoxib [12], and tamoxifen [13]. Rosiglitazone has been tested on a murine model of chronic asthma [10], suggesting that its intranasal administration can prevent airway inflammation. A study comparing dexamethasone with hydrocortisone in severe acute pediatric asthma [11] showed that the mean length of hospitalization in children receiving dexamethasone was significantly shorter than in those receiving hydrocortisone. Celecoxib is a COX-2 inhibitor and nonsteroidal anti-inflammatory drug. A study of 33 asthma patients [12] demonstrated that celecoxib is a suitable drug for patients with aspirin-induced and/or nonsteroidal anti-inflammatory drug-induced asthma. Tamoxifen is an estrogen receptor modulator mainly used for the treatment of breast cancer, but it has also been reported to have anti-inflammatory activity. A recent study [13] tested tamoxifen on three different mouse models, showing that tamoxifen can reduce inflammatory infiltration of neutrophils in the airways.
For the remaining drugs in the top positions, we focus on their mechanism of action and search the literature for evidence of its relevance in asthma. Using this criterion, we find captopril, letrozole, decitabine, imatinib, and probucol to be relevant drugs. Captopril belongs to the class of drugs known as angiotensin-converting enzyme (ACE) inhibitors and is used primarily to lower high blood pressure (hypertension). It has been shown that asthmatic subjects with comorbid hypertension display evidence of enhanced asthma morbidity [14]. Letrozole inhibits aromatase, an enzyme that catalyzes the synthesis of estrogen. Asthma prevalence and severity are greater in women than in men, suggesting that the disease is in part related to female sex steroid hormones, such as estrogen. Estrogen receptors are found on numerous immunoregulatory cells, and estrogen's actions skew immune responses toward allergy [15]. Decitabine is a hypomethylating agent used to treat myelodysplastic syndromes and acute myeloid leukemia. Yang et al. [16] demonstrated that DNA methylation at specific gene loci is associated with asthma and suggested that epigenetic changes might play a role in establishing the immune phenotype associated with childhood asthma. Imatinib is a tyrosine kinase inhibitor. Anti-inflammatory effects of tyrosine kinase inhibitors have been reported in animal models of allergic asthma, suggesting that this class of drugs can be an attractive strategy for the treatment of asthma [17]. Finally, probucol lowers the level of cholesterol through the inhibition of cholesterol synthesis and the delay of cholesterol absorption. Ramaraju et al. [18] found a modest but significant association between higher levels of serum cholesterol and asthma.
Table S2 shows that DrugMerge finds significant rankings in several drug databases (L1000, DrugMatrix, GEO), with GEO attaining the best p-value, using several algorithms, both when all drugs are considered and when only FDA-approved drugs are considered. For the GEO dataset (Supplementary Table S7), we find in first position the clinically relevant drug dexamethasone. Other drugs in use according to the TTD records that appear in the top twenty positions are imatinib, methylprednisolone, and prednisolone.
Among the top positions in the ranking, we find drugs with known effects in rheumatoid arthritis patients or animal models of rheumatoid arthritis, one of which reached the clinical trial stage: rosiglitazone 4, pioglitazone [20], estradiol [21], captopril [22], vitamin C [23], baclofen [24], deferasirox [25], decitabine [26], sirolimus [27], hydrocortisone [28], paclitaxel [29], and ethinylestradiol [30]. Suke et al. [20] studied the effects of combined pioglitazone and prednisolone on adjuvant-induced arthritis in rats. This study suggested that the combination of these two drugs was effective in modulating the inflammatory response and suppressing arthritis progression. Estradiol is an estrogen steroid hormone, and estrogens act directly on the immune system. The role of estrogens in rheumatoid arthritis is studied in [21]. Captopril is prescribed for hypertension, but it also has immunosuppressant activity. Captopril was therefore considered a potential slow-acting drug for treating rheumatoid arthritis, as demonstrated in [22]. Vitamin C is a vitamin found in various foods and is important for immune system function. The role of vitamin C in treating pain has also been studied [31]; in particular, administration of high-dose vitamin C in patients with rheumatoid arthritis showed a complete decrease in pain [23]. Baclofen is a medication used to treat muscle spasticity. Huang et al. [24] investigated the effects of baclofen in murine collagen-induced arthritis, showing that baclofen alleviated the clinical development of arthritis. Decitabine already appeared in the asthma analysis and, as mentioned before, it inhibits DNA methylation. Petralia et al. [26] studied the effect of decitabine in a murine model of rheumatoid arthritis and demonstrated that decitabine administration was associated with a significant improvement of the clinical condition.
Sirolimus, also known as rapamycin, has immunosuppressant functions and is used to prevent rejection in organ transplants. Wen et al. [27] studied the safety, tolerance, and efficacy of low-dose sirolimus combined with the original therapy in patients with active rheumatoid arthritis (RA). They showed that this therapy alleviates clinical symptoms and reduces the use of immunosuppressants in patients with active RA. Hydrocortisone is a treatment for acute episodes of rheumatic disorders, including rheumatoid arthritis 5. Paclitaxel is an anticancer agent classified as a microtubule-stabilizing agent. Kurose et al. [29] studied the effects of paclitaxel on cultured synovial cells from patients with rheumatoid arthritis. The data suggest paclitaxel as a possible therapy for RA. Ethinylestradiol is an active estrogen and a component of birth control pills. Subramaniam et al. [30] studied the effectiveness of ethinylestradiol in treating mice with collagen-induced arthritis, observing decreased proliferation and decreased secretion of pro-inflammatory factors.
Focusing on the mechanism of action of the remaining drugs, we find links with rheumatoid arthritis for amoxicillin [32] and niacin [33]. Amoxicillin is an antibiotic used to treat several bacterial infections. RA has been treated with antibiotics since the 1930s, and several reports in the literature indicate that periodontal pathogens are a possible cause of the disease [32]. Niacin, more commonly known as vitamin B3, is a precursor of the coenzyme nicotinamide adenine dinucleotide (NAD+). Recent studies have identified potential therapeutic approaches for boosting NAD+ to treat rheumatologic diseases, including rheumatoid arthritis. In particular, they focused on the enzymatic activity of CD38, one of the main enzymes in NAD+ catabolism [33].
For the four benchmark diseases considered in our study, studies on repurposed drugs for tumors are reported in [46], [47], [48], for asthma in [49], and for rheumatoid arthritis in [50] and [51].
A survey focusing on repurposing anti-cancer drugs for COVID-19 is in [52]. Computational drug repositioning is gaining much attention during the current COVID-19 pandemic, as reported in [53].
Network-based drug repositioning [54] is an approach that builds and analyzes various types of biological networks integrating several layers of 'omic' data [55]. Drug perturbation databases also play a central role in computational drug profiling [56], [4], [57], [3]. Next, we focus on some results in the lines of research that are most relevant for our study.
Taguchi et al. [58] apply an unsupervised method based on tensor decomposition to perform feature extraction of gene expression profiles in multiple lung cancer cell lines infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [59]. Using Enrichr [60], they then identified drug candidates that significantly altered the expression of the 163 genes selected in the previous phase. Drug perturbation data is collected from GEO, DrugMatrix, L1000, and other repositories. Ruiz et al. [61] build a multi-layer disease-gene-drug-pathway network and develop a random-walk-based score that captures indirect effects of drugs on diseases via commonly affected pathways. They validate their approach with a leave-one-out validation on a gold standard of 6000 drug-disease pairs commonly used in clinical practice. The paper, however, does not report on the specific application of this approach to COVID-19.
Fiscon et al. [62], [63] use an approach based on computing distances between regions of a large human interactome network affected by a disease (disease targets) and those affected by drugs (drug perturbations). This work relies on disease target similarities between a group of diseases including COVID-19, SARS-CoV, MERS, and others. The final assessment of drug candidates for COVID-19 is done with the Connectivity Map (CMAP) database.
Gysi et al. [64] define several proximity-based distance functions between COVID-19 human target proteins (as listed in [65]) and drug protein targets as listed in DrugBank [66], in order to prioritize repurposable drugs. Several refinements taking into account tissue specificity, drug action on gene expression levels (using data from [59]), comorbidities, and drug toxicity lead to a final list of 81 repurposable drugs for COVID-19. The main measure of performance chosen is the AUC of the predicted list versus the list of drugs currently employed in clinical trials for the treatment of COVID-19 as listed in https://www.clinicaltrials.gov/.
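
To make the AUC measure concrete, the following Python sketch computes the area under the ROC curve of a ranked drug list against a set of gold-standard drugs, using the pairwise interpretation of AUC (the fraction of positive/negative pairs in which the positive is ranked higher). It is a generic illustration and not the evaluation code of [64]; the function name ranking_auc and the drug identifiers are hypothetical, and the ranking is assumed to contain no ties.

```python
from typing import List, Set


def ranking_auc(ranked_drugs: List[str], gold_standard: Set[str]) -> float:
    """AUC of a tie-free ranking: fraction of (positive, negative) pairs
    in which the gold-standard drug is ranked above the other drug."""
    n_pos = sum(1 for d in ranked_drugs if d in gold_standard)
    n_neg = len(ranked_drugs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")

    positives_seen = 0  # gold-standard drugs encountered so far
    pairs_won = 0       # pairs where the positive precedes the negative
    for drug in ranked_drugs:
        if drug in gold_standard:
            positives_seen += 1
        else:
            # every positive already seen is ranked above this negative
            pairs_won += positives_seen
    return pairs_won / (n_pos * n_neg)


# Made-up example: 5 ranked drugs, 2 of them in the gold standard.
example_ranking = ["d1", "d2", "d3", "d4", "d5"]
example_gold = {"d1", "d4"}
example_auc = ranking_auc(example_ranking, example_gold)  # 4 of 6 pairs won
```

Because every pair counts equally regardless of where it occurs in the list, a hit moved from rank 1 to rank 2 changes the AUC as much as a hit moved from rank 1000 to rank 1001.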
Our approach to performance evaluation is similar to the one in [64], since we also use drugs in clinical trials as the gold standard for COVID-19. Both studies thus measure how closely an automated ranking system approximates the collective wisdom of human experts who shortlisted existing drugs for repurposing on COVID-19 during 2020, based on a large variety of considerations such as available clinical and preclinical data, pharmacological background, and hypothesized mechanisms of action. However, we decided to use precision@20 and the Reciprocal Hit Ranking as quantitative measures, since they are quite intuitive, more suitable for a ranking problem, and handle more uniformly lists of candidate drugs that can span from a few dozen drugs to a few thousand. Zhou et al. [67] also use proximity-based measures in biological networks to find a list of 16 repurposable drugs. It should be noted that Zhou et al. use data collected from the family of human coronaviruses (HCoVs) to build the network, thus relying heavily on evolutionary conservation of relevant coding parts of the viral genome across the species in this family and on phylogenetic considerations.
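
The two ranking measures can be sketched in a few lines of Python. Precision@k is the standard fraction of gold-standard drugs among the top k positions. The Reciprocal Hit Ranking is not defined in this section; the sketch assumes it is the sum of the reciprocals of the ranks at which gold-standard drugs appear, which may differ from the exact definition in the main text. The function names and the toy ranking are hypothetical.

```python
from typing import List, Set


def precision_at_k(ranked: List[str], gold: Set[str], k: int = 20) -> float:
    """Fraction of the top-k positions occupied by gold-standard drugs."""
    return sum(1 for drug in ranked[:k] if drug in gold) / k


def reciprocal_hit_rank(ranked: List[str], gold: Set[str]) -> float:
    """ASSUMED definition: sum of 1/rank over gold-standard drugs (rank 1 = top)."""
    return sum(1.0 / rank
               for rank, drug in enumerate(ranked, start=1)
               if drug in gold)


# Made-up ranking of 25 drugs with hits at ranks 1, 4, and 25.
toy_ranking = [f"drug_{i}" for i in range(1, 26)]
toy_gold = {"drug_1", "drug_4", "drug_25"}

p20 = precision_at_k(toy_ranking, toy_gold, k=20)  # 2 hits in the top 20
rhr = reciprocal_hit_rank(toy_ranking, toy_gold)   # 1/1 + 1/4 + 1/25
```

Under this assumed definition a single hit in first position already contributes 1.0, which is consistent with the asthma case above, where one hit at rank 1 yields a significant RHR while precision@20 stays low.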
Sadegh et al. [68] use the algorithm KeyPathwayMiner [69] to define the active subnetwork in an integrated COVID-19 resource network from the CoronaVirus Explorer (CoVex) resource (https://exbio.wzw.tum.de/covex/). The main contribution of [68] is a resource (CoVex) that can be used interactively in many different scenarios and modalities to explore disease-gene-drug relationships for COVID-19.
Mall et al. [70] use deep-learning-induced vector embeddings of drugs and viral proteins to predict drug-viral protein activity and propose a short list of 15 drugs potentially useful against COVID-19.
A second very popular approach uses knowledge of the three-dimensional molecular configuration of proteins and drugs to simulate, via docking or molecular dynamics, the most promising drug-target bindings. Most research in docking-based drug repositioning starts with an assessment of one (or a few) proteins acting as potential drug targets [71].
Seo et al. [72] take the three-dimensional structure of the main protease (Mpro) of SARS-CoV-2 as a target, and use docking simulations on a supercomputer to evaluate the binding affinity between Mpro and drug candidates listed in the SWEETLEAD library and the ChEMBL database (19,168 molecules), thus shortlisting 43 drugs, which are then reduced to 8 after molecular dynamics simulations. A similar docking-based screening is also used in [73], [74], [75], [76], [77], and [78].
A more direct experimental approach has been described in [65] and [94]. Gordon et al. [65] performed a direct proteomic assay to uncover 332 human proteins potentially interacting with 26 SARS-CoV-2 proteins. Sixty-nine drugs targeting these proteins are then shortlisted as repurposable drugs for COVID-19, using a mix of chemo-informatic searches and expert advice. Note that in this setting a priority ranking is not provided.
A main theme alongside drug ranking is that of drug combinations [95]. Drug combinations often offer lower toxicity and more effective disease treatment. Most drug combinations currently under consideration result from ad hoc considerations, since automatic screening of drug combinations easily incurs a combinatorial explosion of cases to be considered. A few principled network-based approaches have been proposed and may be integrated into DrugMerge [96], [67], [97], [98], [99], [100], [101].
While the final hallmark of success is the identification of a repurposed drug that can pass successfully all stages of drug approval for clinical use (see e.g. [102]), it is important to be able to perform in silico validation on intermediate results. A recent study by Brown and Patel [103] gives a critical assessment of the options for in silico drug validations, assessing the weak and strong points of each strategy.

Supplementary Tables
Supplementary Table S1. DrugMerge results on Asthma. The table reports the ranking function, the filter, the drug perturbation data set, and the algorithmic configuration attaining the best performance on the asthma DEG dataset. It also reports the Reciprocal Hit Ranking (RHR) and precision@20, with the corresponding z-scores and p-values.