A method for the rational selection of drug repurposing candidates from multimodal knowledge harmonization

The SARS-CoV-2 pandemic has challenged researchers at a global scale. The scientific community’s massive response has resulted in a flood of experiments, analyses, hypotheses, and publications, especially in the field of drug repurposing. However, many of the proposed therapeutic compounds obtained from SARS-CoV-2 specific assays are not in agreement and thus demonstrate the need for a singular source of COVID-19 related information from which a rational selection of drug repurposing candidates can be made. In this paper, we present the COVID-19 PHARMACOME, a comprehensive drug-target-mechanism graph generated from a compilation of 10 separate disease maps and sources of experimental data focused on SARS-CoV-2/COVID-19 pathophysiology. By applying our systematic approach, we were able to predict the synergistic effect of specific drug pairs, such as Remdesivir and Thioguanosine or Nelfinavir and Raloxifene, on SARS-CoV-2 infection. Experimental validation of our results demonstrates that our graph can be used not only to explore the involved mechanistic pathways, but also to identify novel combinations of drug repurposing candidates.

When taken together with related work including cause-and-effect modeling 8 , entity relationship graphs 16 , and pathways 8 , these disease maps represent a considerable number of highly curated "knowledge graphs" that focus primarily on COVID-19 biology. Here, we use the term "mechanism" to describe one or more cause-and-effect relationships (i.e. a subgraph), "pathways" to refer to a well-established series of interactions resulting in cellular change or a defined product, and "models" to describe a collection of experimental data or known interactions defined in the context of a particular biological process or pathology. As of July 2020, a collection consisting of 10 models representing core knowledge about the pathophysiology of SARS-CoV-2 and its primary target, the lung epithelium, was shared with the public.
With the rapidly increasing generation of data (e.g. transcriptome 17 , interactome 18 , and proteome 19 data), we are now in the position to challenge and validate these COVID-19 pathophysiology knowledge graphs with experimental data. This is of particular interest as validation of these knowledge graphs bears the potential to identify those disease mechanisms that are highly relevant for targeting in drug repurposing approaches.
The concept of drug repurposing (the secondary use of already developed drugs for therapeutic uses other than those they were designed for) is not new. The major advantage of drug repurposing over conventional drug development is the massive decrease in time required for development as important steps in the drug discovery workflow have already been successfully passed for these compounds 20,21 .
Our group and many others have already begun performing assays to screen for experimental compounds and approved drugs to serve as new therapeutics for COVID-19. Dedicated drug repurposing collections, such as the Broad Institute library 22 and the comprehensive ReFRAME library 23 , were used to experimentally screen for either viral proteins as targets for functional inhibition 24 , or for virally infected cells in phenotypic assays 25 . In our own work, compounds were assessed for their inhibition of virus-induced cytotoxicity using the human cell line Caco-2 and a SARS-CoV-2 isolate 26 . A total of 63 compounds with IC50 < 20 µM were identified, of which 90% had not previously been reported as being active against SARS-CoV-2. Out of the active compounds, 31 are approved drugs, 23 are in phases 1-3 and 9 are preclinical candidate molecules. The described mechanisms of action for the inhibitors included kinase signaling, PDE activity modulation, and long chain acyl transferase inhibition (e.g. "azole class antifungals").
The approach presented here integrates experimental results, the output from other informatic pipelines, as well as proprietary and public data to provide a comprehensive overview of the therapeutic efficacy of candidate compounds, the mechanisms targeted by these candidate compounds, and a rational approach to test the drug-mechanism associations for their potential in combination therapy.

Methodology
Generation of the COVID-19 PHARMACOME. Disparate COVID-19 disease maps focus on different aspects of COVID-19 pathophysiology. Based on comparisons of the COVID-19 knowledge graphs, we found that no single disease map covers all aspects relevant for the understanding of the virus, host interaction and the resulting pathophysiology. Thus, we optimized the representation of integral COVID-19 pathophysiology mechanisms by integrating several public and proprietary COVID-19 knowledge graphs, disease maps, and experimental data (Supplementary Table 1) into one unified knowledge graph, the COVID-19 supergraph.
To this end, we converted all knowledge graphs and interactomes into OpenBEL 27 , a language that is both ideally suited to capture and represent "cause-and-effect" relationships in biomedicine, and is fully interoperable with major pathway databases 28,29 . In order to ensure that molecular interactions were correctly normalized, individual pipelines were constructed for each model to convert the raw data to the OpenBEL format. For example, the COVID-19 Disease Map contained 16 separate files, each of which represented a specific biological focus of the virus. Each file was parsed individually and the entities and relationships that did not adhere to the OpenBEL grammar were mapped accordingly. Whilst most of the entities and relationships in the source disease maps could be readily translated into OpenBEL, a small number of triples from different source disease maps required a more in-depth transformation. When classic methods of naming objects in triples failed, the recently generated COVID-19 ontology 30 as well as other available standard ontologies and vocabularies were used to normalize and reference these entities.
In addition to combining the listed models, we also performed a dedicated curation of the COVID-19 supergraph in order to annotate the mechanisms pertaining to selected targets and the biology around prioritized repurposing candidates. The resulting BEL graphs were quality controlled and subsequently loaded into a dedicated graph database system underlying the Biomedical Knowledge Miner (BiKMi), which allows for comparison and extension of biomedical knowledge graphs (see http://bikmi.covid19-knowledgespace.de).
Once the models were converted to OpenBEL and imported into the database, the resulting nodes from each mechanism-based model were compared (Fig. 1). Even when separated by data origin type, the COVID-19 knowledge graphs had very little overlap (3 shared nodes between all manually curated models and no shared nodes between all models derived from interaction databases), but by unifying the models, our COVID-19 supergraph substantially improves the coverage of essential virus- and host-physiology mechanisms.
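The node comparison described above can be illustrated with plain set operations. The following is a minimal sketch, not the BiKMi implementation: the model names and node labels are hypothetical placeholders, and it assumes each model has already been reduced to a set of normalized node identifiers.

```python
# Hypothetical node sets; the names below are placeholders, not real model content.
models = {
    "curated_map_A": {"node_1", "node_2", "node_3"},
    "curated_map_B": {"node_1", "node_3", "node_4"},
    "curated_map_C": {"node_1", "node_5"},
}


def shared_by_all(node_sets):
    """Nodes that occur in every model (intersection over all models)."""
    sets = list(node_sets.values())
    return set.intersection(*sets) if sets else set()


def supergraph_nodes(node_sets):
    """Nodes of the unified graph (union over all models)."""
    return set().union(*node_sets.values())


def coverage(node_sets):
    """Fraction of the unified node set that each single model covers."""
    union = supergraph_nodes(node_sets)
    return {name: len(nodes) / len(union) for name, nodes in node_sets.items()}


shared = shared_by_all(models)
unified = supergraph_nodes(models)
per_model_coverage = coverage(models)
```

In this toy example only one node is shared by all models, while the union is larger than any single model, which mirrors the motivation for unifying the graphs.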
Additionally, by enriching the COVID-19 supergraph with drug-target information linked from highly curated drug-target databases (DrugBank, ChEMBL, PubChem), we created an initial version of the COVID-19 PHARMACOME, a comprehensive drug-target-mechanism graph representing COVID-19 pathophysiology mechanisms that includes both drug targets and their ligands (Fig. 2). In order to maximize its utility, this network includes both experimentally validated drug-target relationships as well as a wide distribution of biological entities and concepts (Supplementary Figure 1). The entire COVID-19 PHARMACOME was manually
Systematic review and integration of information from phenotypic screening. At the time of the writing of this paper, six phenotypic cellular screening experiments have been shared via archive servers and journal publications (Supplementary Table 2). Although only a limited number of these manuscripts have been officially accepted and published, we were able to extract their primary findings from the pre-publication archive servers. A significant number of reports on drug repurposing screenings in the COVID-19 context demonstrate how appealing the concept of drug repurposing is as a quick answer to the challenge of a global pandemic. Drug repurposing screenings were all performed with compounds for which a significant amount of information on safety in humans and primary mechanism of action is available. We generated a list of "hits" from cellular screening experiments while results derived from publications that reported on in-silico screening were ignored. Therefore, we keep a strict focus on well-characterized, well-understood candidate molecules since a pivotal advantage of this knowledge base is its use for drug repurposing.
Subgraph annotation. The COVID-19 PHARMACOME contains several subgraphs, three of which correspond to major views on the biology of SARS-CoV-2 as well as the clinical impact of COVID-19:
• the viral life cycle subgraph focuses on the stages of viral infection, replication, and spreading.
• the host response subgraph represents essential mechanisms active in host cells infected by the virus.
• the clinical pathophysiology subgraph illustrates major pathophysiological processes of clinical relevance.
These subgraphs were annotated by identifying nodes within the COVID-19 PHARMACOME that represent specific biological processes or pathologies associated with each subgraph category and traversing out to their first-degree neighbors. For example, a biological process node representing "viral translation" would be classified as a starting node for the viral life cycle subgraph while a node defined as "defense response to virus" would be categorized as belonging to the host response subgraph. Though the viral life cycle and host response subgraphs contain a wide variety of node types, the pathophysiology subgraph is restricted to pathology nodes associated with either the SARS-CoV-2 virus or the COVID-19 pathology.
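The annotation step (seed nodes plus their first-degree neighbors) can be sketched as follows. This is an illustrative sketch only: the adjacency structure, the helper name `annotate_subgraph`, and all node labels other than the two process names quoted above are hypothetical, and edges are treated as undirected, which the text does not specify.

```python
# Toy undirected adjacency list; neighbor labels are hypothetical placeholders.
adjacency = {
    "viral translation": {"protein_A", "protein_B"},
    "defense response to virus": {"protein_B", "protein_C"},
    "protein_A": {"viral translation", "protein_D"},
    "protein_B": {"viral translation", "defense response to virus"},
    "protein_C": {"defense response to virus"},
    "protein_D": {"protein_A"},
}


def annotate_subgraph(graph, seed_nodes):
    """Return the seed nodes together with their first-degree neighbors."""
    members = set(seed_nodes)
    for seed in seed_nodes:
        # Only direct neighbors are added; second-degree nodes stay outside.
        members.update(graph.get(seed, ()))
    return members


viral_life_cycle = annotate_subgraph(adjacency, ["viral translation"])
host_response = annotate_subgraph(adjacency, ["defense response to virus"])
```

Note that a node such as `protein_B` can belong to more than one subgraph, while `protein_D`, two steps away from the seed, is not included.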
Mapping of gene expression data onto the COVID-19 PHARMACOME. Two single cell sequencing data sets representing infected and non-infected cells directly derived from human samples 31 and cultured human bronchial epithelial cells 32 (HBECs) were used to identify the areas of the COVID-19 PHARMACOME responding at the gene expression level to SARS-CoV-2 infection. Details of the gene expression data processing and mapping are available in the supplementary material (see section "Gene expression data analysis").
Pathway enrichment. Associated pathways for subgraphs and significant targets were identified using the Enrichr 33 feature of the gseapy Python package 34 . Briefly, gene symbol lists were assembled from their respective subgraph or dataset and compared against multiple pathway gene set libraries including Reactome, KEGG, and WikiPathways. To account for multiple comparisons, p values were corrected using the Benjamini-Hochberg 35 method and results with p values < 0.01 were considered significantly enriched.
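For readers unfamiliar with the multiple-testing correction named above, the Benjamini-Hochberg adjustment can be written in a few lines of plain Python. This is a generic sketch of the procedure, not the code path used inside gseapy or Enrichr; the pathway labels and raw p values are invented for illustration, and applying the 0.01 threshold to the adjusted values is our reading of the text.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical enrichment results (pathway label -> raw p value).
raw = {
    "pathway_A": 0.001,
    "pathway_B": 0.008,
    "pathway_C": 0.039,
    "pathway_D": 0.041,
    "pathway_E": 0.5,
}
adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))

# Keep pathways below the 0.01 threshold (assumed to apply to adjusted values).
significant = [name for name, p in adjusted.items() if p < 0.01]
```

With these toy values only `pathway_A` remains below the threshold after correction, although `pathway_B` had a raw p value below 0.01.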
Drug repurposing screening. We performed phenotypic assays to screen for repurposing drugs that inhibit the replication and the cytopathic effects of virus infection. A derivative of the Broad repurposing library was used to incubate Caco-2 cells before infecting them with an isolate of SARS-CoV-2 (FFM-1 isolate, see 36 ). Survival of cells was assessed using a cell viability assay and measured by high-content imaging using the Operetta CLS platform (PerkinElmer). Details of the drug repurposing screening are described in the supplemental material.
Drug combinations assessment with anti-cytopathic effect measured in Caco-2 cells. As described in Ellinger et al., 37 we challenged four combinations of five different compounds with the SARS-CoV-2 virus in four 96-well plates containing two drugs each. Eight drug concentrations were chosen ranging from 20 to 0.01 µM, diluted by a factor of 3 and positioned orthogonally to each other in rows and columns. No pharmacological control was used, only cells with and without exposure to SARS-CoV-2 virus at 0.01 MOI.
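The concentration scheme above (eight points, threefold dilution from 20 µM, two drugs arranged orthogonally) can be reproduced with a short calculation. This is a sketch for orientation only; the function names are hypothetical, and how the resulting 8 × 8 matrix and the controls were actually placed on each 96-well plate is not stated here, so the grid below is an assumption about layout.

```python
def dilution_series(top_um=20.0, factor=3.0, steps=8):
    """Concentrations in µM, starting at top_um and divided by factor per step."""
    return [top_um / factor ** i for i in range(steps)]


def checkerboard(series_a, series_b):
    """Orthogonal dose matrix: rows vary drug A, columns vary drug B."""
    return [[(a, b) for b in series_b] for a in series_a]


series = dilution_series()
grid = checkerboard(series, series)

lowest = series[-1]  # 20 / 3**7, roughly 0.009 µM, i.e. about 0.01 µM
n_wells = sum(len(row) for row in grid)
```

Seven threefold dilution steps from 20 µM end at approximately 0.009 µM, which is consistent with the stated range of 20 to 0.01 µM.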
In addition, recently published data from the work of Bobrowski et al. 38 were mapped to the COVID-19 PHARMACOME and compared to the results of the combinatorial treatment experiments performed here.

Results
Comparative analysis of the hits from different repurposing screenings. Data from six published drug repurposing screenings were downloaded, and extensive mapping and curation was performed in order to harmonize chemical identifiers. The curated list of drug repurposing "hits" together with an annotation of the assay conditions is available under http://chembl.blogspot.com/2020/05/chembl27-sars-cov-2-release.html.
Initially, we analyzed the overlap between compounds identified in the reported drug repurposing screening experiments. Figure 3a shows no overlap between experiments, which is not surprising, as we are comparing highly specific candidate drug experiments with screenings based on large drug repositioning libraries. However, the overlap is still quite marginal for those screenings where large compound collections (Broad library, ReFRAME library) have been used.
Mapping of repurposing hits to target proteins. In order to identify which proteins are targeted by the repurposing hits, and to investigate the extent to which there are overlaps between repurposing experiments at the target/protein level, we mapped all the identified compounds from the drug repurposing experiments to their respective targets. As most drugs bind to more than one target, we increase the likelihood of overlaps between the drug repurposing experiments when we compare them in the protein/target space. Indeed, Fig. 3b shows an overlap of 112 targets between all the drug repurposing experiments, thereby creating a list of potential proteins for therapeutic intervention when the compound targets are considered rather than the compounds themselves.
The COVID-19 PHARMACOME associates pathways derived from drug repurposing targets with pathophysiology mechanisms. A non-redundant list of drug repurposing candidate molecules that display activity in phenotypic (cellular) assays was generated and mapped to the COVID-19 PHARMACOME. Figure 4 shows the distribution of repurposing drugs in the COVID-19 cause-and-effect graph, the "responsive part" of the graph that is characterized by changes in gene expression associated with SARS-CoV-2 infection, and the overlap between the two subgraphs. This overlap analysis allows for the identification of repurposing drugs targeting mechanisms that are modulated by viral infection. A total of 870 mechanisms were identified as being targeted by most of the drug repurposing candidates (see section "Associated pathway identification" in supplementary materials). When compared to the annotated subgraphs in the COVID-19 PHARMACOME, 201 of the 227 determined associated pathways found for the viral life cycle subgraph overlapped with those for the drug repurposing targets while the host response subgraph shared 90 of its 105 pathways.
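The shift from compound-level to target-level comparison (Fig. 3a versus Fig. 3b) can be illustrated with a small sketch. All screen names, compound names, and target names below are hypothetical placeholders, and the drug-target dictionary stands in for the curated database lookups; it is not the mapping pipeline used in this work.

```python
# Hypothetical hits per screening experiment.
screen_hits = {
    "screen_1": {"compound_A", "compound_B"},
    "screen_2": {"compound_C"},
    "screen_3": {"compound_D", "compound_E"},
}

# Hypothetical drug-target annotations (most drugs bind more than one target).
compound_targets = {
    "compound_A": {"target_1", "target_2"},
    "compound_B": {"target_3"},
    "compound_C": {"target_1", "target_4"},
    "compound_D": {"target_1"},
    "compound_E": {"target_5"},
}


def targets_per_screen(hits, annotations):
    """Replace each screen's compound set by the union of the compounds' targets."""
    return {
        screen: set().union(*(annotations.get(c, set()) for c in compounds))
        for screen, compounds in hits.items()
    }


def common_to_all(sets_by_screen):
    """Items present in every screen."""
    sets = list(sets_by_screen.values())
    return set.intersection(*sets) if sets else set()


compound_overlap = common_to_all(screen_hits)
target_overlap = common_to_all(targets_per_screen(screen_hits, compound_targets))
```

In this toy data the screens share no compound, yet they share a target, which is the effect the target-level comparison is designed to expose.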
Mapping of drug repurposing signals to hypervariable regions of the COVID-19 PHARMACOME. One of the key questions arising from the network analysis is whether the repurposing drugs target mechanisms that are specifically activated during viral infection. In order to establish this link, we mapped differential gene expression analyses from two single-cell sequencing studies to our COVID-19 PHARMACOME. An overlay of differential gene expression data (adjusted p value ≤ 0.1 and abs(log fold-change) > 0.25) on the COVID-19 PHARMACOME reveals a distinct pattern characterized by the high responsiveness (expressed by variation of regulation of gene expression) to the viral infection (Fig. 4a).
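The overlay criterion quoted above (adjusted p value ≤ 0.1 and abs(log fold-change) > 0.25) amounts to a simple filter on the differential expression table. The sketch below assumes the table is available as a mapping from gene symbol to an (adjusted p value, log fold-change) pair; the gene labels and numbers are invented, and the actual processing is described in the supplementary material.

```python
def is_responsive(adj_p, log_fc, p_cut=0.1, lfc_cut=0.25):
    """Apply the thresholds: adjusted p <= p_cut and |log fold-change| > lfc_cut."""
    return adj_p <= p_cut and abs(log_fc) > lfc_cut


def responsive_nodes(graph_genes, deg_table):
    """Genes of the graph that pass the differential expression filter."""
    return {
        gene
        for gene in graph_genes
        if gene in deg_table and is_responsive(*deg_table[gene])
    }


# Hypothetical differential expression results: gene -> (adjusted p, log fold-change).
deg_table = {
    "gene_A": (0.01, 1.20),   # passes both thresholds
    "gene_B": (0.10, -0.40),  # passes: adjusted p equals the cut-off, |logFC| > 0.25
    "gene_C": (0.05, 0.25),   # fails: |logFC| is not strictly above 0.25
    "gene_D": (0.30, 2.00),   # fails: adjusted p too large
}
graph_genes = {"gene_A", "gene_B", "gene_C", "gene_D", "gene_E"}

responsive = responsive_nodes(graph_genes, deg_table)
```

Genes that are in the graph but absent from the expression table (here `gene_E`) are simply treated as non-responsive in this sketch.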

Virus-response mechanisms are targets for repurposing drugs.
In the next step, we analyzed which areas of the COVID-19 graph respond to SARS-CoV-2 infection (indicated by significant variance in gene expression) and are targets for repurposing drugs. To this end, we mapped signals from the drug repurposing screenings to the subgraph that showed responsiveness to SARS-CoV-2 infection (Fig. 4b). Figure 4c depicts the resulting subgraph that is characterized by the transcriptional response to SARS-CoV-2 infection and the presence of target proteins of compounds that have been identified in drug repurposing screening experiments.
The COVID-19 PHARMACOME supports rational targeting strategies for COVID-19 combination therapy. We mapped existing combinatorial therapy data to the COVID-19 PHARMACOME in order to evaluate its potential in guiding rational approaches towards combination therapy using repurposing drug candidates. In drug combination therapy, the interaction between compounds can be defined as either additive (the combined effect is the same given proportional doses of the individual drugs), synergistic (the combined effect is larger than the additive effect of each individual drug), or antagonistic (the combined effect is smaller than the additive effect of each individual drug) 39,40 . Combinatorial treatment data obtained from the results published by Bobrowski et al. 41 and Ellinger et al. 42 were mapped to the COVID-19 PHARMACOME. Figure 5 provides an overview of the mapped compounds, their protein targets, and the interaction mechanisms. Analysis of the overlaps between the drug repurposing screening data showed that four of the ten compounds reported in the synergistic treatment approach were represented in our initial non-redundant set of candidate repurposing drugs. Based on the association between repurposing drug candidates and the areas of the COVID-19 PHARMACOME that respond to SARS-CoV-2 infection (Fig. 4), we hypothesized that the number of edges between a pair of drug nodes may be linked to the effectiveness of the drug combination (Supplementary Figure 2). In order to evaluate whether the determined outcome of a combination of drugs correlated with the distance between said drug nodes, we compared distances for combinations of drugs within the COVID-19 PHARMACOME whose effect was known (Supplementary Tables 3 and 5).
Of the 47 drug combinations we were able to check within the COVID-19 PHARMACOME, we found that the pairs of drugs known to have a synergistic effect in the treatment of SARS-CoV-2 had an average shortest path length of 2.43, while antagonistic combinations were found to be farther apart with an average shortest path length of 4.0 (Supplementary Table 7). Based on our calculations, we formulated three categories for predicting the outcome of new drug combinations on infection using the shortest path lengths between them within the COVID-19 PHARMACOME. A shortest path length of 2 was taken to predict a synergistic relationship between the compounds, a length of 3 was considered inconclusive as our calculations did not justify a specific outcome, and a length of 4 or more was taken to predict an antagonistic relationship.
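The three-category rule can be expressed as a breadth-first search followed by a lookup on the path length. The sketch below is illustrative rather than the analysis code behind Supplementary Table 7: the toy graph, node names, and function names are hypothetical, edges are treated as undirected and unweighted, and returning "unclassified" for disconnected pairs or lengths below 2 is our assumption, since the text defines no category for those cases.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency list from (node, node) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path_length(adjacency, source, target):
    """Number of edges on a shortest path, or None if no path exists."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def predict_interaction(length):
    """Map a shortest path length to the categories described in the text."""
    if length is None or length < 2:
        return "unclassified"  # assumption: not covered by the three categories
    if length == 2:
        return "synergistic"
    if length == 3:
        return "inconclusive"
    return "antagonistic"


# Hypothetical drug-target-mechanism graph.
graph = build_adjacency([
    ("drug_A", "target_1"), ("drug_B", "target_1"),
    ("target_1", "process_X"), ("process_X", "target_2"),
    ("drug_C", "target_2"), ("drug_E", "process_X"),
])

prediction_ab = predict_interaction(shortest_path_length(graph, "drug_A", "drug_B"))
prediction_ae = predict_interaction(shortest_path_length(graph, "drug_A", "drug_E"))
prediction_ac = predict_interaction(shortest_path_length(graph, "drug_A", "drug_C"))
```

In this toy graph two drugs that share a target are two edges apart, so the synergistic category corresponds to the closest possible pairing of two drug nodes that are not directly linked.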
In order to test our ability to predict the outcome of novel drug combinations, we selected five compounds: Remdesivir (a virus replicase inhibitor), Nelfinavir (a virus protease inhibitor), Raloxifene (a selective estrogen receptor modulator), Thioguanosine (a chemotherapy compound interfering with cell growth), and Anisomycin (a pleiotropic compound with several pharmacological activities, including inhibition of protein synthesis and nucleotide synthesis). These compounds were used in four different combinations (Remdesivir/Thioguanosine, Remdesivir/Raloxifene, Remdesivir/Anisomycin and Nelfinavir/Raloxifene) to test the potency of these drug pairings in phenotypic, cellular assays. Figure 6 shows the results of these combinatorial treatments on the virus-induced cytopathic effect in Caco-2 cells.
Our results indicate that compound combinations acting on different viral mechanisms, such as Remdesivir and Thioguanosine (Fig. 6b) or Nelfinavir and Raloxifene (Fig. 6d), showed synergy, while compounds acting on host mechanisms, for instance Anisomycin or Raloxifene, when combined with Remdesivir (Fig. 6a, c, respectively), resulted in neither synergistic nor additive effects. Interestingly, our experiments revealed that the HIV protease inhibitor Nelfinavir, which already appeared to be active against viral post-entry fusion steps of both SARS-CoV 43 and SARS-CoV-2 44 , displayed synergistic effects when combined with high concentrations of Raloxifene. This result agrees with our predictions generated using the COVID-19 PHARMACOME in which the drug combination with the shortest distance, Raloxifene and Nelfinavir (Supplementary Table 5), would have a synergistic effect on SARS-CoV-2 pathology.

Discussion
By combining a significant number of knowledge graphs which represent various aspects of COVID-19 pathophysiology and drug-target information, we were able to generate the COVID-19 PHARMACOME, a unique resource that covers a wide spectrum of cause-and-effect knowledge about SARS-CoV-2 and its interactions with the human host. Based on a systematic review of the results derived from published drug repurposing screening experiments, as well as our own drug repurposing screening results, we were able to identify mechanisms targeted by a variety of compounds showing virus inhibition in phenotypic, cellular assays. With the COVID-19 PHARMACOME, we are now able to link repurposing drugs, their targets and the mechanisms modulated by said drugs within one computable data structure, thereby enabling us to target, in a combinatorial treatment approach, different, independent mechanisms. By challenging the COVID-19 PHARMACOME with gene expression data, we have identified subgraphs that are responsive (at the gene expression level) to virus infection. Network analysis along with the overview on previous repurposing experiments provided us with the insights needed to select the optimal repurposing drug candidates for combination therapy. Experimental verification showed that this systematic approach is valid; we were able to identify two drug-target-mechanism combinations that demonstrated synergistic action of the repurposed drugs targeting different mechanisms in combinatorial treatments. We are fully aware of the fact that the COVID-19 PHARMACOME combines experimental results generated in different assay conditions. In the course of our work, we accumulated evidence that assay responses recorded using Vero E6 cells in comparison to Caco-2 cells may only partially overlap.
Comparative analysis of the responses of both assay systems to virus infection by means of transcriptome-wide gene expression analysis is one of the experiments we plan to perform next. However, for the identification of meaningful combinations of repurposing drugs, the current model-driven information fusion approach was shown to work well despite the putative differences between drug repurposing screening assays.
Given the urgent need for treatments that work in an acute infection situation, our approach described here paves the way for systematic and rational approaches towards combination therapy of SARS-CoV-2 infections. We want to encourage all our colleagues to make use of the COVID-19 PHARMACOME, improve it, and add useful information about pharmacological findings (e.g. from candidate repurposing drug combination screenings). In addition to vaccination and antibody therapy, (combination) treatment with small molecules remains one of the key therapeutic options for combatting COVID-19. The COVID-19 PHARMACOME will therefore be continuously improved and expanded to serve integrative approaches in anti-SARS-CoV-2 drug discovery and development.
www.nature.com/scientificreports/

Funding
Open Access funding enabled and organized by Projekt DEAL.