Exploring the SARS-CoV-2 virus-host-drug interactome for drug repurposing

Coronavirus Disease-2019 (COVID-19) is an infectious disease caused by the SARS-CoV-2 virus. Numerous studies have examined the molecular mechanisms of viral infection. However, this information is spread across many publications and is very time-consuming to integrate and exploit. We develop CoVex, an interactive online platform for SARS-CoV-2 host interactome exploration and drug (target) identification. CoVex integrates virus-human protein interactions, human protein-protein interactions, and drug-target interactions. It allows visual exploration of the virus-host interactome and implements systems medicine algorithms for network-based prediction of drug candidates. Thus, CoVex is a resource to understand molecular mechanisms of pathogenicity and to prioritize candidate therapeutics. We investigate recent hypotheses on a systems biology level to explore mechanistic virus life cycle drivers, and to extract drug repurposing candidates. CoVex renders COVID-19 drug research systems-medicine-ready by giving the scientific community direct access to network medicine algorithms. It is available at https://exbio.wzw.tum.de/covex/.

with a hub-penalty $\lambda \in [0, 1]$ that can be specified by the user (detailed below). If $\lambda$ is set to a value close to 1, the returned Steiner trees avoid hub-nodes. This is important because targeting proteins which are hub-nodes in the human protein-protein interaction (PPI) network potentially results in many undesired side effects.
The multi-Steiner tree algorithm employed in CoVex is implemented as follows: In a first step, we use the algorithm by Kou et al. to compute the first Steiner tree $T$. Moreover, we run a depth-first search to find all bridges in the graph $G$, where a bridge is an edge whose deletion results in the graph being disconnected. Let $L$ be the list of edges in $T$, $C$ be the cost of $T$, and $\tau$ be a user-defined tolerance that specifies by how much the costs of the subsequent trees may exceed $C$. Furthermore, let $k$ be the number of already discovered trees (initialized to 1) and $U$ be the set of returned nodes (initialized to the nodes contained in $T$). We iterate the following steps until $k$ reaches the requested number of trees or $L$ is empty. Subsequently, we return the subgraph induced by $U$.
1. Remove an edge $e$ from $L$.
2. If $e$ is a bridge, go to step 1.
3. Temporarily delete $e$ from $G$.
4. Run the algorithm by Kou et al. to compute the next candidate tree $T'$.
5. If the cost of $T'$ does not exceed $C \cdot \frac{100 + \tau}{100}$, add the nodes of $T'$ to $U$ and increment $k$.
6. Remove all edges from $L$ which are not contained in $T'$.
7. Reinsert $e$ into $G$.

Weighted TrustRank: TrustRank is a variant of Google's PageRank algorithm, where "trust" is iteratively propagated through the network starting from an initial set of (trusted) seed nodes 3 . At termination, each node in the network receives a score which is high if the node is easily reachable from a subset of seed nodes, which themselves assume central positions in the overall topology of the network.
In CoVex, TrustRank can be used for two purposes. First, it can be used to rank drugs targeting a previously selected or computed set of host proteins. Second, it can be employed to discover potentially druggable host proteins which are relevant for a given set of viral or host seed proteins. Note that, unlike the result returned by the multi-Steiner algorithm, the top-ranked nodes returned by TrustRank are not guaranteed to be connected. In practice, however, many of the returned nodes form one large connected component. Thus, TrustRank allows pathways to be extracted in situations where it is not clear a priori that all seed nodes are involved in one mechanism.
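The trust propagation described above can be illustrated with a minimal sketch. It is not the CoVex implementation: the function name `trustrank`, the default damping value of 0.85, the fixed number of iterations, and the toy graph are all assumptions made for illustration. In each step, a node passes a share `damping` of its trust to its neighbors (in proportion to optional edge weights), and the remaining share is returned to the seeds.

```python
from collections import defaultdict


def trustrank(edges, seeds, damping=0.85, weights=None, iterations=100):
    """Propagate trust from the seeds over an undirected graph.

    edges   -- iterable of (u, v) pairs
    seeds   -- trusted start nodes; seeds absent from the graph are ignored
    damping -- share of trust passed along edges in each step (assumed default)
    weights -- optional {(u, v): weight}, keyed exactly like the pairs in edges
    """
    adj = defaultdict(dict)
    for u, v in edges:
        w = 1.0 if weights is None else weights[(u, v)]
        adj[u][v] = w
        adj[v][u] = w
    seed_set = {s for s in seeds if s in adj}
    if not seed_set:
        raise ValueError("no seed is part of the graph")
    # Trust restarts uniformly from the seeds.
    restart = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in adj}
    score = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in adj}
        for u, nbrs in adj.items():
            total = sum(nbrs.values())
            for v, w in nbrs.items():
                # Each neighbor receives trust in proportion to its edge weight.
                nxt[v] += damping * score[u] * w / total
        score = nxt
    return score


if __name__ == "__main__":
    # Hypothetical toy network: two seeds attached to A, followed by a chain A-B-C.
    toy_edges = [("S1", "A"), ("S2", "A"), ("A", "B"), ("B", "C")]
    scores = trustrank(toy_edges, seeds=["S1", "S2"])
    for node, value in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(node, round(value, 4))
```

In this toy network, the node adjacent to both seeds scores higher than nodes further down the chain, and raising `damping` increases the score of the most distant node.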
In CoVex, we use a customized version of the TrustRank algorithm which allows the user to avoid hubs, if required. This is achieved by defining parameterized edge diameters $d_\lambda(e) = 1 / c_\lambda(e)$ (the definition of $c_\lambda$ is given below in section "Parameterized edge costs with hub-penalty"). If the user sets the hub-penalty $\lambda$ to a value close to 1, the diameters of the edges which are incident with hub-nodes are small. Therefore, "trust" flows less easily to the hubs, which implies that they obtain a lower score. Further parameters are the result size, which specifies how many top-ranked nodes should be displayed in the result, and the damping factor $d \in [0, 1]$. The damping factor $d$ controls how easily "trust" can flow to nodes which are far away from the seeds. The larger $d$, the higher the scores that distant nodes receive.

Weighted, seeded closeness centrality: Closeness centrality is a simple centrality measure which ranks the nodes in a network based on the average distance of the shortest paths to all other nodes. Kacprowski et al. suggested a version of this centrality measure where only the distances to a selected set of seed nodes are taken into consideration 4 .
In CoVex, we use a modified version of this approach which uses the parameterized edge costs $c_\lambda$ instead of uniform costs. This customized version of seeded closeness centrality hence assigns lower scores to hubs, if requested by the user. Like TrustRank, closeness centrality can be used both for extracting drug targets and for ranking drugs.
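As an illustration of seeded closeness centrality with non-uniform edge costs, the following sketch scores each node by the inverse of its mean shortest-path distance to the seeds. The exact normalization used in CoVex is not stated here, so this scoring formula, the function names, and the toy costs are assumptions.

```python
import heapq
from collections import defaultdict


def dijkstra(adj, source):
    """Shortest-path distances from source over non-negative edge costs."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # outdated heap entry
        for v, cost in adj[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def seeded_closeness(edge_costs, seeds):
    """Score = number of seeds / summed distance to the seeds (assumed form).

    edge_costs -- {(u, v): cost} for an undirected graph
    seeds      -- seed nodes; seeds absent from the graph are ignored
    """
    adj = defaultdict(dict)
    for (u, v), cost in edge_costs.items():
        adj[u][v] = cost
        adj[v][u] = cost
    seeds = [s for s in seeds if s in adj]
    if not seeds:
        raise ValueError("no seed is part of the graph")
    total = {n: 0.0 for n in adj}
    for s in seeds:
        dist = dijkstra(adj, s)
        for n in adj:
            # Nodes that cannot reach a seed accumulate an infinite distance.
            total[n] += dist.get(n, float("inf"))
    return {n: (len(seeds) / t if t > 0 else float("inf")) for n, t in total.items()}


if __name__ == "__main__":
    # Hypothetical toy network: hub H and low-degree protein P both touch the seeds.
    uniform = {("S1", "H"): 1.0, ("S2", "H"): 1.0, ("S1", "P"): 1.0,
               ("S2", "P"): 1.0, ("H", "X1"): 1.0, ("H", "X2"): 1.0}
    # Illustrative penalty: edges incident with the hub become three times as costly.
    penalized = {e: (3.0 if "H" in e else c) for e, c in uniform.items()}
    print(seeded_closeness(uniform, ["S1", "S2"]))
    print(seeded_closeness(penalized, ["S1", "S2"]))
```

With uniform costs, the hub and the low-degree protein are equally close to the seeds. Once the edges incident with the hub are made more expensive, the hub falls behind, which is the behavior the hub-penalty is meant to produce.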
Parameterized edge costs with hub-penalty: For each edge $(u, v) \in E$ in the network $G$, the parameterized costs employed by the three algorithms presented above are defined as $c_\lambda(u, v) = (1 - \lambda) \cdot \mathrm{avdeg}(G) + \lambda \cdot \frac{\deg(u) + \deg(v)}{2}$, where $\deg(u)$ and $\deg(v)$ are the degrees (i.e., number of links) of the nodes $u$ and $v$, $\mathrm{avdeg}(G)$ is the mean degree of all nodes contained in $G$, and $\lambda \in [0, 1]$ is a hub-penalty, which can be specified by the user. Note that for $\lambda = 0$, all edge costs equal $\mathrm{avdeg}(G)$ (no hub-penalty), while for $\lambda = 1$, the cost of each edge is a direct function of the degrees of its incident nodes (maximal hub-penalty). By setting $\lambda$ to a value between 0 and 1, the user can balance between these two extremes.
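To make the effect of the hub-penalty concrete, the sketch below computes edge costs under the assumption that the cost interpolates linearly between the mean degree of the network (at a hub-penalty of 0) and the mean degree of the two incident nodes (at a hub-penalty of 1). The function names and the toy graph are hypothetical, and the edge diameters are taken as the reciprocal of the costs.

```python
from collections import Counter


def hub_penalized_costs(edges, lam):
    """Return {(u, v): cost} for an undirected edge list.

    Assumed form: cost = (1 - lam) * avdeg + lam * (deg(u) + deg(v)) / 2.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("hub-penalty must lie in [0, 1]")
    edges = list(edges)
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    avdeg = sum(deg.values()) / len(deg)  # mean degree over all nodes
    return {
        (u, v): (1.0 - lam) * avdeg + lam * (deg[u] + deg[v]) / 2.0
        for u, v in edges
    }


def edge_diameters(costs):
    """Diameter of an edge, taken as the reciprocal of its cost."""
    return {edge: 1.0 / cost for edge, cost in costs.items()}


if __name__ == "__main__":
    # Hypothetical toy network: H is a hub, D hangs off the low-degree node C.
    toy_edges = [("H", "A"), ("H", "B"), ("H", "C"), ("C", "D")]
    for lam in (0.0, 0.5, 1.0):
        print(lam, hub_penalized_costs(toy_edges, lam))
```

Under this assumption, a hub-penalty of 0 gives every edge the same cost, and larger values make edges incident with high-degree nodes progressively more expensive (and their diameters smaller) than edges between low-degree nodes.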

Supplementary Figure 1 - The multi-Steiner tree connecting all host proteins interacting with the virus (blue nodes are seeds), visualized in Cytoscape 5 using the GraphML export feature of CoVex. The purple nodes are the connectors building a tree-structured subnetwork of the human interactome.
Why the host protein interactions matter for drug repurposing

Of the 332 host proteins interacting with the virus (interactors) reported in Gordon et al. 6 for SARS-CoV-2, 247 form a large connected component, indicating that most of the direct targets of the virus are in close proximity in the human interactome. Twelve drugs currently in clinical trials (Dexamethasone, Colchicine, Pravastatin, Ribavirin, Ruxolitinib, Bromhexine, Oseltamivir, Noscapine, Ascorbic acid, Tofacitinib, Artenimol, Suramin) target the virus interactors directly. Supplementary Figure 1 shows a Steiner tree of minimum cost connecting all the interactors. It consists of 44 connector proteins in addition to the 332 interactors. The Steiner tree enables us to find new drug target candidates. Importantly, six of the 69 drugs currently in clinical trials (Supplementary Table 5) target exclusively the connector proteins revealed by our analysis, namely Glycyrrhizic acid, Synthetic Conjugated Estrogens B, Leflunomide, Chloroquine, Deferoxamine, and Thalidomide.

Supplementary Notes
Application scenario a - Starting from a selection of viral proteins, we seek to use the human interactome to identify biological mechanisms or pathways utilized by the virus during infection.
As an example, we are interested in the viral proteins E, M and Spike, which constitute the external structure of the virus and thus participate in the entry into host cells 7,8 .
First, we select all host proteins interacting with the viral proteins E, M and Spike from the SARS-CoV-2 dataset. We then use the multi-Steiner tree algorithm with the parameters shown in Supplementary Table 1 to uncover the biological pathway involved. The resulting network allows the identification of 26 new potential drug targets, including the Bradykinin receptor B1 (BDKRB1).
Next, we use closeness centrality with the parameters shown in Supplementary Table 1 to find drugs affecting this pathway. We identify a total of 30 approved and 10 non-approved drugs (Supplementary Figure 2). Notably, we find six relevant drugs that target BDKRB1. Four of them, Ramipril, Captopril, Perindopril and Enalaprilat, are approved and belong to the Angiotensin Converting Enzyme (ACE) inhibitor class 9 . The other two are Icatibant, an antagonist of the Bradykinin receptor B2 10 , and bradykinin, a non-approved drug which is degraded by ACE 11 .
Finally, to understand the relationship between BDKRB1 and Angiotensin Converting Enzyme 2 (ACE2) as well as Transmembrane protease serine 2 (TMPRSS2), two proteins known to be involved in virus entry 12 , we use the "custom proteins" option available in CoVex and run the multi-Steiner tree algorithm with the same parameters as in Supplementary Table 1. We find that the Kininogen 1 (KNG1) and Angiotensin (AGT) proteins connect BDKRB1 with ACE2 (Supplementary Figure 3). These four proteins are functionally related through the Renin-Angiotensin System, which is targeted by ACE inhibitors (https://www.wikipathways.org/instance/WP554).
In summary, CoVex identifies the protein BDKRB1, which participates in the pathway affected by SARS-CoV-2 and can be targeted by several ACE inhibitors, which are widely used in clinical trials to treat COVID-19. It should be noted that the ACE2 protein is not present in the set of seeds used to start the analysis; however, CoVex is capable of identifying the pathway and new protein targets functionally related to ACE2, which can be targeted by ACE inhibitors as well. In this case, CoVex allows the identification of the mechanism behind the drugs currently considered for treating COVID-19.

Supplementary Figure 2 - Network obtained from interactors of baits E, M and Spike with multi-Steiner tree followed by closeness centrality. Blue nodes are protein targets, green nodes are approved drugs and orange nodes are non-approved drugs. Lines represent the interactions between the proteins and drugs. ACE inhibitor drugs are identified, such as Ramipril, Captopril, Perindopril and Enalaprilat targeting the BDKRB1 protein, which are currently being evaluated in clinical trials in COVID-19 patients. Note that we also included this

Application scenario b - Candidate drugs for the treatment of COVID-19 can be identified starting from a user-defined set of seeds comprising differentially expressed genes (DEGs) and viral proteins. Such a custom list can be obtained from other sources, such as experiments and/or the literature. One possible strategy is to use proteins known to be associated with a specific biological process, such as SARS-CoV-2 proteins involved in viral pathogenesis and host proteins that participate in the corresponding host immune response to infection. This way, we use the host PPI network to connect the viral proteins to the DEGs and obtain a potential mechanism that can be targeted using repurposable drugs.

To obtain a custom list of DEGs, raw counts from the gene expression data of SARS-CoV-2-infected lung epithelial A549 cells relative to mock-treated cells were obtained from Blanco-Melo et al. 13 (GEO accession GSE147507). Differential expression analysis was performed using the edgeR (v.3.26.8) package (fold change ≥ 2, adjusted p-value < 0.05) to obtain a list of DEGs. To identify host cell pathways enriched in response to SARS-CoV-2 infection, KEGG enrichment was performed using the gseapy (v.0.9.15) package. Enriched KEGG terms included pathways that are known to be involved in immune response to pathogens, such as "Influenza A", "Herpes simplex infection", "Measles", and "Hepatitis C".
In this example, we use the CoVex platform to select viral proteins involved in innate immune response and apoptosis, namely ORF7a and ORF3a, as indicated in Gordon et al. 6 . Next, using the "Custom proteins" option, we upload the UniProt IDs of the DEGs that participate in the enriched pathway "Herpes simplex infection", which is involved in the response to infection with Herpes simplex virus (HSV), another viral pathogen. The enriched DEGs include IFIH1, OAS1, STAT1, DDX58, OAS2, OAS3, IRF7, EIF2AK2, IFIT1, and IRF9. We then use both the viral proteins and DEGs as seeds for the multi-Steiner tree algorithm to extract a subnetwork that is relevant to our pathway of interest, shown in Supplementary Figure 4. Next, we use closeness centrality on the resulting subnetwork to obtain drugs. The parameters used to run the multi-Steiner tree and closeness centrality algorithms can be found in Supplementary Table 2.
Top-ranking drugs included Tofacitinib and Ruxolitinib, which are currently being assessed in clinical trials for the treatment of COVID-19 (Supplementary Figure 5). Tofacitinib and Ruxolitinib are both known to inhibit Janus kinases (JAKs), which promote cytokine signaling 14,15 . Thus, administration of these drugs can mitigate immune-mediated lung injury and prevent functional deterioration in COVID-19 patients caused by an over-amplified host immune response. As shown in Supplementary Figure 5, other drugs that target this subnetwork include Masitinib, Erlotinib, and Sorafenib, which could be further examined in downstream analyses. In a similar manner, users can provide a custom list of proteins to retrieve drugs that can target their mechanism of interest, followed by careful examination of the results.

Application scenario c - Another approach to find candidate drugs to combat COVID-19 is to connect the targets of promising drugs which are already in clinical trials to the viral proteins. We can identify the candidate mechanisms by extracting the sub-network(s) connecting these two ends with a minimum number of intermediate connector proteins. This step can be done by using the multi-Steiner tree algorithm integrated in our CoVex platform. We can then search for drugs targeting the identified connector proteins using the closeness centrality algorithm from the "Find Drugs" function.

Supplementary Table 2 - Algorithms and parameters used in application scenario b
As an example, we start with the drugs classified as immunostimulants (Sargramostim and Peginterferon alfa-2a) 16 , and then use the multi-Steiner tree algorithm from the "Find Drug Targets 18 . This gene can be inhibited by the investigational drug KB002 (DB05194) (Supplementary Figure 7), which is an engineered human monoclonal antibody-based treatment for inflammatory and autoimmune processes 19 .

Application scenario d - Some hypotheses might require a hybrid seeding approach, where we start from a hypothesis-driven mixed selection of viral and host proteins as well as drugs and explore protein-protein interactions to identify a mechanism and suggest additional drugs.

Supplementary Table 3 - Algorithms and parameters used in application scenario c
Here, we follow a recently published hypothesis concerning the interference of SARS-CoV-2 with the formation of hemoglobin in erythrocytes 22,23 . Essentially, we follow the idea that the virus could interfere with porphyrin, which is a substrate, together with iron (Fe²⁺) ions, during the synthesis of the heme prosthetic group in hemoglobin. The viral proteins are hypothesized to compete with iron for the porphyrin, thereby hindering the interaction of iron with porphyrin and inhibiting heme group synthesis, which leads to hypoxia symptoms 24 . Liu and Abrahams suggest that this might explain why Chloroquine and Favipiravir are effective drugs, as they may prevent the virus from competing with iron for the porphyrin: Chloroquine by interfering with the viral proteins NSP1-16, ORF3a, and ORF10, which may bind the porphyrin to prevent heme synthesis, and also by inhibiting the binding of viral protein ORF8 to porphyrins "to a certain extent" 24 . Favipiravir could prevent ORF7a from binding to (free) porphyrin in addition to preventing the virus from entering the host cells 24 .
Starting from this theory, we investigate the host interactome for potential drug repurposing candidates. In CoVex, we selected the viral proteins NSP1-16, ORF3a, ORF7a, ORF8 and ORF10 as seeds and expanded the network to all of their host partner proteins. Notably, we see that NSP7 may bind to Cytochrome b5 reductase (CYB5R), which converts methemoglobin to hemoglobin (non-oxygen-transporting Fe³⁺ hemoglobin to oxygen-transporting Fe²⁺ hemoglobin). Cytochrome b5 reductase is involved in the transfer of reducing equivalents from the physiological electron donor, NADH, via an FAD domain to the small molecules of cytochrome b5. It is also heavily involved in many oxidation and reduction reactions, such as the reduction of methemoglobin to hemoglobin 25-27 . In addition, we see that ORF3a binds to HMOX1 (Heme oxygenase 1), which might lead to interference with heme degradation. We continued with all 238 host interactors as new seeds and executed KeyPathwayMiner (Supplementary Table 4) to investigate the host interactome for proteins that connect the selected virus-host interaction proteins. We discovered five new drug targets (the proteins APP, XPO1, TRIM25, HSCB, FBXO6), for which we extracted 20 drug candidates using the closeness centrality measure and screening only for currently approved drugs. In the resulting ranked list, we rediscovered Chloroquine as well as Deferoxamine, both of which are currently in clinical trials or discussed in the literature as candidate drugs for COVID-19 treatment. Note that Deferoxamine is widely used for the treatment of Thalassemia and as a chelator of ferric ion in disorders of iron overload 28 . In addition to these two drugs, we find Methylene blue, a drug that is approved by the FDA for the treatment of methemoglobinemia. Note that the evidence for a methemoglobinemia caused by SARS-CoV-2 is anecdotal (no reports on abnormal methemoglobin levels, iron metabolism markers, etc. exist), and we used this hypothesis to illustrate a potential hybrid-level starting point for the network medicine investigation of a hypothesis using CoVex.