Multiscale statistical physics of the pan-viral interactome unravels the systemic nature of SARS-CoV-2 infections

Protein–protein interaction networks have been used to investigate the influence of SARS-CoV-2 viral proteins on the function of human cells, laying out a deeper understanding of COVID-19 and providing ground for applications, such as drug repurposing. Characterizing molecular (dis)similarities between SARS-CoV-2 and other viral agents allows one to exploit existing information about the alteration of key biological processes due to known viruses for predicting the potential effects of this new virus. Here, we compare the novel coronavirus network against 92 known viruses, from the perspective of statistical physics and computational biology. We show that regulatory spreading patterns, physical features and enriched biological pathways in targeted proteins lead, overall, to meaningful clusters of viruses which, across scales, provide complementary perspectives to better characterize SARS-CoV-2 and its effects on humans. Our results indicate that the virus responsible for COVID-19 exhibits expected similarities, such as to Influenza A and Human Respiratory Syncytial viruses, and unexpected ones with different infection types and from distant viral families, like HIV1 and Human Herpes virus. Taken together, our findings indicate that COVID-19 is a systemic disease with potential effects on the function of multiple organs and human body sub-systems.

Characterizing the interactions between viral and human proteins is key to understanding the function and structure of viruses such as SARS-CoV-2 and to informing drug design and repurposing strategies. Here, the authors use statistical physics techniques to perform a systematic multiscale comparison of the effects on the human interactome of SARS-CoV-2 with respect to other viruses, and find that COVID-19 exhibits properties typical of systemic diseases.

The COVID-19 pandemic, with global impact on multiple crucial aspects of human life, is still a public health threat in most areas of the world. Despite the ongoing investigations aiming to find a viable cure, our knowledge of the nature of the disease is still limited, especially regarding the similarities and differences it has with other viral infections. On the one hand, SARS-CoV-2 shows high genetic similarity to SARS-CoV [1] , the virus that caused the 2003 coronavirus outbreak, and its infection shares a number of symptoms with other respiratory diseases, such as the flu caused by Influenza virus. On the other hand, drugs usually used to treat different infection types, like AIDS caused by Human Immunodeficiency Virus (HIV), are under investigation to treat COVID-19 [2][3][4] , suggesting potentially unexplored parallels between the function of other viruses and SARS-CoV-2. Characterizing these (dis)similarities can result in a deeper understanding of the novel coronavirus and facilitate the search for reliable treatments.
With the rise of network medicine 5-10 , methods developed for complex network analysis have been widely adopted to efficiently investigate the interdependence among genes, proteins, biological processes, diseases, and drugs 11 . In particular, protein-protein interactions (PPI) 12 play an essential role in every cellular process and, therefore, PPI network analysis has been extensively used to predict protein function and understand signal transduction pathways in normal or altered conditions. Human PPI networks can include direct (physical) and indirect (functional) interactions, identified through a wide range of experimental and computational techniques.
Additionally, since PPIs are potential drug targets, a better understanding of the interactomes is also essential in drug development. In fact, interactomes are characterized by topological modules bridged by a small number of cross-module PPI 13 , organized into modular hierarchies 14 essential for efficient information exchange 15,16 and, consequently, for the system function.
Moreover, PPI network analysis has been used for characterizing the interactions between viral and human proteins in the case of SARS-CoV-2 [17][18][19] , providing insights into the structure and function of the virus 20 and identifying, for instance, drug repurposing strategies [21][22][23][24] . Very recently, molecular analysis unraveled a potential reason why SARS-CoV-2 infections lead to diverse outcomes for COVID-19, the disease being more severe and lethal preferentially for males and for older patients rather than children and young adults [25][26][27] .
A comprehensive comparison of SARS-CoV-2 against other viruses has the potential to unravel hidden (dis)similarities with the effects of existing and well-known viral agents, opening opportunities for network-based applications which complement the more standard ones. However, such a systematic analysis is still missing or limited to a few viruses biologically similar to SARS-CoV-2: recently, a comparative analysis against the other zoonotic coronaviruses, which caused Severe Acute Respiratory Syndrome (SARS) in 2002 and Middle East Respiratory Syndrome in 2012, revealed the existence of pan-viral disease mechanisms 28 .
Here, we use statistical physics and techniques from computational biology to analyze pan-viral patterns of 93 viruses, including SARS-CoV-2. We consider the virus-human PPI as an interdependent system with two parts: the human PPI network and the viral proteins that target it. We carry out a multiscale analysis of virus-host interactomes to highlight how viral interactions impact and perturb the PPI network. In Fig. 1 we illustrate, schematically, the multiscale nature of this work, and the features we extract from the interactomes. To discover pan-viral patterns, we feed advanced machine-learning techniques with the output of physics and biology analyses in order to cluster together viruses with similar physical, biological, or biophysical features. Our findings indicate that SARS-CoV-2 groups with a distinct number of pathogens depending on the physical scale and on the biological information used, providing complementary perspectives on its functional effects on organs and human sub-systems. For instance, we find proximity with pathogens such as Human Respiratory Syncytial virus, while the cluster is also very close to other clusters including HIV1 and Herpesvirus, suggesting that COVID-19 exhibits properties typical of systemic diseases. The results of these analyses confirmed the peculiar similarity found between SARS-CoV-2 and viruses from distant families. By integrating all the results obtained from each analysis, we reached a final clustering of viruses which accounts, simultaneously, for biological and physical features from micro to macro scales. Our findings shed light on the unexplored aspects of SARS-CoV-2 from the perspective of statistical physics of complex networks.
The presented framework opens the door to further theoretical developments aiming to characterize the structure and dynamics of virus-host interactions, and provides grounds for further experimental investigation and potentially novel clinical treatments. In particular, one can exploit knowledge about existing drug-target interactions related to known viral agents to perform network-based prediction of drug candidates for SARS-CoV-2 from viruses exhibiting similar properties from a statistical physics and biological point of view, thus complementing existing, more purely biological approaches.

Results
Here, we use data regarding the viral proteins and their interactions with human proteins for 93 viruses (see "Methods"). To obtain the virus-human interactomes, we link the data to the BIOSTR Human PPI network (19,945 nodes and 737,668 edges) 29,30 built from data fusion of two comprehensive public repositories (see "Methods" and Fig. 2). We also refer to Supplementary Note 1 and Supplementary Figs. 1-4 for summary statistics about virus size, targeted human proteins, and viral families.
Mapping biology into mathematical models. To allow the analysis from the perspective of statistical physics of complex networks, we first need to map the biology of our problem into mathematical assumptions that can be used operationally. On the one hand, viral proteins try to co-opt cellular processes, from protein translation to nuclear transport, through a complex web of PPI. On the other hand, the response of human cells consists in initiating transcriptional programs which activate the adaptive immune system innate and anti-viral countermeasures to control and mitigate virus replication. However, DNA and RNA viruses behave differently: the former target proteins to alter either human cellular processes or metabolic processes, or both simultaneously, while the latter tend to preferentially target proteins involved in RNA processing, intracellular transport and localization within the cell 31 .
It is worth remarking that our hypotheses in this work do not correspond to a difference between DNA and RNA viruses but, instead, they are intended to provide an operational framework to support the choice of the analytical techniques used in this study. Here, we will consider the following mapping, regardless of the type of virus, whether RNA or DNA, to allow for a consistent comparison of results across all families of viruses considered in this work: (1) Type-I: the interaction between a viral protein and a human target is assumed to inhibit the function of the latter, destroying its existing interactions with the human interactome. This approach induces a specific change in the function of the sub-system the target belongs to and, potentially, in the function of the whole interactome. (2) Type-II: the interaction between a viral protein and a human target is assumed to perturb the function of the latter, propagating such a perturbation systemically according to some specific biomolecular dynamics.
Note that more sophisticated approaches are also possible: for instance, one can randomly rewire a fraction, or the whole set, of the interactions involving the target protein, thus preserving the overall network connectivity while only altering the functionality of the system. While the Type-I approach inhibits a target, the Type-II also encodes the activation of novel interactions: however, in this second case, the results might depend on the way rewiring is performed, e.g., within or across functional modules. To avoid the dependence of our results on the methodology used for rewiring, we prefer to keep the lowest possible number of assumptions and degrees of freedom and employ only the Type-I and Type-II approaches.
Fig. 1 (caption fragment) Here, the underlying biological hypothesis is that viral proteins might inhibit the usual function of human targets, and we map this activity into the removal of protein from the system. We also test another less invasive hypothesis: the viral proteins interact with the human targets while altering, and not just inhibiting, their functions: the resulting perturbations are propagated (dashed lines mimicking the propagation) and we analyze the system response 36,79 . Meso: in this case, the underlying hypothesis is that viral proteins alter the function of the human interactome at the mesoscopic level, i.e., interfering with the functional organization in modules (green shaded areas) typical of biomolecular systems 13,14,80 . This interference is mapped into the isolation of the target proteins, and the modular and hierarchical re-organization of the interactome is detected according to two popular methods for community and hierarchy detection 39,40 . Macro: viral interactions δG perturb macroscopic properties of the interactome which are captured by the analysis of the network density matrix 46,47 von Neumann entropy, Massieu function (ϕ(β, G)) and energy functions at temporal scale β.
Percolation of the interactomes and perturbation propagation: microscopic analyses. In this section we introduce two analyses performed on virus-host interactomes at the microscopic scale to detect virus (dis)similarities. A complete discussion of methods and results is presented in the Supplementary Notes 5 and 6. On the one hand, we investigate percolation processes, which have proved useful in the past to shed light on several aspects of protein-related networks, such as the identification of functional clusters 32 and protein complexes 33 , the verification of the quality of functional annotations 34 or the identification of critical properties 35 . These successful applications motivate us to investigate percolation properties of virus-host interactomes. However, it turns out that percolation does not offer valuable insights when it comes to differentiating the topological response of our set of viruses under protein removals (see Supplementary Fig. 11), because the interactomes are too similar to each other. On the other hand, we take a dynamical approach and consider a regulatory dynamic process evolving on top of the reconstructed interactome, with the aim of assessing differences between viral agents in the way they impact this system by means of a dynamic perturbation in its steady state 36,37 . We employ recent definitions of correlation functions 36,38 to quantify the system response. We find that, while this approach returns interesting insight regarding the amount of perturbation distributed by single targeted proteins (see Supplementary Fig. 12), more analyses are needed to obtain a comprehensive picture. Therefore, these types of microscopic analyses do not allow us to achieve our goal, and we devote the rest of the article to investigating alternative approaches to differentiate between these topologically similar interactomes.
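As a rough illustration of the percolation-style question asked above, the following minimal sketch isolates a set of targeted proteins (the Type-I mapping) and measures how the size of the largest connected component responds. The toy network, the node names, and the helper functions `isolate_targets` and `largest_component_size` are hypothetical and are not taken from the study; the full protocol is described in the Supplementary Notes.

```python
from collections import deque


def isolate_targets(adj, targets):
    """Type-I mapping: drop every edge incident to a targeted protein."""
    targets = set(targets)
    return {
        node: (set() if node in targets
               else {n for n in nbrs if n not in targets})
        for node, nbrs in adj.items()
    }


def largest_component_size(adj):
    """Size of the largest connected component, via breadth-first search."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best


# Hypothetical toy interactome: a path A-B-C-D-E plus the chord B-D.
toy = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}

before = largest_component_size(toy)
after = largest_component_size(isolate_targets(toy, {"D"}))
print(before, after)
```

Repeating the measurement for the target set of each virus gives one response value per virus; the text above reports that such curves turn out to be too similar across viruses to discriminate between them.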
Functional organization in modules and hierarchy: mesoscale analysis. In this section we analyze how the modular and hierarchical organization of the human interactome changes in response to perturbations caused by viral agents, to shed light on the impact on the functional organization of human proteins and their interactions. Here, the underlying assumption is that the viral proteins alter the functional role of their targets in such a way that they impact the overall function of the system: operatively, this alteration is mapped into the isolation of protein targets from the network. This method alters the modular structure and the hierarchical organization, leading to a change in the number of functional modules and the hierarchical structure of protein groups. We quantify this change by measuring the number of modules obtained through multiscale modularity maximization based on the Louvain method 39 and through the Bayesian inference of a hierarchical degree-corrected stochastic block model (DCSBM) 40 . The hierarchical structure is probed by extending iteratively the analysis on the network of community nodes, where each module is treated as a supernode of a higher level network. These properties are measured for both the untargeted human PPI and the targeted virus-human PPI network, the relative change being quantified in the number of modules and in the modularity, captured by ΔModules and ΔModularity, respectively (see Fig. 3). The Louvain method suggests that viral interactions tend to increase the number of modules, decrease the modularity and reduce the number of levels in the hierarchy, indicating a decentralization of functions and a large-scale change in how information is exchanged, respectively. According to our results, SARS-CoV-2 exhibits a non-negligible positive change in modularity, like HPV type 16, Influenza A, and Bunyavirus.
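To make the two mesoscale observables concrete, here is a minimal sketch that computes Newman's modularity for a given partition before and after isolating a target, and reports the changes in module count and modularity. It is only an illustration under stated assumptions: the partitions are supplied by hand rather than detected with Louvain or a DCSBM, the changes are taken as plain differences, and the toy network and all function names are hypothetical.

```python
def modularity(adj, partition):
    """Newman modularity Q of a partition given as {node: module label}."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    if m == 0:
        return 0.0
    intra, degree = {}, {}
    for node, nbrs in adj.items():
        c = partition[node]
        degree[c] = degree.get(c, 0) + len(nbrs)
        # Each intra-module edge is seen from both endpoints, hence the 0.5.
        intra[c] = intra.get(c, 0.0) + 0.5 * sum(partition[n] == c for n in nbrs)
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def isolate_targets(adj, targets):
    """Remove every edge incident to a targeted protein."""
    targets = set(targets)
    return {
        node: (set() if node in targets
               else {n for n in nbrs if n not in targets})
        for node, nbrs in adj.items()
    }


# Hypothetical toy interactome: two triangles joined by the bridge 2-3.
human = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
partition_before = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

# Assumed outcome of re-running module detection after isolating protein 2:
# the isolated target forms its own module.
targeted = isolate_targets(human, {2})
partition_after = {0: "a", 1: "a", 2: "c", 3: "b", 4: "b", 5: "b"}

q_before = modularity(human, partition_before)
q_after = modularity(targeted, partition_after)
delta_modules = len(set(partition_after.values())) - len(set(partition_before.values()))
delta_modularity = q_after - q_before
print(delta_modules, delta_modularity)
```

In a real analysis the two partitions would come from the community detection method itself, run independently on the untargeted and targeted networks.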
When analyzed from the perspective of Bayesian inference, we find a larger number of modules on average with respect to Louvain and an opposite trend: after viral interactions, modularity increases in most of the cases. Overall, a few viruses do not alter the hierarchical organization of the human interactome, the trend being a reduction in the number of levels, indicating that information exchange across units might be less efficient 15,16 . We also compare the new partitioning of functional modules of the targeted interactomes to the un-targeted groups of proteins, via normalized mutual information 41 .
Analysis of macroscopic properties: spectral information. In this section, we use statistical physics of complex networks to analyze the macroscopic features of virus-human PPI networks. A variety of methods have been introduced to analyze the information content of complex networks 42,43 . Since networks can be viewed as collections of entangled entities, a density matrix can be used to describe their state as in quantum statistical mechanics. While some choices of the density matrix have been shown to be unphysical 44,45 , Gibbsian-like density matrices have been successfully used to define spectral entropy 46,47 and estimate the information content of empirical complex networks at multiple scales, with applications ranging from transportation systems 48 to the human microbiome 46 and the human brain 49 . In fact, it has been shown that such density matrices describe the short to long range interactions between the nodes, and their von Neumann entropy encodes the diversity of information dynamics within the structures 16 . The goal of this section is to study and compare the effect of viral components on the state of information dynamics in the human protein-protein network.
The density matrix can be defined in terms of the combinatorial Laplacian matrix $L = D - A$, where $D_{ij} = k_i \delta_{ij}$, with $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ otherwise, and $k_i = \sum_j A_{ij}$ denotes the degree of the $i$th node. The Laplacian matrix governs the diffusion dynamics on top of the network and is involved in the linear stability analysis of many complex dynamics, such as synchronization. Here we use the Gibbs state given by:
$$\rho(\beta, G) = \frac{e^{-\beta L}}{Z(\beta, G)},$$
which is defined in terms of the propagator of a diffusion process on top of the network, where $\beta$ encodes the temporal scale for signal propagation, normalized by the partition function $Z(\beta, G) = \mathrm{Tr}\left(e^{-\beta L}\right)$, which has an elegant physical meaning in terms of dynamical trapping for diffusive flows 48 . Consequently, the counterpart of the Massieu function, also known as free entropy, in statistical physics can be defined for networks as:
$$\phi(\beta, G) = \log Z(\beta, G).$$
Note that a low value of the Massieu function indicates high information flow between the nodes. The von Neumann entropy can be directly derived from the Massieu function by:
$$S(\beta, G) = \phi(\beta, G) - \beta \frac{\partial \phi(\beta, G)}{\partial \beta},$$
encoding the information content of graph $G$. In the following, we use the above quantities to compare the different virus-host interactomes. In fact, as the number of viral nodes is much smaller than the number of human proteins, we model each virus-human interdependent system $G'$ as a perturbation of the large human PPI network $G$ (see Fig. 4). After considering the viral perturbations due to each virus, the von Neumann entropy and Massieu function of the human PPI network change slightly, as follows:
• $\delta S(\beta, G') = S(\beta, G') - S(\beta, G)$
• $\delta\phi(\beta, G') = \phi(\beta, G') - \phi(\beta, G)$
In our analysis of the perturbations, the temporal scale $\beta$ is used as a resolution parameter tuned to characterize the node-node interactions, from short to long range 16 .
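The quantities in this section can be evaluated from the Laplacian spectrum alone. The sketch below assumes the standard Gibbs form, i.e., a density matrix proportional to exp(−βL) normalized by Z = Tr exp(−βL), a Massieu function ϕ = log Z, and an entropy S = β⟨L⟩ + log Z, with natural logarithms (the choice of base is an assumption). The toy adjacency matrices and the function names are hypothetical; for a network the size of the human interactome a full eigendecomposition may be impractical and an approximation would likely be needed.

```python
import numpy as np


def laplacian(adjacency):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    a = np.asarray(adjacency, dtype=float)
    return np.diag(a.sum(axis=1)) - a


def massieu_and_entropy(adjacency, beta):
    """Return (phi, S) for the Gibbs state rho = exp(-beta L) / Z."""
    eigvals = np.linalg.eigvalsh(laplacian(adjacency))
    weights = np.exp(-beta * eigvals)
    z = weights.sum()            # Z = Tr exp(-beta L)
    p = weights / z              # eigenvalues of rho
    phi = np.log(z)              # Massieu function (natural log assumed)
    entropy = beta * np.dot(p, eigvals) + phi   # S = beta <L> + log Z
    return float(phi), float(entropy)


# Hypothetical "human" network G: a path on four nodes.
G = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

# G' adds one "viral" node (index 4) attached to human node 1.
G_prime = np.zeros((5, 5))
G_prime[:4, :4] = G
G_prime[1, 4] = G_prime[4, 1] = 1

for beta in (1, 3, 5):
    phi_g, s_g = massieu_and_entropy(G, beta)
    phi_p, s_p = massieu_and_entropy(G_prime, beta)
    print(beta, s_p - s_g, phi_p - phi_g)   # delta S and delta phi
```

Each virus then contributes one pair (δS, δϕ) per value of β, which is the two-dimensional feature space used for clustering.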
Based on the magnitude of the perturbations caused by the viral components, we group the viruses together using the k-means algorithm, a widely adopted clustering technique (see Fig. 4). The perturbations in von Neumann entropy and Massieu function span our two-dimensional feature space, and the number of clusters is calculated using the elbow method at each temporal scale β = 1, 3, 5. A more advanced clustering and the full description of the cluster members at different characteristic propagation time scales is presented later in the text.
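A minimal sketch of this clustering step is given below, assuming a plain Lloyd-style k-means with a deterministic farthest-first initialization and a simple curvature heuristic standing in for the (usually visual) elbow criterion. Both choices are assumptions made for illustration, and the feature values are invented toy numbers rather than measured perturbations.

```python
import numpy as np


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns (labels, inertia)."""
    x = np.asarray(points, dtype=float)
    # Farthest-first initialization (an assumption; keeps the sketch deterministic).
    centers = [x[0]]
    while len(centers) < k:
        d = np.min([np.sum((x - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centers.
    dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1)
    inertia = float(dist[np.arange(len(x)), labels].sum())
    return labels, inertia


def elbow(points, k_max):
    """Pick the k with the largest discrete curvature of the inertia curve."""
    inertias = {k: kmeans(points, k)[1] for k in range(1, k_max + 1)}
    curvature = {
        k: inertias[k - 1] - 2 * inertias[k] + inertias[k + 1]
        for k in range(2, k_max)
    }
    return max(curvature, key=curvature.get), inertias


# Hypothetical (delta S, delta phi) features for nine viruses at one beta.
features = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (10.0, 0.0), (10.1, 0.0), (10.0, 0.1),
]

best_k, inertias = elbow(features, k_max=5)
labels, _ = kmeans(features, best_k)
print(best_k, labels)
```

The same procedure would be repeated independently at each temporal scale, since the feature values, and therefore the number of clusters, can change with β.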
Gene ontology and pathways enrichment analysis. To understand if these findings were biologically relevant, we further performed a clustering analysis on the 93 viruses based on the human proteins they interact with (Supplementary Data 5). We consider a human protein as a "shared target" if it was reported to bind both to a SARS-CoV-2 protein and to a protein of another virus, according to the PPI data retrieved from http://viruses.string-db.org (Supplementary Table 1). Out of the 332 human proteins directly targeted by SARS-CoV-2, only 18 were found to be also targeted by other viruses, among which Herpes viruses, HPV type 16, Reovirus or Encephalomyocarditis virus (Supplementary Table 1). Figure 5a shows that SARS-CoV-2 does not cluster with any other virus on the basis of shared protein interactors (Supplementary Fig. 2). We then extended our clustering analysis to the biological pathways and processes in which these targeted proteins are involved. The R package clusterProfiler 50 , which performs enrichment analysis of gene clusters, was used to identify statistically enriched Reactome pathways 51 and Gene Ontology terms 52 potentially targeted by the viruses, although through multiple different proteins. Considering enriched Reactome pathways, SARS-CoV-2 was shown to have the highest similarity with Bunyavirus and Reovirus (Fig. 5b). The same clustering analysis on Biological Processes as defined by the Gene Ontology database showed that SARS-CoV-2 clusters with Rotavirus C, another virus of the Reoviridae family (Supplementary Fig. 5).
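Over-representation analyses of this kind are commonly based on a hypergeometric (one-sided Fisher) test, which asks how likely it is to see at least the observed number of targeted proteins in a pathway by chance. The sketch below illustrates that test in isolation; it is not the clusterProfiler implementation, it omits multiple-testing correction, and the function name and all numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, pathway_size, n_targets, n_overlap):
    """P(X >= n_overlap) when n_targets proteins are drawn without replacement
    from a universe containing pathway_size pathway members."""
    upper = min(pathway_size, n_targets)
    total = comb(universe_size, n_targets)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, n_targets - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20 annotated proteins, a pathway with 5 members,
# 4 proteins targeted by a virus, 3 of which fall in the pathway.
p_value = hypergeom_enrichment_p(
    universe_size=20, pathway_size=5, n_targets=4, n_overlap=3
)
print(p_value)
```

Running the test for every pathway and every virus yields, per virus, a set of enriched pathways; two viruses can then be compared through those sets even when they target entirely different proteins.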
These two methods to assess virus similarities (based either on their targeted proteins, or on the pathways enriched among these proteins) are complementary. Although Bunyavirus does not share any human protein target with SARS-CoV-2 (Supplementary Table 1), it is still found to be the most similar to SARS-CoV-2 based on their shared targeted biological pathways (which are mostly related to mitotic checkpoint controls, see Supplementary Table 2). To investigate whether SARS-CoV-2 would cluster with other viruses at a higher distance, we extended the clustering analysis to the human proteins located one node away from the proteins directly targeted by viruses (referring to them as second-order interactors, Supplementary Data 6). Figure 6a shows that, based on the similarity of these second-order interactors, SARS-CoV-2 clusters with more viruses, including Hepatitis B and C, HIV-1, Influenza A, Herpesvirus 1/2/8, Varicella, Cytomegalovirus, HPV16, Epstein-Barr and Bunyavirus. Based on enriched pathways from first-order and second-order targets, SARS-CoV-2 clusters with Bluetongue, West Nile, Cucumber mosaic, Bunyavirus, Reovirus, Rotavirus C, Newcastle disease, Vesicular stomatitis Indiana, Measles, and Myxoma viruses (Supplementary Fig. 6). Gene Ontology Biological Processes-based clustering using first- and second-order targets shows an association of SARS-CoV-2 with more viruses as well, including Human SARS coronavirus, Bunyavirus, HPV16/18, HIV-1/2, African swine fever, Simian virus 40, Avian infectious bronchitis, Influenza A, Herpesvirus 1/2/8, Hepatitis B/C, Cytomegalovirus, and Epstein-Barr virus (Fig. 6b). These latter clusters, based on enrichments including second-order viral interactors, highlight non-trivial functional similarity between viruses of different families, possibly retrieved with the statistical physics approaches mentioned previously, and in agreement with the results described in the last section.
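The effect of moving from direct targets to second-order interactors can be seen on a very small example. The sketch below is illustrative only: the Jaccard index is an assumed similarity measure (the text does not specify the one used), second-order interactors are taken to exclude the direct targets themselves, and the toy network, protein names, and virus target sets are hypothetical.

```python
def second_order_interactors(adj, targets):
    """Human proteins one step beyond the direct targets (targets excluded)."""
    targets = set(targets)
    neighbours = set()
    for t in targets:
        neighbours |= adj.get(t, set())
    return neighbours - targets


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical human PPI: a path P1-P2-P3-P4-P5.
ppi = {
    "P1": {"P2"},
    "P2": {"P1", "P3"},
    "P3": {"P2", "P4"},
    "P4": {"P3", "P5"},
    "P5": {"P4"},
}

# Two hypothetical viruses with no direct target in common.
targets_x = {"P1"}
targets_y = {"P3"}

first_order_similarity = jaccard(targets_x, targets_y)
second_order_similarity = jaccard(
    second_order_interactors(ppi, targets_x),
    second_order_interactors(ppi, targets_y),
)
print(first_order_similarity, second_order_similarity)
```

Here the two viruses share no direct target, yet their second-order interactors overlap, mirroring the observation that SARS-CoV-2 groups with more viruses once neighbouring proteins are taken into account.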
A full investigation of these (dis)similarities requires further experimental work and is beyond the scope of this study.

Clusters of viruses.
In previous sections, we analyzed the effect of viruses on the human interactome, across different scales. Each analysis, coupled with embedding techniques and clustering algorithms, can be used to investigate the (dis)similarities of viruses from a specific point of view. Here, we use the UMAP dimensionality reduction technique (a machine-learning technique exploiting the hidden geometry of the data) and the HDBSCAN clustering algorithm (a hierarchical method exploiting spatial density and accounting for the presence of noise) to group together the viruses according to their biological and physical effects. We combine the results of different analyses as features to perform the UMAP embedding, to provide an integrated view of virus clusters, identified via the HDBSCAN algorithm (for more information and a detailed list of features used, see Supplementary Note 4 and Supplementary Figs. 8-10). In this section, we present the clustering according to three analyses, one based on physical methods, another based on biological methods, and the last one based on their combination (see Fig. 7).
More specifically, when the mesoscale organization is combined with the results obtained from the spectral entropy and Massieu function (β = 3), SARS-CoV-2 is clustered with Influenza A (Puerto Rico), Human Herpesvirus, Human Parvovirus B19, and Murine Minute virus. Instead, combining GO and Pathways enrichment analyses for second-range interactions, the novel coronavirus exhibits more similarity to Influenza A (Puerto Rico), HIV-1, Epstein-Barr virus, and Vaccinia virus. Finally, combining all the mentioned features with the microscopic analysis of perturbation propagation and the comparison of second-order interactors, we find Human Herpesvirus, Epstein-Barr virus, Varicella Zoster virus, and Hepatitis C virus in the same cluster as SARS-CoV-2. In the discussion, we report on the clustering results according to each analysis and also elaborate on the similarities between the results obtained from physical and biological approaches and their integration.

Discussion
Our knowledge of COVID-19 is still far from being complete. To enhance our understanding of the properties of the virus responsible for this emerging disease, one possibility is to compare, at a molecular level, the effects of its interactions with the human interactome against the effects of well-known viral agents. By measuring such effects through multiple analyses, one can use the results to cluster together viruses in order to learn about potential hidden pan-viral relationships. However, comparing COVID-19 against other viral infections is still a challenge, since various approaches can be adopted to characterize and categorize the complex nature of viruses and their impact on human cells.
In this study, we used an approach based on statistical physics to analyze virus-human PPI outlining 93 different viral infections. Our findings suggest that microscopic analyses such as percolation and perturbation propagation are not sensitive to the differentiating features of networks, due to the similarity of the interactomes and the high level of detail which is characteristic of microscale analysis (see "Methods").
Thus, we investigated the effect of viral components on the mesoscale organization of the human protein-protein interactome. We used the UMAP dimensionality reduction technique with the HDBSCAN clustering algorithm to find the viruses exhibiting the highest similarity to SARS-CoV-2 in the way they affect the functional modularity, including Influenza A (Puerto Rico) and Murine Minute virus. While this analysis provides mesoscopic details about the impact of viruses on the human proteins, it is not sufficient to identify and compare the global effects of viruses. Therefore, to complement the mesoscale analysis, we used thermodynamic-like quantities, such as the von Neumann entropy and the Massieu function, to quantify the effect of viruses on the human interactome across multiple scales determined by the resolution parameter β. We used the HDBSCAN algorithm again and found SARS-CoV-2 showing similarity to Human Respiratory Syncytial virus at small scales, while at larger scales, where the interplay between the topology of virus-host interaction and information flow dynamics becomes more relevant, Measles virus is found in their cluster. It is also worth pointing out that, in the geometric space determined by UMAP, the cluster containing SARS-CoV-2 is very close to other clusters including viruses such as SARS-CoV, Human Herpesvirus, and HIV-1, suggesting that SARS-CoV-2 exhibits physical and biological features which make it similar to viruses well known for their systemic effects, rather than for localized ones. Our findings suggest unexplored relationships between SARS-CoV-2, Herpesvirus, and HIV-1, motivating further theoretical and experimental investigations.
Furthermore, our biological pathways enrichment analysis highlighted that SARS-CoV-2 might impact specific pathways also targeted by other viruses, from different families, although their human protein targets were found to be different, in the strict sense.
In fact, we included the biological analysis based on enrichment with gene ontology and biological pathways, considering first only the direct interactors, and then the second-order interactors. Concerning the direct interactors, although the approach solely based on protein similarity did not highlight any relevant cluster, 18 human proteins were found to be targeted both by SARS-CoV-2 and by other viruses. Surprisingly, no other members of the coronavirus family were found to share human targets with SARS-CoV-2. However, when using pathway enrichment analysis, we observed that SARS-CoV-2 clustered with Bunyavirus (La Crosse encephalitis) and Reovirus. It is worth noting that La Crosse encephalitis virus can cause inflammation of the brain and its symptoms include nausea, headache, vomiting (in milder cases) and seizures, coma, paralysis, and permanent brain damage (in severe cases) 53,54 . Additionally, ribavirin has been shown to be effective against La Crosse encephalitis virus both in vitro and in infected patients 55,56 . Several clinical trials using the same drug to treat COVID-19 are also ongoing [57][58][59] . Reoviruses can affect the gastrointestinal system (such as rotaviruses) and the respiratory tract. Although they are mostly non-pathogenic in humans, a strain of bat origin has been found to be associated with an acute respiratory disease in humans 60 . When the second-range targets were included in the clustering analysis, SARS-CoV-2 was observed to share secondary targets, and thus clustered, with a wider range of viruses, including viruses responsible for skin and eye infections (Varicella, Cytomegalovirus), viruses attacking the hepatic (Hepatitis B/C), immune (HIV-1/2), respiratory (Influenza, Epstein-Barr), or neurological (Bunyavirus) systems, and more systemic-infectious viruses (Herpes). This apparent similarity with such diverse viruses may help explain the wide variety of symptoms and organs involved with SARS-CoV-2 infection and COVID-19.
We reach similar conclusions based on both physical and biological approaches, providing evidence for the systemic effect of the novel coronavirus. Noticeably, even when all the considered approaches are combined to reach an integrated view of the virus clusters, we observe the same similarity between SARS-CoV-2 and viruses such as Herpes.
It is worth mentioning that the SARS-CoV-2 outbreak is very recent and its PPI is not yet available on the STRING repository.
Therefore, for this particular virus, we relied on a study published in Nature 17 , in April. We acknowledge the possibility that our results might be affected by the limitations of the currently available data sets.
Overall, our framework opens the door to further analyses of viral agents from the perspective of combining statistical physics and computational biology, highlighting the sensitivity of macroscopic functions, such as spectral entropy, to small variations across interaction networks and, more specifically, virus-host interactomes. Even though other analyses, such as the perturbation propagation patterns, lack the same sensitivity, according to our results they provide microscopic details about the interactions between viral and human proteins that complement the macroscopic view. Together, these analyses enhance our understanding of the novel SARS-CoV-2 from a new perspective, which can provide a mathematical ground for the exploration of further clinical treatments and biological understanding.
The most likely application in this direction is drug repurposing, i.e., the identification of new roles of an existing drug to discover previously unknown therapies for untreated diseases. Usually, drugs are combined to trigger their most direct effects, i.e., at the first-order neighborhood of their targets; however, this approach does not account for potential interference at a systemic level, and databases of empirically discovered side effects have to be maintained to be interrogated 61,62 . Conversely, network-based drug repurposing has the potential to capture those systemic effects, reducing side effects 63 , an application already being explored for SARS-CoV-2 by combining biological information with AI-based techniques [21][22][23][24] . Our findings complement the ongoing efforts, providing information on similarities between SARS-CoV-2 and other viruses that can be exploited as an additional layer of information for network-based drug repositioning.

[Figure caption, beginning missing in the source] ... and microscale analyses (including cumulative perturbation, see Supplementary Note 6) and biological analyses (including Gene Ontology GO2, Pathways enrichment PW2, and protein interactors INT2 for first- and second-order shared interactors). To map the multidimensional feature space into a 2d space, we use the UMAP dimensionality reduction technique and find the clusters by means of the HDBSCAN algorithm. In all panels, viruses are shown as dots whose color indicates their membership in clusters and whose size is proportional to the reliability of their assignment to that cluster. In each panel, labels are added to the viruses that cluster with SARS-CoV-2, located at the intersection of the dashed lines. a Features from mesoscale organization coupled with spectral entropy and Massieu function perturbations. b Embedding using gene ontology and pathways enrichment analyses. c Features from micro-, meso-, and macroscale analyses are combined with biological analyses GO2, PW2, and INT2.
Finally, we would like to comment on a more speculative, but extremely fascinating, connection between our findings and the latest evidence on the impact of COVID-19 on immune response. On the one hand, in recent years the study of the human virome 64 (a part of the microbiome) has enhanced our knowledge of its relationships with systemic inflammation, immunophenotype, and disease susceptibility, to mention a few. Usually, the human immune system monitors and co-exists with the virome; however, deviations from this equilibrium condition, happening for instance when immunity is hampered by a pathogen like SARS-CoV-2, can lead to the proliferation of other viruses which are successfully suppressed in normal circumstances. This perturbation of the immune system state might lead, as a consequence, to bacterial and viral co-infections, as confirmed by a meta-analysis of host pathways in SARS-CoV-2 and its potential copathogens 65 . It is tempting to consider viruses clustered with SARS-CoV-2 as natural candidates for such co-infections. Intriguingly, the recent literature on this topic is in agreement with this possibility, for instance in the case of Influenza A 66 , Epstein-Barr 67 , and HIV 68 , as well as other respiratory viruses 69,70 , such as respiratory syncytial virus and adenovirus.
On the other hand, it is known that some viruses are able to modulate the development of autoimmune diseases 71 through distinct mechanisms, such as molecular mimicry and bystander activation 72 . SARS-CoV-2 might be in this class of viruses (see 73 and refs. therein, as well as the recent finding of a pathological role for exoproteome-directed autoantibodies in COVID-19 74 ).
Taken together, such experimental evidence calls for further analysis to gain deeper insights into the physical and biological features of SARS-CoV-2.

Methods
Overview of the dataset. The human interactome used in this study combines PPI from two of the largest repositories publicly available to date, namely STRING v10.5 12 (publicly available at https://string-db.org/cgi/download.pl) and BioGRID v3.5.182 75,76 (publicly available at https://downloads.thebiogrid.org/BioGRID/Release-Archive/BIOGRID-3.5.182/). For a consistent analysis, all protein names and aliases have been standardized to follow the common nomenclature of official symbols of the NCBI gene database (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/, accessed: 28/03/2020) 77 . In the following we will refer to this comprehensive network, in standardized format, as BIOSTR.
The virus-host interactions for 93 viruses are collected from the STRING database, publicly available at http://viruses.string-db.org/. We consider interactions of any type as long as their confidence (score) is equal to or larger than 0.7. For each virus, we record the targeted human proteins and build a virus-host interactome by merging this information with BIOSTR. While BIOSTR contains 19,945 proteins, the number of human proteins in each virus-host interactome is 19,929, as we excluded the disconnected components. Also, our analyses are focused only on the human interactome and virus-human interactions, discarding the virus-virus interactions.
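The construction described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the tuple-based input format, and the assumption that scores are expressed in [0, 1] are all hypothetical, and "excluding the disconnected components" is interpreted here as keeping only the largest connected component of the human network.

```python
from collections import defaultdict, deque

MIN_SCORE = 0.7  # confidence threshold stated in the text


def largest_component(adj):
    """Return the node set of the largest connected component (BFS)."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in comp:
                    comp.add(nb)
                    queue.append(nb)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


def build_virus_host(human_edges, virus_edges, min_score=MIN_SCORE):
    """Merge virus-human interactions with a human interactome.

    human_edges: iterable of (protein_a, protein_b) pairs (hypothetical format).
    virus_edges: iterable of (viral_protein, human_protein, score) triples,
                 with score assumed to lie in [0, 1].
    Virus-virus interactions are simply never passed in, mirroring the text.
    """
    adj = defaultdict(set)
    for a, b in human_edges:
        if a != b:  # ignore self-interactions in this sketch
            adj[a].add(b)
            adj[b].add(a)
    keep = largest_component(adj)  # assumption: drop all smaller components
    human = {node: set(adj[node]) for node in keep}
    targets = defaultdict(set)
    for viral, host, score in virus_edges:
        if score >= min_score and host in human:
            targets[viral].add(host)
    return human, dict(targets)
```

On a toy input with human edges A-B, B-C, and D-E, only the component {A, B, C} is retained, so a viral protein with a high-confidence interaction to D would contribute no targets.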
It is worth noting that a different procedure had to be used to build the COVID-19 virus-host interactions. In fact, since SARS-CoV-2 is too novel, we could not find its PPI in the STRING repository and have considered, instead, the targets experimentally observed in Gordon et al. 17 , consisting of 332 human proteins. The remainder of the procedure used to build the virus-host PPI is the same as before. Figure 2 shows a visualization of the human interactome where proteins targeted by viruses are highlighted. It is worth noting that viruses target a certain number of proteins which have interesting functions in the interactome. In fact, based on our dataset, TP53 (Tumor Protein p53, NCBI Gene ID: 7157) is the most targeted node: it is responsible for inducing changes in metabolism, DNA repair, apoptosis and cell cycle arrest, and its mutations are associated with several human cancers. Other relevant targets (see Fig. 2) include GK (Glycerol Kinase, NCBI Gene ID: 2710), an important enzyme that contributes to regulating metabolism and glycerol uptake and whose mutations are associated with glycerol kinase deficiency; TBP (TATA-box Binding Protein, NCBI Gene ID: 6908), a component of the transcription factor IID, which coordinates the activities of more than 70 polypeptides to initiate transcription by RNA polymerase II; TLR4 (Toll Like Receptor 4, NCBI Gene ID: 7099), relevant for recognizing pathogens and activating innate immunity; STAT2 (Signal Transducer and Activator of Transcription 2, NCBI Gene ID: 6773), acting as a transcription activator within the cell nucleus, which likely contributes to the blocking of the interferon-alpha response by adenovirus; PTGS2 (Prostaglandin-endoperoxide Synthase 2, NCBI Gene ID: 5743), a key enzyme involved in the process of prostaglandin biosynthesis; and IFIH1 (Interferon Induced with Helicase C domain 1, NCBI Gene ID: 64135), encoding MDA5, an intracellular sensor of viral RNA responsible for triggering the innate immune response. IFIH1 is fundamental for activating the pro-inflammatory response that includes interferons; for this reason, it is targeted by several virus families which are able to hinder the innate immune response by evading its specific interferon response.
Gene ontology, Reactome pathway, and clustering analysis. The compareCluster function in the clusterProfiler R package was used to perform the Reactome pathway enrichment analysis on viral target proteins with a p value cutoff of 0.005. The parameters "enrichPathway" and "enrichGO" with ontology "BP" were used to retrieve enriched Reactome pathways and biological processes from Gene Ontology, respectively. Both rely on the hypergeometric distribution to test for enrichment of GO terms and Reactome pathways, determining whether the protein sets belonging to the same Reactome pathway or defined by a particular GO term are more represented than expected by chance. The enrichment analysis results were binarized and clustering was performed using the pheatmap R package with binary distance and the complete linkage method.
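To make the statistical ingredients of this step concrete, the following minimal Python sketch illustrates a one-sided hypergeometric enrichment test, the binarization of its output, and a binary distance between two viruses. It is not the clusterProfiler or pheatmap implementation: all function names are hypothetical, the cutoff is applied to raw p values (whether raw or adjusted values were thresholded is not specified above), and the binary distance is taken in its common definition, i.e., the proportion of terms enriched in only one of the two viruses among those enriched in at least one.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """One-sided enrichment p value, P(X >= k).

    N: proteins in the background universe
    K: background proteins annotated with the term or pathway
    n: proteins in the viral target set
    k: target proteins annotated with the term or pathway
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def binarize(pvalues, cutoff=0.005):
    """Map each term to 1 if enriched at the given cutoff, else 0."""
    return [1 if p <= cutoff else 0 for p in pvalues]


def binary_distance(x, y):
    """Proportion of terms enriched in exactly one virus among the terms
    enriched in at least one. Returning 0.0 for two all-zero profiles is a
    choice of this sketch."""
    either = sum(1 for a, b in zip(x, y) if a or b)
    if either == 0:
        return 0.0
    differ = sum(1 for a, b in zip(x, y) if bool(a) != bool(b))
    return differ / either
```

The pairwise distances obtained in this way would then be passed to a hierarchical clustering routine with complete linkage, which merges clusters according to the largest distance between any two of their members.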

Data availability
Data are available in the figshare repository 78 .