Hunting for influential nodes in complex networks using local information

Complex networks in the real world often have heterogeneous degree distributions. The structure and function of nodes can vary significantly, with influential nodes playing a crucial role in information spread and other spreading phenomena. Identifying high-degree nodes makes it possible to change the network's structure and function. Previous work either redefines the metrics used to measure nodes' importance or focuses on developing algorithms to find influential nodes efficiently. These approaches typically rely on global knowledge of the network and assume that the structure of the network does not change over time, both of which are difficult to achieve in the real world. In this paper, we propose a decentralized strategy that can find influential nodes without global knowledge of the network. Our Joint Nomination (JN) strategy selects a random set of nodes along with a set of nodes connected to those nodes, and together they nominate the influential node set. Experiments are conducted on 12 network datasets, including both synthetic and real-world networks and both undirected and directed networks. Results show that the average degree of the identified node set is about 3–8 times higher than that of the full node set, and the degree distribution skews toward higher-degree nodes. Removal of the influential nodes increases the average shortest path length by 20–70% over the original network, or about 8–15% more than the other decentralized strategies. Immunization based on JN is more efficient than other strategies, consuming around 12–40% fewer immunization resources to raise the epidemic threshold to τ ∼ 0.1. Susceptible-Infected-Recovered (SIR) simulations on networks from which 30% of the nodes, identified as influential by JN, are removed show a significantly delayed infection peak and a total infection scale reduced to 15%.


Introduction
In the real world, we are surrounded by complex systems, such as social cooperation systems 1 , communication infrastructures 2 , and transportation systems 3 . Complex networks can represent these systems and help researchers to understand them deeply 4 . Spreading processes in networks are crucial in studies of complex networks, since they can be used in many aspects of life, such as stopping contagious disease 5 and improving information diffusion 6 . Influential nodes have a profound effect on network spreading phenomena in complex networks because the integrity and stability of complex networks hinge on influential nodes [7][8][9] . Identifying these critical nodes is crucial in controlling the spreading process for information, behaviors, pathogens, and more [10][11][12][13][14][15][16][17][18] . Influential nodes accelerate the spreading rate and increase the final diffusion scale, so removing or vaccinating the influential nodes will impede the spread of, for example, contagious disease. In the financial transaction network, monitoring the influential nodes' abnormal behavior can help the financial institution detect frauds efficiently and effectively.
In recent years, many researchers have studied how to identify the influential nodes in complex networks. Some of them propose new centrality measures and provide more metrics for measuring node importance, such as PageRank 19 , VoteRank 10 , LocalRank 20 , ClusterRank 21 , Coreness 16 , LeaderRank 22 , and TwitterRank 23 . These proposed centralities consider more information than traditional ones such as the degree, closeness 24 , and betweenness 25 centralities, but require more computational time. Others develop new algorithms to search for the prominent nodes in the network. The SPIN approach 26 runs faster than greedy algorithms with little quality loss, but is still not efficient enough to be applied to large-scale networks 10 . Some fast heuristic algorithms have been proposed to cope with this problem. Chen et al. develop a degree discount algorithm 27 , which is more than one million times faster than a typical greedy algorithm 28 while achieving similar accuracy. For networks with community structures, He et al. propose a community-based method 29 that searches for vital nodes in different communities using community detection algorithms 30 . Morone and Makse map the influential node identification problem onto optimal percolation in random networks to find the minimal set of critical nodes 31 . However, most of the previous algorithms are centralized and need full knowledge of the network structure, which is difficult to acquire in real-world scenarios. It should be noted that it is challenging to store and analyze a vast network containing millions of nodes and edges with traditional centralized algorithms because of storage space and time limitations.
Figure 1. Illustration of three decentralized influential node identification strategies. Node labels: v, u, and e. Arrows: the nominator nominates a co-nominator; the nominator or co-nominator nominates a nominee.
Moreover, the previous strategies mainly focus on identifying influential nodes in undirected networks, while only a few of them are able to work on directed networks, a limitation that cannot be ignored in practice.
To cope with these challenges, we propose a localized and decentralized approach to hunt for influential nodes without the network's global information, called the Joint Nomination (JN) strategy. This strategy randomly selects a fraction of nodes in the network as nominators and lets each nominator nominate one of its neighbors, called co-nominator, and then selects a node from the common neighbours of the nominator and co-nominator. The nodes selected from common neighbours are the final identified influential nodes. This decentralized approach does not need to access the whole network, and its computation time is independent of the network scale, only depending on the number of influential nodes required to be found and on the network density. The proposed JN strategy is also a general approach to finding influential nodes in both undirected and directed networks.
Our approach has many applications. For example, finding the sink hubs (high in-degree) and source hubs (high out-degree) in a transaction network is important for financial institutions in detecting fraud and money laundering, and identifying and immunizing a set of influential nodes in a contact network can help to impede the spread of contagious diseases such as COVID-19. The performance of our strategy is tested on these two application scenarios. The transaction network is a directed network. The average degree of the identified node sets is measured to evaluate the approach's ability to find hubs. The contact network through which disease spreads is undirected, and we evaluate the pathogen spreading features on the networks that remain after removal of the influential nodes found by our proposed strategy.

Strategies
First, we present our Joint Nomination (JN) strategy, and the baseline strategies that we compare against, specifically the Site Percolation (SP) and Friend Nomination (FN) strategies.

Site Percolation (SP) strategy
The SP strategy randomly selects nodes from the network. This strategy does not require global knowledge of the network in the selection of any individual node. It is trivial to implement, but does not offer any advantage in identifying influential nodes. The SP strategy is included to show the minimum performance threshold of any other strategy. Figure 1(a) shows the process of the SP strategy, which chooses a fraction of nodes from a network randomly. In this example, nodes 1, 11 and 19 are chosen by the strategy.

Friend Nomination (FN) strategy
The Friendship Paradox (FP) 32 states that a person's friends are more popular than himself or herself on average. In other words, the average degree of a node's neighbors is greater than that of itself. The Friend Nomination (FN) strategy is based on the FP and is an efficient way to find nodes with relatively high degrees. It is used in acquaintance immunization 33 .
The FN strategy is localized and easily implemented in real cases. First, it randomly picks a fraction, f , of nodes in the network, which are called nominators. Second, each node in the nominator set randomly nominates a node from its neighbours, generating a new set of nodes, the nominees. The set of nominees is the chosen influential set of nodes, and has the same size as the set of nominators.
The probability that a nominator with degree j is connected to a nominee with degree k is kj/(2E − 1), where E is the number of edges in the network. Given that the nominator is connected to the nominee, the probability that this nominee is nominated is 1/j. For a specific node with degree k, if any node except itself in the network is selected in the first step, there is an opportunity that this node with degree k can be nominated. The probability that any node is picked uniformly at random from the network is 1/N, so summing over the N − 1 possible nominators gives
$$p(k) = \sum_{i=1}^{N-1} \frac{1}{N} \cdot \frac{k\, j_i}{2E - 1} \cdot \frac{1}{j_i} = \frac{(N-1)\,k}{N\,(2E-1)} \tag{1}$$
where j_i is the degree of the i-th possible nominator. This shows that the probability of a node becoming nominated with the FN strategy is determined by its degree, k. This explains why the FN strategy will preferentially select higher-degree nodes of the network. Figure 1(b) illustrates the FN strategy. FN uses the randomly selected nodes 1, 11, and 19 as the nominator set. Each of these nodes nominates one of its neighbors randomly as a nominee, choosing nodes 2, 15, and 20. These nominees are selected as the influential nodes in the network.
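The SP and FN baselines can be illustrated with a minimal Python sketch. This is not the authors' implementation: the adjacency-dictionary representation, the function names, the toy star network, and the choice to skip isolated nominators are all assumptions made for illustration.

```python
import random


def site_percolation(adj, f, rng):
    """SP baseline: choose round(f * N) nodes uniformly at random."""
    nodes = list(adj)
    return rng.sample(nodes, round(f * len(nodes)))


def friend_nomination(adj, f, rng):
    """FN: every random nominator nominates one random neighbour."""
    nominees = []
    for nominator in site_percolation(adj, f, rng):
        neighbours = list(adj[nominator])
        if neighbours:  # assumption: an isolated nominator nominates nobody
            nominees.append(rng.choice(neighbours))
    return nominees


def average_degree(adj, nodes):
    """Mean degree of the given nodes (repeated nominees count repeatedly)."""
    return sum(len(adj[v]) for v in nodes) / len(nodes)


# Toy star network: hub 0 joined to 20 leaves (illustrative data only).
star = {0: set(range(1, 21))}
star.update({leaf: {0} for leaf in range(1, 21)})

rng = random.Random(0)
print("network:", average_degree(star, list(star)))
print("SP     :", average_degree(star, site_percolation(star, 0.5, rng)))
print("FN     :", average_degree(star, friend_nomination(star, 0.5, rng)))
```

On the star network almost every nominator is a leaf whose only neighbour is the hub, so the FN nominees have a far higher average degree than an SP sample, in line with the degree dependence described above.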

Joint Nomination (JN) strategy
Inspired by FN, we develop a new decentralized influential node identification strategy: the Joint Nomination (JN). We illustrate the operating principle of the JN strategy on undirected networks. It can be applied to directed networks with little modification. First, we randomly select a fraction, f , of nodes, which are called nominators. For each nominator, a co-nominator node is chosen. The co-nominator is randomly selected from the neighbour nodes of the nominator, called the co-nominator candidates. One node is randomly chosen as a nominee from the common neighbours of the nominator and its co-nominator. If the nominator and its co-nominator do not have a common neighbour, another co-nominator is chosen from the co-nominator candidates, continuing until a nominee is selected. If no nominee can be found, we randomly select a co-nominator candidate as a nominee.
An example of the JN strategy for undirected networks is shown in Figure 1(c), where nodes 1, 11, and 19 are randomly selected as nominators. Next, each selected node nominates a random neighbour as a co-nominator. In this example, nodes 2, 15, and 20 are co-nominators. Finally, each pair of nominator and co-nominator chooses a nominee from their common neighbours. In this figure, nodes 4, 10, and 18 will be nominated as the influential nodes. All of these steps are localized, decentralized, random, and do not require full knowledge of the network.
Based on this idea, we develop three variants: identifying hubs in undirected networks, identifying source hubs in directed networks, and identifying sink hubs in directed networks. There are two kinds of neighbours for any node in a directed network: predecessors and successors. A node's predecessor is a neighbour node that has a link pointing to it, and a node's successor is a neighbour to which it points. There are also two kinds of hubs in directed networks: source hubs and sink hubs. Source hubs are nodes with high out-degree, while sink hubs are those with high in-degree. The predecessors act as neighbours for our JN strategy when the target nodes are source hubs, while the successors are regarded as neighbours when the target nodes are sink hubs. The proposed JN strategy for undirected and directed networks is summarized in Algorithm 1.

Algorithm 1. The JN strategy.
Input: Network, G; total number of nodes in the network, N; fraction of nodes to be removed, f
Output: A set of nodes (hubs/source hubs/sink hubs), R
Select N f nodes from the network randomly to S v
for each v in S v do
    Obtain neighbours/predecessors/successors of v, N(v)
    Set e to empty
    while e is empty do
        Select a node from N(v) randomly, u
        Obtain neighbours/predecessors/successors of u, N(u)
        if N(v) and N(u) have common nodes then
            Select a node from the common nodes randomly, e
        else
            Remove u from N(v)
            if N(v) is empty then
                Assign u to e
            end if
        end if
    end while
    Add e to R
end for
return R

The JN strategy shows a preference for selecting higher-degree nodes, and has a distribution skewed more towards higher-degree nodes than the distributions obtained with the FN strategy.
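The JN procedure described in this section can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: the dictionary-of-sets input, the helper `directed_neighbour_maps`, the toy networks, and the handling of nominators without neighbours are assumptions. Shuffling the co-nominator candidates and trying them in order stands in for repeatedly drawing a random candidate and removing it.

```python
import random


def joint_nomination(neigh, f, rng):
    """JN sketch. `neigh` maps each node to the set of its neighbours
    (undirected), predecessors (source hubs) or successors (sink hubs)."""
    nodes = list(neigh)
    nominees = []
    for v in rng.sample(nodes, round(f * len(nodes))):
        candidates = list(neigh[v])
        if not candidates:
            continue  # assumption: a nominator without neighbours is skipped
        rng.shuffle(candidates)  # random order, each candidate tried once
        nominee = None
        for u in candidates:
            # common neighbours; self-loops are excluded as a precaution
            common = (neigh[v] & neigh[u]) - {v, u}
            if common:
                nominee = rng.choice(list(common))
                break
        if nominee is None:
            # fallback from the text: a random co-nominator candidate
            nominee = rng.choice(candidates)
        nominees.append(nominee)
    return nominees


def directed_neighbour_maps(edges):
    """Build predecessor and successor maps from (source, target) pairs."""
    pred, succ = {}, {}
    for a, b in edges:
        for table in (pred, succ):
            table.setdefault(a, set())
            table.setdefault(b, set())
        succ[a].add(b)
        pred[b].add(a)
    return pred, succ


# Undirected toy network: nodes 0 and 1 are linked and share leaves 2..5.
book = {0: {1, 2, 3, 4, 5}, 1: {0, 2, 3, 4, 5},
        2: {0, 1}, 3: {0, 1}, 4: {0, 1}, 5: {0, 1}}
print(joint_nomination(book, 1.0, random.Random(0)))

# Directed toy network: node 0 is a source hub, found via predecessors.
pred, succ = directed_neighbour_maps([(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)])
print(joint_nomination(pred, 1.0, random.Random(0)))
```

Passing the predecessor map targets source hubs and passing the successor map targets sink hubs, mirroring the three variants described above. The work per nominator depends only on its local neighbourhood.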

Results
In this section, we experimentally validate the effectiveness of the JN strategy on the task of finding hubs in both undirected and directed networks. Identifying and immunizing hubs in a social contact network will impede the spread of disease significantly, because such a network has a long-tail degree distribution in most cases. We also conduct a network spreading simulation on the original networks and on the networks that remain after removing the nodes identified by the proposed JN strategy. We use the SIR model in this simulation. The experiments are performed on a Lenovo ThinkStation running Ubuntu 18.04.3 LTS, with 24 Xeon cores, 64 GB RAM, and a clock speed of 3.2 GHz.

Degree distribution of selected influential nodes
An efficient high-degree node identification strategy should show a preference for high-degree nodes. To test the ability of the various strategies to select high-degree nodes, a node is chosen randomly and the selection strategy is applied to it to produce one selected node. This procedure is repeated ten million times to build the degree distribution of selected nodes for each strategy. The SP, FN, and JN strategies are used on four undirected networks. The degree distribution of the samples reflects each strategy's capability to find hubs in a network. Figure 2 shows the obtained degree distributions for each network and strategy. The four plots in the first row are the observed degree distributions of the selected influential nodes for each network. The distributions in the second row are obtained by dividing the distribution of the influential set by the network's degree distribution, which removes the influence of the original degree distribution.
The degree distribution of nodes sampled with the SP strategy is very close to the network's degree distribution, as evidenced by the horizontal normalized degree distribution in the bottom panel. This strategy shows no preference for nodes of any degree. The JN strategy increases the probability of selecting high-degree nodes and lowers the probability of selecting low-degree nodes. The normalized degree distributions for the JN strategy have positive slopes, implying that the JN strategy tends to identify nodes with high degrees. Table 1 lists the slopes of the normalized degree distributions for all strategies and networks. The slopes of the normalized degree distributions for the nodes selected using the JN strategy are significantly higher than those obtained with the FN strategy, e.g., 1.3487 for JN compared to 1.07 for FN on the SYN-BAc network.
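The normalization used for the second row of Figure 2 can be sketched as below. The function names and the toy star network are assumptions for illustration, and the slope fitting reported in Table 1 is not reproduced here because its fitting procedure is not specified in this section.

```python
from collections import Counter


def degree_distribution(adj, nodes):
    """Fraction of the given nodes that have each degree k."""
    counts = Counter(len(adj[v]) for v in nodes)
    total = len(nodes)
    return {k: c / total for k, c in counts.items()}


def normalized_degree_distribution(adj, selected):
    """Degree distribution of the selected nodes divided by the network's
    own degree distribution, degree by degree."""
    network = degree_distribution(adj, list(adj))
    sample = degree_distribution(adj, selected)
    return {k: sample.get(k, 0.0) / network[k] for k in sorted(network)}


# Toy star network: hub 0 joined to 20 leaves (illustrative data only).
star = {0: set(range(1, 21))}
star.update({leaf: {0} for leaf in range(1, 21)})

# A sample containing every node once gives a flat ratio of 1 (no preference).
print(normalized_degree_distribution(star, list(star)))
# A sample made only of the hub puts all the weight on the highest degree.
print(normalized_degree_distribution(star, [0] * 10))
```

A flat ratio corresponds to the horizontal SP curves, while a ratio that grows with k corresponds to the positive slopes reported for the nomination strategies.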

Average degree of selected nodes
In this section, we use the different selection strategies (SP, FN, and JN) on both undirected and directed networks to select sets of nodes ranging from 2-20% of the original network size ( f = 0.02-0.2). The average degree of the identified set of nodes is measured for each method. The average degree metric reflects the performance of each strategy on finding high-degree nodes. The simulations are repeated for 20 times. Figure 3 shows the arithmetic mean of the identified nodes' degrees on four undirected networks (SYN-BAc, G-Lastfm, G-Deezer, and G-Github). The uncertainty represents the standard deviation of the averaging. For smaller fractions, the average degrees of the identified nodes are relatively large for the two nomination strategies, with the JN strategy providing further improvement over the FN strategy. The standard deviation is also largest for smaller fractions, but the average degree of the nodes selected with the JN strategy is still substantially higher than that from the FN strategy. As the fraction of identified nodes increases, the average degree of the identified set decreases. This is mainly because the number of hubs is limited in these networks. With the SP strategy, the selected nodes' average degrees stay effectively constant as the fraction of selected nodes is increased, remaining roughly equal to the average degree of the whole network. Thus, the SP strategy represents the worst-case performance of any influential node identification strategy. Best-case performance is shown by selecting only the highest-degree nodes (black line), representing strategies that require global knowledge of the network. Both the FN and JN strategies are effective at finding hubs, but the JN strategy always outperforms the FN strategy, especially when the selected fraction is small. Figure 4 shows the results for four synthetic directed networks with different average in-degrees and out-degrees generated by the Forest Fire model. 
It can be seen that the average in-degree and average out-degree of the original networks can affect the average degree of the nodes identified by the JN strategy. When the network is sparse (low average degree), the diversity of nodes' degrees is low. In other words, the network is homogeneous. For these networks, the out-degree nodes are more heterogeneous than the in-degree nodes, with higher degree on average than the in-degree nodes. In this case, the JN strategy can find high-degree nodes, but the improvement is hard to observe, as in the black dashed lines with k in = 3.1 and k out = 3.1. However, for denser networks, there is an obvious improvement in both average in-degree and average out-degree, as in the green dashed lines with k in = 24.8 and k out = 24.8. This implies that when there are some obvious hubs in the networks, JN can find them efficiently. Figure 5 shows the results of simulations on real-world directed networks, including social networks and biological networks. The top row shows the average in-degree and the bottom row the average out-degree. The proposed JN strategy outperforms the FN strategy in all cases, although the performance gain is slight for the G-Metabolic network.
Figure 5. The average in-degree and out-degree of the selected influential nodes using the SP, FN and JN strategies (blue, red and green lines, respectively) on four real-world directed networks.

Average shortest path length of the remaining network
Removing the influential node set identified with the hub-finding strategy can alter the structure of the remaining network. If the average shortest path length of the remaining network is longer, then information or disease spreads more slowly, taking more time to reach the full network. Calculating the exact average shortest path length of a large-scale network is time consuming, with O(n^3) complexity. The average shortest path length of our remaining networks is therefore estimated by randomly selecting 5% of the nodes as reference nodes and calculating the average shortest path length between each reference node and all other nodes. To reduce the uncertainty, the mean of 50 independent runs is used. Figure 6 shows the average shortest path length of the remaining networks after removing an increasing fraction of nodes ( f = 0.02-0.18) identified by different strategies. There is a clear rise in the average shortest path length as the fraction of nodes removed increases, with the JN strategy always performing better than the FN strategy for all networks tested. This means that node removal with the JN strategy increases the average shortest path length more efficiently than the FN strategy, reducing the pathogen or information propagation rate in the network.
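The sampling estimate described above can be sketched with a breadth-first search from each reference node. This is an illustrative sketch under stated assumptions: the function names and the toy wheel network are hypothetical, and the reference nodes are drawn from the largest connected component so that every distance is finite.

```python
import random
from collections import deque


def remove_nodes(adj, removed):
    """Network that remains after deleting the given nodes."""
    removed = set(removed)
    return {v: nb - removed for v, nb in adj.items() if v not in removed}


def largest_component(adj):
    """Node set of the largest connected component."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in component:
                    component.add(u)
                    queue.append(u)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def sampled_average_path_length(adj, ref_fraction, rng):
    """Mean distance from sampled reference nodes to all other nodes."""
    component = largest_component(adj)
    if len(component) < 2:
        return 0.0
    sub = {v: adj[v] & component for v in component}
    n_refs = max(1, round(ref_fraction * len(component)))
    total = 0
    for ref in rng.sample(list(component), n_refs):
        total += sum(bfs_distances(sub, ref).values())
    return total / (n_refs * (len(component) - 1))


# Toy wheel network: hub 0 joined to a ring of nodes 1..6.
wheel = {0: set(range(1, 7))}
for i in range(1, 7):
    wheel[i] = {0, i % 6 + 1, (i - 2) % 6 + 1}

rng = random.Random(0)
print(sampled_average_path_length(wheel, 1.0, rng))
print(sampled_average_path_length(remove_nodes(wheel, [0]), 1.0, rng))
```

Each breadth-first search visits every node and edge of the component once, so the cost grows with the number of reference nodes rather than with all pairs of nodes.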

Epidemic threshold of the remaining network
The impact of removing hubs using the hub-finding strategies is measured by calculating the epidemic threshold, τ (Equation (3)), of the remaining network. The mean epidemic threshold for each strategy is the average of the epidemic thresholds obtained from 50 simulations. Figure 7 shows the efficiency of the different strategies at increasing the epidemic threshold. The proposed JN strategy always needs the fewest node removals to achieve a given epidemic threshold, meaning that it has the highest immunization efficiency. For example, to increase the mean epidemic threshold from τ = 0.033 to 0.125 on the SYN-BAc network, 12% of nodes identified by the FN strategy have to be immunized, compared to only 8% of nodes with the JN strategy, representing a 33.3% efficiency gain. On the G-Github network, the initial epidemic threshold is nearly zero, which means that the G-Github network is highly conductive, such that any disease, even with a very low spreading rate, can lead to an epidemic. For 20% node removal, the JN strategy increases the epidemic threshold to τ ∼ 0.12, while the FN strategy raises it to τ ∼ 0.1. The SP strategy hardly increases the epidemic threshold, which means it only protects the immunized individuals without much influence on the structural properties of the network. The two nomination strategies not only protect the immunized individuals, but also bring the extra benefit of making it more difficult for disease to spread on the network. The proposed JN strategy performs better than the FN strategy in raising the epidemic threshold for all four network datasets.
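The threshold calculation can be illustrated with a small power-iteration sketch. It assumes the threshold is taken as the reciprocal of the largest eigenvalue of the adjacency matrix, as stated in the metrics section. The function names, the toy wheel network, and the shift by the identity matrix (used so the iteration also settles on bipartite networks) are choices made for this sketch, not details from the paper.

```python
import math


def remove_nodes(adj, removed):
    """Network that remains after deleting (immunizing) the given nodes."""
    removed = set(removed)
    return {v: nb - removed for v, nb in adj.items() if v not in removed}


def largest_eigenvalue(adj, max_iter=10_000, tol=1e-12):
    """Largest adjacency eigenvalue by power iteration on A + I."""
    nodes = list(adj)
    x = {v: 1.0 for v in nodes}
    estimate = 0.0
    for _ in range(max_iter):
        # multiply by (A + I); the shift prevents oscillation
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {v: val / norm for v, val in y.items()}
        # Rayleigh quotient of the unshifted adjacency matrix
        new_estimate = sum(x[v] * sum(x[u] for u in adj[v]) for v in nodes)
        if abs(new_estimate - estimate) < tol:
            return new_estimate
        estimate = new_estimate
    return estimate


def epidemic_threshold(adj):
    """Reciprocal of the largest eigenvalue; infinite without edges."""
    lam = largest_eigenvalue(adj)
    return math.inf if lam <= 0 else 1.0 / lam


# Toy wheel network: hub 0 joined to a ring of nodes 1..6.
wheel = {0: set(range(1, 7))}
for i in range(1, 7):
    wheel[i] = {0, i % 6 + 1, (i - 2) % 6 + 1}

print(epidemic_threshold(wheel))                     # with the hub
print(epidemic_threshold(remove_nodes(wheel, [0])))  # hub immunized

# Efficiency gain quoted in the text: 8% instead of 12% of nodes immunized.
gain = (0.12 - 0.08) / 0.12
print(round(100 * gain, 1))
```

Removing the hub of the toy network lowers the largest eigenvalue and so raises the threshold, which is the effect measured in Figure 7.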

SIR simulation results
In this section, network spreading simulations using the SIR model are conducted on the four undirected networks SYN-BAc, G-Lastfm, G-Deezer, and G-Github. For each network, fractions f = 0.1, 0.2, and 0.3 of the nodes, identified with the FN and JN strategies, are removed from the original network. SIR simulations are performed on the six resulting reduced networks using the EoN (EpidemicsOnNetworks) package 34,35 . The simulations use β = 1.5, γ = 1, and p = 0.01, where β is the transmission rate per edge, γ is the recovery rate per node, and p is the ratio of the number of initially infected nodes to the total number of nodes in the network. The infection scales (Figure 8) increase sharply at the beginning of the spreading process, reach a peak, then fall back to almost zero. The reason for this shape is that there are many susceptible nodes at the beginning, so a large population of nodes can be infected. As the population of susceptible nodes decreases, the recovery rate overtakes the infection rate, and the infection scale goes down. The differences between networks, strategies, and fractions of removed nodes are in the timing and height of the peaks. As more nodes are removed (larger f ), the peak time is delayed and the peak height is lowered. This occurs because the network's conductivity is decreased when more hubs are immunized. It is worth noting that, for any given f , nodes removed using the JN strategy delay the arrival time of the infection scale peak and lower its height more than nodes removed using the FN strategy, in all cases. This implies that the JN strategy outperforms the FN strategy in the task of immunizing influential nodes to impede disease spread. Figure 9 shows how the cumulative infection scales rise over time. Unlike Figure 8, it reports the cumulative infection scale, consisting of the infection scale plus the recovered scale, which therefore rises monotonically.
It is observed that the JN strategy always performs better than the FN strategy, since the final infection scale for the JN strategy simulations is smaller than that using the FN strategy in all networks.
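For readers who want to experiment without the EoN package, the SIR dynamics can be approximated with a small event-driven (Gillespie-style) simulation in plain Python. This is a stand-in sketch, not the simulator used in the paper: the function names, the ring-lattice toy network, and the rule of seeding at least one infected node are assumptions. The parameters β, γ, and p follow the definitions above.

```python
import random


def gillespie_sir(adj, beta, gamma, p, rng):
    """Event-driven SIR. Returns a list of (time, S, I, R) tuples."""
    nodes = list(adj)
    n = len(nodes)
    infected = set(rng.sample(nodes, max(1, round(p * n))))
    susceptible = set(nodes) - infected
    recovered = 0
    t = 0.0
    history = [(t, len(susceptible), len(infected), recovered)]
    while infected:
        # every infected-susceptible edge transmits at rate beta
        si_edges = [(i, s) for i in infected for s in adj[i] if s in susceptible]
        infection_rate = beta * len(si_edges)
        recovery_rate = gamma * len(infected)
        total = infection_rate + recovery_rate
        t += rng.expovariate(total)  # waiting time until the next event
        if rng.random() * total < infection_rate:
            _, newly_infected = rng.choice(si_edges)
            susceptible.remove(newly_infected)
            infected.add(newly_infected)
        else:
            infected.remove(rng.choice(list(infected)))
            recovered += 1
        history.append((t, len(susceptible), len(infected), recovered))
    return history


def infection_scales(history, n):
    """Infection scale I/N and cumulative infection scale (I + R)/N."""
    return [(t, i / n, (i + r) / n) for t, _, i, r in history]


# Toy ring lattice: 40 nodes, each joined to its 2 nearest nodes per side.
ring = {v: {(v + d) % 40 for d in (-2, -1, 1, 2)} for v in range(40)}

history = gillespie_sir(ring, beta=1.5, gamma=1.0, p=0.01, rng=random.Random(0))
scales = infection_scales(history, len(ring))
print("peak infection scale :", max(s[1] for s in scales))
print("final cumulative scale:", scales[-1][2])
```

Rebuilding the list of infected-susceptible edges at every event keeps the sketch short but is slow on large networks, where a dedicated simulator is the more practical choice.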

Discussion
In this paper, we have developed a decentralized strategy for identifying influential nodes in a network without requiring global knowledge of the network structure. The proposed JN strategy first randomly selects a set of nodes from the network as nominators. Each nominator obtains a co-nominator from its neighbours (predecessors or successors in directed networks). Then each pair of nominator and co-nominator nominates a nominee from their common neighbours. The nominees are the identified important nodes.
The effectiveness and efficiency of the proposed strategy were investigated by conducting experiments on both synthetic and real-world networks. These datasets include both undirected and directed networks. The degree distribution and average degree of nodes identified by the JN strategy show that the JN strategy can find high-degree nodes in both undirected and directed networks. Compared to the FN strategy, the degree distribution skewed toward higher-degree nodes, and the average degree of the nodes was higher. Removing the nodes identified by the JN strategy raised the homogeneity of the remaining network. The average shortest path length and the epidemic threshold on the reduced networks were increased, and were larger than the results obtained using the FN strategy. This implies that the proposed JN strategy can find influential nodes of the network efficiently.
SIR simulations were conducted on networks reduced by removing (immunizing) nodes identified by the JN and FN strategies. The network's conductivity decreased dramatically for both strategies, delaying the arrival time of the peak infection scale, and reducing the peak height. The cumulative infection scale was also reduced. However, our JN strategy was more effective at reducing the spreading speed, resulting in further delays to peak arrival, lowering peak height, and resulting in smaller cumulative infection scale. This implies that the nodes identified by the JN strategy were crucial nodes in the network to impede disease flow.
The computational complexity of the JN algorithm is low. The algorithm consists of two nested loops. The outer loop searches for a nominee for each nominator and runs O(nf) times, where n is the number of nodes in the network and f is the fraction of nodes required to be identified. The inner loop performs the joint nomination process and depends on the number of each nominator's neighbours. Its computational time is O(k_max), where k_max is the maximum degree of the network. Thus, the computational complexity of the JN algorithm is O(nf k_max). If the required number of identified nodes is fixed, that is, nf is constant, then the computational complexity of the JN algorithm is unrelated to the network scale and reduces to O(k_max), where k_max grows far more slowly than the network size and the actual node degree is small for the vast majority of the nodes.
Our study has some limitations. Although the proposed JN strategy performs better than other decentralized strategies in the task of high-degree node identification (see Figure 3), there is still much room for improvement compared to the centralized strategy. Some properties of the networks, such as degree correlation and clustering coefficient, can affect the performance of the JN strategy. This has not been explored in depth, and more network generation models are needed to create networks with accurately controllable degree correlation and clustering coefficient. The effectiveness of the JN strategy has been tested on static networks but not on dynamic ones, which will be next on our agenda. The proposed JN strategy has not been designed for networks with community structure, where nodes between different communities are vital for network integrity but do not always have high degrees. Our study uses degree centrality as the metric to measure nodes' importance, and treats high-degree nodes as influential nodes in the network. Some other metrics, such as betweenness centrality 25 , can also be applied to measure the nodes' influence. The computational complexity of betweenness centrality is very high, so finding a decentralized strategy that identifies high-betweenness nodes without the whole network structure would be key.
Nevertheless, this research provides a more efficient decentralized strategy for identifying influential nodes. The proposed JN strategy can be used in many cases, such as impeding the spread of contagious diseases and detecting high-risk accounts in financial networks. This research has also opened a number of avenues for further research. Localized and decentralized influential node identification strategies have many advantages over global and centralized strategies, such as lower computational complexity and lower information requirements, but suffer from relatively low performance. Some local information, such as neighbours' degrees, can be incorporated to improve the strategy's efficiency. The effects of degree correlation and clustering coefficient can be considered to develop a more flexible and effective strategy. The JN strategy can be modified for temporal networks, which are closer to the real social contact networks on which contagious pathogens spread.

Data description
Numerical simulations are conducted on four undirected-network datasets (one synthetic network and three real networks) and eight directed-network datasets (four synthetic networks and four real networks). Statistical properties of these networks are shown in Table 2.
SYN-BAc 36 : This synthetic network is generated using the Holme and Kim algorithm 36 . It is based on the Barabasi-Albert preferential attachment model 37 , with an extra step in which each random edge is followed by a chance of making an edge to one of its neighbours. This model can generate a network with a power-law degree distribution and a tunable clustering coefficient. We set n = 8000, m = 4, and p = 0.9.
G-LastFM 38 : This network is a real network dataset from LastFM, an online music service, collected from their public API in March 2020. Each node represents a user. There is an edge between two nodes if they follow each other.
G-Deezer 38 : This is a real network dataset captured from Deezer, an online music service, in March 2020. Nodes are Deezer users from European countries and edges are mutual follower relationships between them.
G-Github 39 : This real network dataset is from Github, collected in June 2019. Each node represents a user who stars at least 10 repositories and edges are mutual follower relationships between them.
SYNs-Forestfire 40 : These directed networks are generated by the Forest Fire model with two parameters: forward-burning probability and backward-burning probability. The Forest Fire model generates networks with power-law degree distributions. We generate four directed networks using this model.
G-Wiki 41 : The Wikipedia community held a public vote in order to determine its administrators. A voting network was generated during this process. The network contains voting data from the beginning of Wikipedia until January 2008. Nodes represent Wikipedia users, and if one user voted for another, there is a directed link between those two nodes.
G-Twitter 42 : This network is an ego subgraph from Twitter. Twitter users are nodes and there is a directed link from one node to another if that user follows the other.
G-Email 43 : This email network was collected over 112 days at the University of Kiel, Germany. Nodes represent email addresses and there is a directed edge between nodes if a user sent more than one email to another.
G-Metabolic 44 : This network reflects the metabolic reactions of the E. coli bacteria. Each node is a metabolite, and each directed edge means that there is a reaction where one node is an input and the other is a product.

Metrics of interest
Degree distribution: The degree distribution, p k , provides the probability that a randomly selected node in the network has degree k. The degree distribution can capture the full-scale structure of a network.
Degree centrality: Degree centrality is the simplest metric to measure a node's importance: the more links a node has, the greater the importance of the node. Each strategy's capability of finding hubs is measured by the average degree of the identified nodes.
Average shortest path length 45,46 : The shortest path length between two nodes in a network is defined as the number of edges in the shortest path between these two nodes. The average shortest path length, L, of a network is the average of the shortest path lengths over all pairs of nodes, calculated as
$$L = \frac{1}{N(N-1)} \sum_{i, j \in \mathcal{N},\; i \neq j} d_{ij} \tag{2}$$
where N is the number of nodes, $\mathcal{N}$ is the set of nodes of the network, and d_{ij} is the shortest path length between node i and node j.
If there are disconnected components in a network, the average shortest path length will diverge. The average shortest path length of a network is therefore calculated on its largest connected component. The average shortest path length is a strong indicator of a network's ability to transport pathogens or information. A smaller L means that a node has a greater probability of becoming infected, and therefore that the propagation rate is high.
Epidemic threshold 47 : The spreading rate of a pathogen and the epidemic threshold of a network are two factors that determine how the pathogen propagates in the network. A pathogen's spreading rate depends on the biological characteristics of the pathogen. A network's epidemic threshold reflects the capability of the network to resist a pathogen. If the spreading rate of a pathogen exceeds the epidemic threshold of a network, then it will lead to an epidemic on the network, assuming that the spreading rate of the pathogen is constant. Otherwise, the spread of the pathogen dies out. We measure the changes in epidemic threshold in the networks after the removal of the nodes identified by JN and the baseline strategies. Previous work 47 indicates that for an SIS (Susceptible-Infected-Susceptible) model, the epidemic threshold, τ, for a network is
$$\tau = \frac{1}{\lambda_{\max}} \tag{3}$$
where λ max is the largest eigenvalue of the network's adjacency matrix. We use this formula to calculate the epidemic thresholds of networks.
Infection scale and cumulative infection scale: Simulating virus spread in a network can be done using Susceptible-Infected-Recovered (SIR) models 48 . The infection scale is the number of infected individuals (I) at any point in time over the total number of individuals. The cumulative infection scale is the sum of infected individuals (I) plus recovered individuals (R), over the total number of individuals.