Best influential spreaders identification using network global structural properties

Influential spreaders are the crucial nodes in a complex network that can act as a controller or a maximizer of a spreading process. For example, we can control virus propagation in an epidemiological network by controlling the behavior of such influential nodes, and amplify information propagation in a social network by using them as maximizers. Many indexing methods have been proposed in the literature to identify the influential spreaders in a network. Nevertheless, we have noticed that individual networks have different connectivity structures, which we classify as complete, incomplete, or in-between based on their components and density. These structures affect the accuracy of existing indexing methods in identifying the best influential spreaders. Thus, no single indexing strategy is sufficient for all varieties of network connectivity structures. This article proposes a new indexing method, Network Global Structure-based Centrality (ngsc), which combines the existing kshell and sum of neighbors' degree methods with knowledge of the network's global structural properties, such as the giant component, average degree, and percolation threshold. The experimental results show that the seed spreaders selected by our proposed method yield better spreading performance over a large variety of network connectivity structures, and that its ranking correlates well with the ranking based on an SIR model used as ground truth. It also outperforms contemporary techniques and is competitive with more sophisticated approaches that are computationally costly.

The kshell decomposition method (ks) is a global centrality method: when ranking a node, it takes into account the topological position of the other nodes. To do so, the network should be connected. In other words, only when the giant component covers 100% of the network can ks measure the topological position of every node. For a disconnected network, by contrast, the topological position cannot be measured across all nodes. Unlike ks, k sum is a local centrality method that takes into account only nearest-neighbor information, so it can work well for disconnected networks. With the help of two schematic undirected networks, Figures 1a and 1b, we show how the ranking and spreading performance of ks and k sum depend on the size of the giant component.

First, we discuss the schematic network 1a, which is a connected network; that is, every node has a path to every other node. This network has only one component, which is the giant component. It contains all the vertices and edges of the network, so the percentage of the giant component is 100%. Next, we verify how ks and k sum perform at identifying the best spreaders in this type of network. We apply ks and k sum, and also measure the spreading capability of each node using the SIR epidemic model (see the details in the Results section). The figure reveals that the best spreaders according to the kshell decomposition method and the SIR epidemic model are the same, i.e. {a, b, c, d}. The best spreaders according to the k sum method, however, are {a, x, c, d}, where node x lies on the periphery of the network and has a lower spreading capability according to the SIR model.

The second schematic network, 1b, is a disconnected network consisting of three components. The largest component is the giant component, which contains 52% of the vertices. We again apply ks, k sum, and the SIR model to find the best spreaders of this network. The figure shows that k sum and the SIR model identify the same set of nodes as best spreaders, i.e. {i, e, h, f}, whereas kshell identifies a different set of nodes, {a, b, c, d}, whose spreading capabilities are lower according to the SIR model. From this analysis, we conclude that the kshell decomposition method works well when the percentage of the giant component is high and performs poorly otherwise, while k sum performs best when the percentage of the giant component is low. Therefore, we can claim that the percentage of the giant component affects the rankings produced by the ks and k sum methods.
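The three quantities compared in this note can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the paper: the function names are hypothetical, k sum is taken to be the sum of the degrees of a node's neighbors (following the description in the abstract), and the toy edge list is invented for the example and is not the network of Figure 1.

```python
from collections import deque


def build_graph(edges):
    """Undirected simple graph as a dict: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # assumption: self-loops are ignored
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def giant_component_percentage(adj):
    """Size of the largest connected component as a percentage of all nodes."""
    seen = set()
    largest = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        largest = max(largest, size)
    return 100.0 * largest / len(adj) if adj else 0.0


def k_shell(adj):
    """Shell index of every node, obtained by iterative pruning."""
    degree = {n: len(nbs) for n, nbs in adj.items()}
    remaining = set(adj)
    shell = {}
    while remaining:
        # The smallest remaining degree is the index of the next shell.
        k = min(degree[n] for n in remaining)
        stack = [n for n in remaining if degree[n] <= k]
        while stack:
            n = stack.pop()
            if n not in remaining:
                continue  # already pruned via a duplicate stack entry
            remaining.discard(n)
            shell[n] = k
            for nb in adj[n]:
                if nb in remaining:
                    degree[nb] -= 1
                    if degree[nb] <= k:
                        stack.append(nb)
    return shell


def k_sum(adj):
    """Sum of the degrees of each node's neighbors (local measure)."""
    return {n: sum(len(adj[nb]) for nb in adj[n]) for n in adj}


# Hypothetical toy network: a 4-clique with one pendant node, plus a star.
toy_edges = [
    ("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
    ("d", "e"),
    ("h", "i"), ("h", "j"), ("h", "k"), ("h", "l"), ("h", "m"),
]
toy = build_graph(toy_edges)
print(giant_component_percentage(toy))
print(k_shell(toy))
print(k_sum(toy))
```

In this toy graph the star is the giant component (6 of 11 nodes, about 54.5%), yet the kshell index places a, b, c, and d in the innermost shell even though they belong to the smaller component, and the hub h receives shell index 1. This is only a small illustration of how a shell index computed on a disconnected network ignores which component a node belongs to.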

Supplementary Note 2: Dataset
We examine the efficiency of the proposed method on various types of real network datasets. The datasets are classified into four categories: social networks, collaboration networks, citation networks, and a neural network.
Social Network: A social network represents the social relationships among users, where the nodes represent users and the edges represent the connections between them. We use five real social networks in our experiments to evaluate the proposed method: (1) Ego-Facebook, a friendship network from facebook.com, (2) Polblogs, a network of blogs from the 2004 US election, (3) Advogato, a trust-based social network of free software users, (4) Epinions, a social network from Epinions.com, and (5) Brightkite, a location-based social network.
Collaboration Network: A collaboration network represents scientific collaboration between authors. If two authors are co-authors of a paper, an undirected edge is created between them. We use four collaboration networks in the experiments: (1) Netscience, a co-authorship network of scientists, (2) CA-GrQc, a collaboration network for the general relativity and quantum cosmology category, (3) CA-CondMat, a collaboration network for the condensed matter category, and (4) CA-HepTh, a collaboration network for the high energy physics theory category.
Citation Network: A citation network represents the citations among papers. If paper i cites paper j, a directed edge is created from i to j (i → j). We use two citation networks in the experiments and treat them as undirected networks: (1) Cit-HepTh, which covers all citations among the papers from January 1993 to April 2003, and (2) Cora, a citation network from the Cora dataset.
Neural Network: Celegans is a neural network of frontal neurons, in which nodes represent neurons and edges represent synapses.
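Because the citation networks are built from directed edges but analyzed as undirected networks, a conversion step is needed before any centrality is computed. The sketch below shows one plausible way to do it; the function name is hypothetical, and dropping self-citations and merging reciprocal citations into a single edge are assumptions of this example, not details stated in the note.

```python
def to_undirected(directed_edges):
    """Collapse directed citation pairs (i -> j) into unique undirected edges.

    Assumptions of this sketch: self-loops are dropped, and a pair of
    reciprocal citations (i -> j and j -> i) becomes a single edge.
    """
    undirected = set()
    for src, dst in directed_edges:
        if src == dst:
            continue
        # Store each edge with its endpoints in sorted order so that
        # (i, j) and (j, i) map to the same key.
        undirected.add((src, dst) if src <= dst else (dst, src))
    return sorted(undirected)


# Hypothetical paper identifiers, for illustration only.
citations = [(1, 2), (2, 1), (3, 3), (2, 3)]
print(to_undirected(citations))
```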