Exploring influential nodes using global and local information

Hu, Haifeng; Sun, Zejun; Wang, Feifei; Zhang, Liwen; Wang, Guan

doi:10.1038/s41598-022-26984-4

Download PDF

Article
Open access
Published: 29 December 2022

Exploring influential nodes using global and local information

Haifeng Hu¹,
Zejun Sun¹,
Feifei Wang¹,
Liwen Zhang¹ &
…
Guan Wang¹

Scientific Reports volume 12, Article number: 22506 (2022) Cite this article

1349 Accesses
5 Citations
Metrics details

Subjects

Abstract

In complex networks, key nodes are important factors that directly affect network structure and functions. Therefore, accurate mining and identification of key nodes are crucial to achieving better control and a higher utilization rate of complex networks. To address this problem, this paper proposes an accurate and efficient algorithm for critical node mining. The influential nodes are determined using both global and local information (GLI) to solve the shortcoming of the existing key node identification methods that consider either local or global information. The proposed method considers two main factors, global and local influences. The global influence is determined using the K-shell hierarchical information of a node, and local influence is obtained considering the number of edges connected by the node and the given values of adjacent nodes. The given values of adjacent nodes are determined based on the degree and K-shell hierarchical information. Further, the similarity coefficient of neighbors is considered, which enhances the differentiation degree of the adjacent given values. The proposed method solves the problems of the high complexity of global information-based algorithms and the low accuracy of local information-based algorithms. The proposed method is verified by simulation experiments using the SIR and SI models as a reference, and twelve typical real-world networks are used for the comparison. The proposed GLI algorithm is compared with several common algorithms at different periods. The comparison results show that the GLI algorithm can effectively explore influential nodes in complex networks.

Dynamic identification of important nodes in complex networks based on the KPDN–INCC method

Article Open access 09 March 2024

Influential nodes identification using network local structural properties

Article Open access 03 February 2022

Integrating local and global information to identify influential nodes in complex networks

Article Open access 14 July 2023

Introduction

In recent years, human production and life have become increasingly inseparable from different types of networks¹. In sensor networks, a sensor represents the main device to obtain desired information. Then, through a self-group sensor network, the perceived information can be transmitted to a server for further data processing^2,3. Financial networks have brought great convenience to human life. For instance, when people shop on the Internet, they can enjoy the convenience brought by the consumer financial network⁴. In addition, in social networks, people can make new friends, thus constantly expanding their personal connections and achieving better communication with other people⁵. Currently, many phenomena can be described using complex networks, such as social activities, smart sensor applications, consumer finance, and consumer finance services⁶. Complex networks are associated with countless nodes, so accurately exploring and identifying their key nodes can solve many application problems, including fault node positioning, prevention and control of fraud risk, and friend recommendations^7,8,9. Therefore, how to design an efficient algorithm to identify key nodes of a complex network accurately is an urgent problem. In an intelligent sensor network, accurately identifying and effectively protecting important sensors can avoid security attacks¹⁰. In this way, when a fault occurs in a network, the fault position can be located fast and the fault can be timely addressed¹¹. Further, in a consumer finance business network, the risk of fraud of credit customers can be judged effectively¹², financial institutions can be helped to filter inferior users, and the first risk control link can be established to delete fraud personnel¹³. In addition, when recommending friends in a community network, the identification of key nodes can help users to recommend friends that they have not added yet but may know and send the result to users as a "friend recommendation," which can improve user loyalty¹⁴.

The most popular approach to explore influential nodes in complex networks has been to used centrality measures^{15,16,17,18,19,20,21}. From the perspective of global and local information, typical methods based on local information include methods considering the degree centrality (DC)²² and methods considering the eigenvector centrality (EC)²³. The DC-based methods calculate the node influence by analyzing the number of edges connected by nodes, which has the advantages of high simplicity and fast calculation speed. However, these methods consider only local information and have low accuracy²⁴. The EC-based method calculates the node influence considering both the number and the information of adjacent nodes. Namely, when the number of adjacent nodes is very large, their influence will be very strong. But The EC method score much prefers to concentrate in a few nodes under common conditions, making it hard to distinguish among the nodes²⁵. In contrast, methods based on global information include the closeness centrality (CC)²⁶, betweenness centrality (BC)²⁷, and K-shell decomposition methods²⁸. The BC and CC methods have high accuracy in identifying critical nodes, but their long computational time limits their application to large networks²⁹. The K-shell decomposition method calculates the node influence based on the node location, but the hierarchical results are coarse-grained, resulting in a low node discrimination³⁰. The aforementioned algorithms consider either global or local information of nodes, which causes certain limitations in practical applications.

The node influence information obtained based on multiple datasets is more accurate than that obtained using only a single attribute³¹. Therefore, this study uses global information to calculate the global influence of nodes but local information to calculate their own influence and the influence of their adjacent nodes. The nodes with the same global influence are distinguished to improve the differentiation effect of node influence.

Basic ideas of proposed GLI method

The factors should be considered comprehensively from both global and local aspects. For instance, an important project in the real world can usually be decomposed into multiple subprojects, so a team responsible for completing the project needs to be divided into groups, each of which will be responsible for one subproject. Such a team can be regarded as a complex network where team members denote nodes, and the influence of each member depends on his position in the network. The higher the position is, the greater the number of resources will be; for instance, the project team leader is more influential than the group leader, and the group leader is more influential than the ordinary members. At the same time, the self-capability and the help provided by the team members are also important factors that define the influence. If members A and B have many common neighbors, member B is considered to be close to member A, members B and A have a great similarity, and member B has a greater influence on member A. The more similar the other members are with a particular member, the greater their influence on the member is. Accordingly, the proposed GLI method considers the contribution of both global and local information. First, global information is represented through network hierarchy obtained by the K-shell method; next, local information is represented by the degree of self and adjacent nodes, and adjacent nodes’ Ks values. In addition, the similarity coefficient is introduced, and the higher the similarity between the nodes is, the greater the contribution provided by the adjacent nodes will be.

Contribution of proposed GLI

This study provides an innovative research perspective for the identification of key nodes in a complex network. The proposed GLI algorithm’s innovation is mainly reflected in three aspects, which are as follows:

1.
An accurate key node identification method, which considers the influence of nodes from both global and local aspects, is proposed. The shortcomings of the existing coarse-grained methods based on global information are addressed, and the accuracy of the key node identification is improved;
2.
The proposed GLI method uses the similarity coefficient between nodes in a network, which enhances the differentiation ability of adjacent gives values and improves the identification ability of key nodes;
3.
The proposed GLI method considers both global and local information, which makes it highly practical and suitable for large and complex networks.

The rest of this article is organized as follows. Section “Related work” describes related work. Section “Proposed GLI” describes the proposed GLI method and presents its design idea and specific working process. Section “Experimental Results” compares the proposed GLI method and classical algorithm on different datasets and analyzes the comparison results. Finally, Section “Conclusion” summarizes the main contributions of the study.

Related work

Many factors influence the accuracy of key node identification in complex networks. In the following, a very brief survey of the methods relevant to the proposed GLI is provided. The measures considered in these methods are introduced from the global and local perspectives.

(1)
Local centrality
- DC: This is a local centrality measure related to the number of edges connected by nodes.
- EC: This is a local centrality measure related to a node’s degree and influence of its neighbors.
- PageRank³²(PR): Similar to the EC, this measure reflects the importance of a node, which is determined by both the quantity and the quality of adjacent nodes. This is a local centrality measure using the node degree and the PR value of the neighboring nodes.
- ProfitLeader³³(PL): The ProfitLeader algorithm computes the profit a node provides to the other nodes, where the importance of the node is related to the profit. This is a local centrality measure using node profit and sharing probability to its neighbors.
(2)
Global centrality
- BC: Global centrality measure uses the number of shortest paths through the node.
- CC: Global centrality measure denotes the relative shortest path between the pairs of nodes.
- K-shell: In a network, nodes are decomposed in a layer-by-layer manner according to their degree values, and the node importance is evaluated from the level where the node is located in. This is a global centrality measure based on the node degree.
(3)
Combination of global and local centrality
- GIN³⁴: This is a combination of global and local centrality obtained based on the distance, degree, and all other node-related information.
- KBKNR³⁵: The KBKNR algorithm reflects the influence of adjacent nodes and secondary-adjacent nodes through the influence coefficient, which is then used to distinguish the nodes based on their importance. This is a combination of global and local centrality using the k-shell, distance, and degree values.
- RLGI³⁶: This is a combination of global and local centrality obtained based on the k-shell and degree.
- GLS³⁷: Combination of global and local centrality based on the k-shell, distance, degree, and eigenvector centrality.

Proposed GLI

Definition

The research objectives of this paper include undirected and unweighted networks, which can be expressed as Graph = (Vertex, Edge). This section introduces some of the definitions. The following definitions are related to an undirected, unweighted network.

Definition 1

Node degree³⁸: The node degree represents the number of edges that a particular node joins. Assume, a network is defined as A = (a_ij) N × N; then, the node degree d(v_i) can be expressed as follows:

$$ d\left( {vi} \right) = \sum\limits_{j = 1}^{N} {aij} = \sum\limits_{j = 1}^{N} {aji} $$

(1)

$$ \max D = \max \left( {d\left( {vi} \right)} \right) $$

(2)

where maxD indicates the maximal node degree.

Definition 2

K-shell value of a node: The K-shell value is obtained by the K-shell algorithm, and it is calculated by:

$$ Ks\left( {vi} \right) = K - shell\left( G \right) $$

(3)

Definition 3

Global node influence: The K-shell algorithm can calculate the global position value of a node v_i, which measures the global influence of v_i, Global(v_i). The specific calculation formula is as follows:

$$ Global\left( {vi} \right) = Ks\left( {vi} \right) $$

(4)

Definition 4

Adjacent node similarity: A given value of an adjacent node represents the factor affecting the importance of a node, and there is a relationship between the given value size and the similarity between the nodes. Adjacent nodes with different local structures may enhance the differentiation effect of a given influence of adjacent nodes. In GLI method, the higher the node similarity is, the higher the proportion of influence the node can provide, so the proposed algorithm adopts the Jaccard similarity coefficient³⁹J(v_i,v_j), which reflects the similarity of adjacent nodes, and it is calculated by:

$$ Jacc\left( {vi,vj} \right) = \left| {\frac{{n\left( {vi} \right) \cap n\left( {vj} \right)}}{{n\left( {vi} \right) \cup n\left( {vj} \right)}}} \right| $$

(5)

where n(v_i) represents the set of nodes that have common edges with a node $vi$ and contains this node, n(v_j) denotes the set of nodes that have common edges with node $vj$ and contains node v_j.

Definition 5

Local influence: Local influence considers two factors: personal influence of a node and the contribution of its adjacent nodes.

Assume that P(v_i) denotes the personal influence of a node; then, P(v_i) is determined by d(v_i) and calculated by:

$$ P\left( {vi} \right) = d\left( {vi} \right) $$

(6)

Assume that v_j is an adjacent node of node v_i; then, the given value П(v_j) of node v_j can be obtained by comprehensively considering d(v_i), Ks(v_j), and J(v_i,v_j), which is expressed by:

$$ \prod {\left( {vj} \right)} = d\left( {vj} \right) \times Jacc\left( {vi,vj} \right) + Ks\left( {vj} \right) $$

(7)

The Sum(v_i) represents the sum of the given influence for all the adjacent nodes, Sum(v_i) calculation formula is as follows:

$$ Sum\left( {vi} \right) = \sum\limits_{{vj \in \Gamma \left( {vi} \right)}} {\prod {\left( {vj} \right)} } $$

(8)

Further, assume that in the considered network, node v_i has the most maxD adjacent nodes to provide the given influence. Then, Sum(v_i) is divided by maxD to normalize, and the given influence of node vi on its adjacent node is obtained by:

$$ N\left( {vi} \right) = {{Sum\left( {vi} \right)} \mathord{\left/ {\vphantom {{Sum\left( {vi} \right)} {\max D}}} \right. \kern-0pt} {\max D}} $$

(9)

where Local(v_i) indicates local influence, and it is defined by:

$$ Local\left( {vi} \right) = P\left( {vi} \right) + N\left( {vi} \right) $$

(10)

Definition 6

Node influence: The node influence I(v_i) depends on Global(v_i) and Local(v_i), and it is calculated as follows:

$$ I\left( {vi} \right) = Local\left( {vi} \right) + Global\left( {vi} \right). $$

(11)

Proposed model evaluation

To verify the effect of the proposed algorithm, two evaluation models, the SIR and SI models, were selected. The reliability of the comparison analysis was validated by experimental results.

SIR model

The SIR model⁴⁰ is a mathematical model describing disease transmission, which is a general standard for evaluating the accuracy of node identification.

In the SIR model, network nodes are divided into three categories as follows:

Susceptible: A susceptible node is a node that is not sick but lacks the immune ability and is vulnerable to the infection after contact with a sick node.
Infection: A sick node is an infected node that can infect a susceptible node.
Removed: A node removed from a network, recovered (with immunity) or dead; these nodes are no longer involved in the infection process.

Next, assume that at time t, the total node number All(t) is unchanged, and nodes can be in one of the three states: susceptible to infection point Susceptible(t), sick node Infection(t), or removed from node Removed(t), and it holds that All(t) = Susceptible(t) + Infection(t) + Removed(t).

At time t, the number of infected nodes is α * Susceptible(t) * Infection(t).

After the time interval of Δt, changes in the numbers of susceptible, infection, and removed nodes are respectively as follows:

Susceptible: Susceptible(Δt) = − α * Susceptible(t) * Infection(t).
Infection: Infection(Δt) = α * Susceptible(t) * Infection(t) − β * Infection(t).
Removed: Removed(Δt) = β * Infection( t ).

SI model

Similar to the SIR model, the SI model⁴¹ is the simplest disease transmission model. In the SI model, network nodes are divided into two groups: susceptible nodes and infection nodes. At time t, a susceptible node may become an infected node with a probability of α, and this process is irreversible.

In the experiment conducted in this study, network nodes were treated as sick nodes, and the number of infected nodes was calculated using the SIR and SI models. The calculation process included multiple iterations to illustrate the infection influence of nodes. The results of the network obtained by the above two models were used as an evaluation criterion. The results of the proposed GLI algorithm and several related algorithms were compared.

Kendall coefficient

The Kendall coefficient τ is used in this study to determine the similarity between the ranking results of the proposed algorithm and those of the SIR model⁴² on the same network. Assume nodes v_i and v_j are selected by the GLI algorithm to obtain the values GLI(v_i) and GLI(v_j). Then, node v_i and v_j are processed by the SIR model, and values SIR(v_i) and SIR(v_j) are obtained. If GLI(v_i) > GLI(v_j) and SIR(v_i) > SIR(v_j), or GLI(v_i) < GLI(v_j) and SIR(v_i) < SIR(v_j), then the resulting values are considered consistent, and $\tau$ = 1. If GLI(v_i) > GLI(v_j) and SIR(v_i) < SIR(v_j), or GLI(v_i) < GLI(v_j) and SIR(v_i) > SIR(v_j), then the resulting values are considered inconsistent, and $\tau$ = -1. The specific calculation formula is as follows:

$$ \tau (X,Y) = \frac{{2 * \left( {n_{c} - n_{d} } \right)}}{{n * \left( {n - 1} \right)}} $$

(12)

where X and Y represent the evaluated object, n_c is the number of consistencies in two sequences, and n_d indicates the number of inconsistencies in the two sequences.

Experimental results

Algorithm process

For detailed procedures, please refer to Supplementary Information.

Example description

In Fig. 1, the proposed algorithm is described in the example of the calculation process of the influence of node v₁:

(1)
Node degree

According to the idea of GLI algorithm, the node degree of node v₁ and its adjacent nodes is calculated by Eq. (1), and the results are shown in Table 1. The maximum degree is maxD = 6, and it is calculated by Eq. (2).

Table 1 Degree of v₁ and adjacent nodes.

Full size table

(2)
Ks value of nodes

According to Eq. (3), after the decomposition by the K-shell algorithm, the Ks value of node v₁ and its adjacent nodes is obtained, and the results are shown in Table 2.

Table 2 Ks of v₁ and adjacent nodes.

Full size table

(3)
Global influence of node

According to Eq. (4), it is obtained that: Global (v₁) = Ks(v₁) = 3.

(4)
Similarity coefficient of nodes

According to Eq. (5), the similarity coefficient of node v₁ and its neighbors, denoted by J(v₁,v_j), is calculated, and the obtained results are shown in Table 3.

Table 3 The similarity coefficient values of node v₁ and its adjacent nodes.

Full size table

(5)
Local influence

The given influence of the adjacent nodes of node v₁ is calculated by Eq. (7), and the results are shown in Table 4.

Table 4 Local influence values of node v₁ and its adjacent nodes.

Full size table

Second, according to Eq. (6), it holds that: P(v₁) = d(v₁) = 6.

Finally, according to Eq. (10), we have Local(v₁) = 10.304894.

(6)
Node influence

According to Eq. (11), it can be obtained that: I(v₁) = Local(v₁) + Global(v₁) = 13.304894.

Following the above-presented steps, the influence values of all nodes in Fig. 1 are calculated, as shown in Table 5.

Table 5 Influence of the nodes in the sample network presented in Fig. 1.

Full size table

Data description

Twelve real representative networks were selected to evaluate the proposed GLI algorithm, and they are as follows:

1.
Blogs network⁴³: This is a network of hyperlinks between the homepage of the 2004 US election blog, which includes 1,224 nodes and 19,052 edges.
2.
Ca-Astroph network⁴⁴: This is a collaborative network of scientific collaboration relationships between the astrophysical category author papers; it includes 18,771 nodes and 198,050 edges.
3.
Friendship network⁴⁵: This network represents the connections between users on hamsterster.com and has 1,858 nodes and 12,534 edges.
4.
Email EU32430 network⁴⁶: This network expresses the communication between email users and consists of 32,430 nodes and 54,397 edges.
5.
Polbooks network⁴⁷: A network of online book sales built on the relationships between American political book buyers, with 105 nodes and 441 edges.
6.
Jazz network⁴⁸: A collaboration network of a group of jazz musicians, including 198 nodes and 2,742 edges.
7.
Football network³³: This network represents a college football league and consists of 115 nodes and 616 edges.
8.
Karate network⁴⁹: This network is a network of Karate Club members, with 34 nodes and 78 edges.
9.
Protein network⁵⁰: This network consists of proteins that interact with each other, and it includes 1,870 nodes and 2,277 edges.
10.
USAir2010 network⁵¹: The 2010 US network contains 1,574 nodes and 17,215 edges.
11.
Reactome network⁵²: This network consists of proteins that interact with each other, and it includes 6,327 nodes and 147,547 edges.
12.
Brightkite network⁵³: This network is a social networking, and it includes 58,228 nodes and 214,078 edges.

The relevant property statistics of the experimental datasets are given in Table 6.

Table 6 Characteristic statistics for ten real networks used in the comparison experiment.

Full size table

Experimental result

To evaluate the applicability of the GLI algorithm, nine typical algorithms were implemented by Python, and the experiments were performed on ten datasets of different sizes. The experimental hardware platform included a Lenovo desktop computer, a CPU: i5-10,100, a memory of 32 GB; the software environment was Spyder (Python 3.7.3).

Experimental results comparison with the SIR model

(1)
Kendall value analysis

The experimental networks and their characteristic information are shown in Table 6. The range of α in the SIR model was [0.01, 0.1], and the step length was 0.01; this value was selected to avoid the network infection being too slow or too fast. First, the Kendall value results of the BC, CC, DC, EC, k-shell, PR, GIN, PL, KBKNR, RLGI, GLS, and GLI algorithms on different datasets were compared with the infection results of the SIR model. Then, the Kendall values of the proposed GLI algorithm and the other algorithms were compared. The comparison results are shown in Fig. 2.

The Kendall τ values of the GLI was the highest at all infection probabilities in the Protein network. When calculating the influence coefficient, the KBKNR algorithm takes the number of neighbor nodes as the divisor. While in the protein network, for nodes with a presence degree of zero, the algorithm cannot run correctly. In the six infection networks, the Blogs, Ca-Astroph, Friendships, EmailEU32430, Reactome and USAir2010 networks, the GLI algorithm performed better than the other algorithms. Among the above seven networks, the values of the maximum degree is relatively large, the values of the average degree was relatively small, and the distinction degree of the nodes’ degree values was large. Therefore, GLI has a better performance in these networks.

In the Brightkite network, the GLI, GIN, and KBKNR algorithms were superior to the other algorithms. In the Polbooks and Karate networks, the Kendall τ value of the GLS algorithm was higher than that of the GLI algorithm, but the GLI algorithm performed better than the other algorithms. In Brightkite, Polbooks and Karate networks, the distinction degree of the nodes’ degree values was small. Therefore, the advantages of the GLI algorithm are not obvious.

There are strong relationships between the hierarchical measure, the centrality measure, and the topological properties of the network⁵⁴. In Jazz and football networks, the connections between local nodes are relatively dense and have an obvious community structure. The distinction between K-shell value and degree value is not high. Therefore, the GLI algorithm does not work well in these two networks.

(2)
Optimal algorithm under different infection probabilities

As illustrated in Fig. 3, the GLI algorithm achieved a maximum value of 51.67% at different infection probabilities on the 12 networks. The maximum results of the other algorithms were as follows: 16.67% for the GLS algorithm, 10% for the KBKNR algorithm, 12.5% for the EC algorithm, 8.33% for the GIN algorithm, and 0.83% for the PL algorithm. In addition, the GLI algorithm performed well on all networks.

(3)
Top-15 important nodes in different networks

Without a loss of generality, in this experiment, the SIR model infection probability α was set to 0.02, and the recovery probability was set to one. First, the results of different algorithms on the network datasets were obtained and compared with the SIR model. Then, the 15 most influential nodes were extracted from the results. Finally, the algorithms’ performances were analyzed by ranking the nodes. The first 15 nodes of the Karate, Jazz, and Ca-Astroph networks in the large, medium, and small three-type networks were selected and illustrated.

As presented in Table 7, the GLI, EC, GLS and PR algorithms achieved identical results for 14 nodes out of the first 15 nodes of the SIR model. The GLI, GLS and PR algorithms were rank-aligned with the top-three nodes of the SIR model, achieving the best results among all algorithms. The EC algorithm ranked the first two nodes of the SIR correctly and was the second-best performing algorithm, following the GLI, GLS and PR algorithms.

Table 7 Top-15 nodes in the Karate network obtained by different algorithms.

Full size table

Table 8 shows that the top-15 nodes of the Jazz network were ranked, and the PL algorithms achieved the best results. Fourteen nodes out of 15 nodes were the same as those of the SIR model, and the ranking of the first four nodes was completely consistent with the SIR model. The GLI algorithm ranked 12 nodes out of the 15 nodes the same as in the SIR model, and the first four nodes were identical to the first four nodes of the SIR model. Although the GLI algorithm performed poorly compared with the PL algorithm, it achieved better results than the other algorithms, which indicated that the GLI algorithm was effective.

Table 8 Top-15 nodes in the Jazz network obtained by different algorithms.

Full size table

As shown in Table 9, for the first 15 nodes of the Ca-Astroph network, 13 nodes of the GLI, DC, and GIN algorithms were the same as those in the SIR model, and the importance of the first 15 nodes was basically the same. However, the GLI and GIN algorithms had the same two nodes as the SIR, GLI, and GIN algorithms, which achieved the best results and were followed by the DC algorithm. For the PR and CC algorithms, 12 nodes and 11 nodes out of 15 nodes were identical to those in the SIR model, respectively. For the Ca-Astroph network, the worst-performing algorithms were the K-shell and KBKNR, having only one node identical to the SIR model nodes.

Table 9 Top-15 nodes in the Ca-Astroph network obtained by different algorithms.

Full size table

Consequently, different algorithms had different advantages for different networks. However, the proposed GLI algorithm performed generally the best among all algorithms on the above-presented three networks, having the most obvious advantages.

(4)
Node ranking comparison of 11 algorithms

The above section analyzes the results of the top-15 key nodes identified by different algorithms and the SIR model in the Karate, Jazz, and Ca-Astroph networks. This section analyzes the sorting results of all nodes in each of the twelve networks. First, the infection values of nodes were calculated in the SIR model. Next, the nodes were arranged in descending order according to their importance values obtained by each algorithm. Then, the sequence of infection values was obtained from the node ranking results. It should be noted that if the ranking results of the algorithm were consistent with the results of the SIR model, a curve with a smooth downward trend from left to right would be formed. The results of a single node denoted as a seed node according to its infection value obtained by different algorithms are presented in Fig. 4, where the abscissa represents the number of infected nodes in the network obtained by each of the algorithms, and the ordinate represents the number of nodes infected and recovered at time t.

In Fig. 4, the data of the Polbooks, Jazz, Football, and Karate networks, which were small networks, are displayed on the linear scale; for the remaining eight networks, which had a large number of nodes, the data are displayed on the logarithm scale, focusing on the most influential nodes. As shown in Fig. 4, for the Blogs network, the result of the GLI algorithm showed an overall smooth decreasing trend, with the least number of peaks among all the algorithms. For the Ca-Astroph, Friendships, Brightkite, Reactome and USAir2010 networks, the results of the GLI algorithm had a few peaks, indicating that individual nodes were biased, but the proposed GLS algorithm’s results had the best effect among all the algorithms. For the EmailEU32430 network, the GLI, GLS, KBKNR, and K-shell algorithms performed well, but the curve decline of the results of the KBKNR and K-shell algorithms was reduced, the proposed GLS algorithm’s results fluctuated less and had the best effect among all the algorithms. Further, for the Polbooks network, the proposed GLI algorithm’s results fluctuated less and had the best effect among all the algorithms. For the Jazz network, the right part of the curve formed by the GLI algorithm had the least fluctuation and the best effect among all the algorithms. However, for the protein network, the KBKNR algorithm could not run, so its curve is not shown in Fig. 4, and among the remaining algorithms, the GLI algorithm achieved the best results. Therefore, the GLI method performed the best among the ten networks on the Blogs, Ca-Astroph, Friendships, EmailEU32430, Polbooks, Jazz, Protein, USAir2010, Reactomeand and Brightkite networks. The data curves in the stacked map showed a smooth downward trend, which was consistent with the SIR model results. For the Football network, due to the small difference in the degree value between the nodes, the curves of all algorithms showed certain fluctuations. The fluctuations of the KBKNR and EC algorithms were small, and their effect was relatively good. In the Karate network, except for the obvious curve fluctuations of the K-shell, CC, BC, and PR algorithms, the other algorithms showed a smooth downward trend, with a slight difference.

Consequently, the proposed GLI algorithm performed the best among all the algorithms on most networks, having similar results as the SIR model. Thus, the proposed algorithm could accurately identify key nodes in the networks.

Experimental results comparison with the SI model

To analyze the performance of the proposed algorithm further, the SI model was used to evaluate the key nodes identified by different algorithms. Due to limited space, only the Kendall values obtained by the algorithms are presented in this section.

The value of the infectious probability α plays an important role in the experiment. The infected rate is (1/2)^θ(here we set θ = 3)^55,57.

The Kendall values obtained by different algorithms for different networks are presented in Fig. 5.

As shown in Fig. 5, the Kendall τ value of the GLI algorithm was higher among all the algorithms for the Blogs, Friendships, USAir2010, Protein, Brightkite and EmailEU32430 networks. In the Jazz, Football and Karate networks, the Kendall τ values of the GIN algorithm were superior to the other algorithms. In the Polbooks and Reactome networks, the Kendall τ values of the KBKNR algorithm were highest. Only in the Ca-Astroph network, the Kendall τ values of the CC algorithm were highest. Consequently, the proposed GLI algorithm performed the best among all the algorithms on most networks.

Infection capability of the top 15 nodes

In order to validate the effectiveness of the GLI algorithm, we have calculated the infection ability of the top 15 nodes of the GLI and other algorithms in the SIR model. In the experiment, the infection probability α has been set to 0.01, and the recovery probability β has been set to one, the time step has been set from 1 to 30, and the number of iterations has been set as 1000.

As shown in Fig. 6, the number of infected nodes F(t) increased with the increasing time step t, and finally it reached a stable value at time step t = 10. This indicated that the top 15 influential nodes effectively infected other nodes in a short time. In the eight networks, namely the Blogs, Friendships, EmailEU32430, Polbooks, Karate, Protein, USAir2010 and Brightkite networks, the top 15 nodes of the GLI algorithm had the strongest infection ability. In the Ca-Astroph network, the infection ability of the top 15 nodes of GLI algorithm and DC algorithm were similar and better than other algorithms. In the Jazz and Reactome networks, the top 15 nodes of the RGLI algorithm and the PR algorithm had similar infection abilities, with distinct advantages over the GLI algorithm. But the GLI algorithm performed better than the other algorithms. In the football network, the effect of the GLI algorithm was general. Therefore, the top 15 nodes which selected by the GLI algorithm were stronger than other algorithms in the majority of networks. The result further demonstrated the effectiveness and accuracy of the proposed algorithm.

Time complexity analysis

The time complexity analysis was performed considering the procedure performed by the proposed GLI algorithm, including three stages. The temporal complexity analysis results are described below.

First, the network was stored in the logical form of an N $\times $ N matrix, and the computation degree needed to go through the other (n − 1) nodes, and the time complexity was O(n²). Further, the time complexity of calculating the K-shell value was O(n*logn). Next, the given value of the adjacent node was calculated, and this process involved the adjacent node; the time complexity of this process was O(n*(n − 1)) when the network was a complete network.

The total time complexity was calculated as the maximum of the above three time complexity values, and the time complexity of the proposed GLI algorithm was O(n²).

The source code is available online at: https://github.com/hhf602/GLI-Code.

Conclusion

This paper proposes an efficient and accurate algorithm for key node identification in a complex network named the GLI algorithm. The GLI algorithm first calculates the Ks value of a network node, which is expressed as a global influence. Then, the local influence is obtained considering the node degree, the adjacent node’s node degree value, and the adjacent node’s Ks value and introducing a similarity coefficient between the adjacent nodes. Finally, the node influence is calculated based on the global and local influence results. The proposed algorithm is verified by experiments. It is compared with the other related algorithms using the results of the SIR and SI models as the evaluation index. Based on the experimental results of the nine algorithms on ten networks, the proposed GLI method performs better than the other algorithms.

The main reason why the GLI algorithm achieves better results than the other algorithms is that in the proposed algorithm, the network nodes are ranked by the K-shell algorithm, and then the ranking result of the K-shell algorithm is refined according to the local influence of the nodes. This enhances the differentiation of node influence and solves the problem of hierarchical coarse-graining of the K-shell algorithm.

However, the proposed algorithm considers only information of adjacent nodes but not the information of secondary or tertiary neighbors of a node when calculating the local information influence contribution. Therefore, multiple influencing factors could be considered in further in-depth research.

Data availability

The datasets analysed during the current study are available in the ScienceDB repository, https://www.scidb.cn/anonymous/WmpxbUF6.

References

Kang, B., Kim, D. & Choo, H. Internet of everything: A large-scale autonomic IoT gateway. IEEE Trans. Multi-Scale Comp. Syst. 3, 206–214. https://doi.org/10.1109/TMSCS.2017.2705683 (2017).
Article Google Scholar
Luo, Y. Research of Cognitive Radio AD HOC Networks Based on Scale-Free Theory. Beijing University of Posts and Telecommunications (2021) https://doi.org/10.26969/d.cnki.gbydu.2021.001044.
Xiong, C. Research on Heterogeneous Wireless Sensor Networks with High Quality of Service Based on Complex Networks. East China Normal University (2022) https://doi.org/10.27149/d.cnki.ghdsu.2022.002057.
Yang, L., Zhao, Z. & Zhu, G. Risk trend and Countermeasures of online financial fraud. Modern CommercialBank Guide 4, 3 (2015).
Google Scholar
Weng, J., Lim, E. P. & Jiang, J. Twitter rank: Finding topic-sensitive influential twitterers. In Proceedings of the Third ACM International Conference on Web Search and Data Mining. 261–270 (ACM Press, 2010). https://doi.org/10.1145/1718487.1718520.
Yang, L., Zhao, C. & Chen, X. Research on credit risk mitigation mechanisms of peer-to-peer lending based on social network. Chin. J. Manag. Sci. 1, 47–56 (2018).
Google Scholar
Yakov, B. H. An algorithm for failure location in a complex network. Nucl. Sci. Eng. 75, 191–199 (1980).
Article Google Scholar
Zhu, C. & Bao, D. Fraud risk identification based on complex network. PEAK DATA SCIENCE.CG WORLD. 10(3), 178–181 (2021).
Wang, Y. & Gao, L. Social circle-based algorithm for friend recommendation in online social networks. Chin. J. Comput. 37(04), 801–808. https://doi.org/10.19551/j.cnki.issn1672-9129.2021.03.177 (2021).
Article CAS Google Scholar
Wu, Y. Research on Key Node Identification and Invulnerability of Complex Networks Based on Cascading Failure. North China Electric Power University, Beijing (2021) https://doi.org/10.27140/d.cnki.ghbbu.2021.000288.
Peng, Y. Key nodes Identification of Wireless Sensor Network Based on Complex Network Theory. Southwest University. https://kns.cnki.net/KCMS/detail/detail.aspx?dbname=CMFD201502&filename=1015335646.nh (2015).
Wang, G. Financial Fraud Community Detection Algorithm Based on Probabilistic Model. University of ESTC. (2019) https://doi.org/10.27005/d.cnki.gdzku.2021.005207.
Wang, W. Research on Fraud Detection Based on Edge Attribute Networks. (Tianjin University, Tianjin, 2019). https://doi.org/10.27356/d.cnki.gtjdu.2019.004088.
Wang, L. Research om Consumers Information Searching Behavior from Network Perspective Based on Risk and Fashion (Doctoral dissertation). (Shandong University, JiNan) https://kns.cnki.net/KCMS/detail/detail.aspx?dbname=CDFDLAST2017&filename=1017079770.nh (2017)
Ghalmane, Z., Cherifi, C., Cherifi, H. & Hassouni, M. E. Centrality in complex networks with overlapping community structure. Sci. Rep. 9(1), 1–29 (2019).
Article CAS Google Scholar
Blöcker, C., Juan, C. N. & Martin, R. Map equation centrality: community-aware centrality based on the map equation. Appl. Netw. Sci. 7(1), 1–24. https://doi.org/10.1007/s41109-022-00477-9 (2022).
Article Google Scholar
Rajeh, S., Savonnet, M., Leclercq, E. & Cherifi, H. Comparing community-aware centrality measures in online social networks. Qual. Quant. https://doi.org/10.48550/arXiv.2202.00515 (2022).
Article MATH Google Scholar
Ghalmane, Z., El Hassouni, M., Cherifi, C. & Cherifi, H. Centrality in modular networks. EPJ Data Sci. 8(1), 15 (2019).
Article Google Scholar
Meghanathan, N. Neighborhood-based bridge node centrality tuple for complex network analysis. Appl. Netw. Sci. 6(1), 1–36. https://doi.org/10.1007/s41109-021-00388-1 (2021).
Article Google Scholar
Ghalmane, Z., El Hassouni, M., & Cherifi, H. Betweenness centrality for networks with non-overlapping community structure. In 2018 IEEE Workshop on Complexity in Engineering (COMPENG), Vol. 11, 1–5 (2018)
Rajeh, S. & Hocine, C. Ranking influential nodes in complex networks with community structure. PLoS ONE 17(8), e0273610. https://doi.org/10.1371/journal.pone.0273610 (2022).
Article CAS Google Scholar
Bonacich, P. Factoring and weighting approaches to status scores and clique identification. J. Math. Sociol. 2(1), 113–120. https://doi.org/10.1080/0022250x.1972.9989806 (1972).
Article Google Scholar
Wang, F. & Hu, H. Coverage hole detection method of wireless sensor network based on clustering algorithm. Measurement https://doi.org/10.1016/j.measurement.2021.109449 (2021).
Article Google Scholar
Ren, X. L. & Lü, L. Y. Review of ranking nodes in complex networks. Chin. Sci. Bull. 59, 1175–1197. https://doi.org/10.1360/972013-1280 (2014).
Article Google Scholar
Lü, L. Y. et al. Vital nodes identification in complex networks. Phys. Rep. 650, 1–63. https://doi.org/10.1016/j.physrep.2016.06.007 (2016).
Article ADS MathSciNet Google Scholar
Gert, S. The centrality index of a graph. Psychometrika 4, 581–603. https://doi.org/10.1007/BF02289527 (1966).
Article MathSciNet Google Scholar
Freeman, L. C. Centrality in social networks conceptual clarification. Soc. Netw. 3, 215–239. https://doi.org/10.1016/0378-8733(78)90021-7 (1978).
Article Google Scholar
Kitsak, M. et al. Identification of Influential spreaders in complex networks. Nat. Phys. 11, 888–893. https://doi.org/10.1038/nphys1746 (2010).
Article CAS Google Scholar
Ullah, A., Wang, B., Sheng, J., Long, J. & Khan, N. Identification of influential nodes via effective distance-based centrality mechanism in complex networks. Complexity https://doi.org/10.1155/2021/8403738 (2010).
Article Google Scholar
Qiu, L., Zhang, J. & Tian, X. Ranking influential nodes in complex networks based on local and global structures. Appl. Intell. 26, 1–14. https://doi.org/10.1007/s10489-020-02132-1 (2021).
Article Google Scholar
Ibnoulouafi, A., Haziti, M. E. & Cherifi, H. M-centrality: identifying key nodes based on global position and local degree variation. J. Stat. Mech. Theory Exp. 7, 073407. https://doi.org/10.1088/1742-5468/aace08 (2018).
Article MathSciNet MATH Google Scholar
Brin, S. et al. The anatomy of a large-scale hypertextual Web search engine. Comput. Netw. ISDN Syst. 30, 107–117 (1998).
Article Google Scholar
Yu, Z., Shao, J., Yang, Q. & Sun, Z. ProfitLeader: Identifying leaders in networks with profit capacity. World Wide Web 22(2), 533–553. https://doi.org/10.1007/s11280-018-0537-6 (2019).
Article Google Scholar
Zhao, J., Wang, Y. & Deng, Y. Identifying influential nodes in complex networks from global perspective. Chaos, Solitons Fract. 133, 109637. https://doi.org/10.1016/j.chaos.2020.109637 (2022).
Article MathSciNet MATH Google Scholar
Xie, L., Sun, H., Yang, Y. & Zhang, L. Key node recognition in complex networks based on the K-shell method. J. Tsinghua Univ. (Sci. Technol.) 62, 849–861. https://doi.org/10.16511/j.cnki.qhdxxb.2022.25.041 (2022).
Article Google Scholar
Gupta, M. & Mishra, R. Spreading the information in complex networks: Identifying a set of top-N influential nodes using network structure. Decis. Support Syst. 149, 113608 (2021).
Article Google Scholar
Sheng, J., Dai, J. & Wang, B. Identifying influential nodes in complex networks based on global and local structure. Phys. A Stat. Mech. Appl. https://doi.org/10.1016/j.physa.2011.09.017 (2020).
Article Google Scholar
Chen, D. B., Lü, L. Y., Shang, M. & Zhou, T. Identifying influential nodes in complex networks. Phys. A Stat. Mech. Appl. 4(391), 1777–1787. https://doi.org/10.1016/j.physa.2011.09.017 (2012).
Article Google Scholar
Meng, C. X., Li, N. N. & Zhang, Y. Research on community detection algorithm based on complex network. Comput. Technol. Dev. 1, 82–86. https://doi.org/10.3969/j.issn.1673-629X.2020.01.015 (2020).
Article Google Scholar
Tuljapurkar, S. Infectious diseases of GLImans: Dynamics and control. Science 254(5031), 591–593 (1991).
Article CAS Google Scholar
Yang, X. & Xiao, F. An improved gravity model to identify influential nodes in complex networks based on k-shell method. Knowledge-Based Syst. 227, 107198. https://doi.org/10.1016/j.knosys.2021.107198 (2021).
Article Google Scholar
Kendall, M. G. The treatment of ties in ranking problems. Biometrika 33(3), 39–251. https://doi.org/10.2307/2332303 (1945).
Article MathSciNet MATH Google Scholar
Dai, J. Y. et al. Identifying influential nodes in complex networks based on local neighbor contribution. IEEE Access 7, 131719–131731. https://doi.org/10.1016/j.physa.2011.09.017 (2019).
Article Google Scholar
Sun, Z. J. et al. Identifying influential nodes in complex networks based on weighted formal concept analysis. IEEE Access https://doi.org/10.1109/access.2017.2679038 (2017).
Article Google Scholar
Kunegis, J. KONECT: the Koblenz network collection. In Proceedings of the 22nd International Conference on World Wide Web (WWW '13 Companion). 1343–1350 (Association for Computing Machinery, New York, 2013) https://doi.org/10.1145/2487788.2488173.
Leskovec, J., Kleinberg, J. & Faloutsos, C. Graph evolution: Densification and shrinking diameters. ACM Trans. Knowl. Discov. Data 1(1), 2. https://doi.org/10.1145/1217299.1217301 (2007).
Article Google Scholar
Shao, J., Han, Z., Yang, Q., & Zhou, T. Community detection based on distance dynamics. In Acm Sigkdd International Conference on Knowledge Discovery and Data Mining. 1075–1084 (ACM, 2015) https://doi.org/10.1145/2783258.2783301.
Shang, Q. Y., Deng, Y. & Cheng, K. H. Identifying influential nodes in complex networks: Effective distance gravity model. Inf. Sci. 577, 162–179. https://doi.org/10.1016/J.INS.2021.01.053 (2021).
Article MathSciNet Google Scholar
Gleiser, P. M. & Danon, L. Community structure in jazz. Adv. Complex Syst. 6, 565–573. https://doi.org/10.1142/S0219525903001067 (2003).
Article Google Scholar
Nie, T., Guo, Z., Zhao, K. & Lu, Z. M. Using mapping entropy to identify node centrality in complexnetworks. Phys. A 453, 290–297. https://doi.org/10.1016/j.physa.2016.02.009 (2016).
Article Google Scholar
Liu, P., Li, L., Fang, S. & Yao, Y. Identifying influential nodes in social networks: a voting approach. Chaos Solitons Fract. 152(7415), 111309. https://doi.org/10.1016/J.CHAOS.2021.111309 (2021).
Article MATH Google Scholar
Chamberlin, S. R., Blucher, A. & Wu, G. Natural product target network reveals potential for cancer combination therapies. Front. Pharmacol. 10, 557. https://doi.org/10.3389/fphar.2019.00557 (2019).
Article CAS Google Scholar
Boujlaleb, L., Idarrou, A., & Mammass, D. Feature selection for community evolution prediction in location-based social network: Gowalla and Brightkite. In International conference on smart Information and communication Technologies, 404-412 (Springer, Cham, 2019) https://doi.org/10.1007/978-3-030-53187-4_44.
Rajeh, S., Savonnet, M., Leclercq, E. & Cherifi, H. Interplay between hierarchy and centrality in complex networks. IEEE Access 8, 129717–129742 (2020).
Article MATH Google Scholar
Pu, J., Chen, X., Wei, D., Liu, Q. & Deng, Y. Identifying influential nodes based on local dimension. EPL (Europhys. Lett.) 107(1), 10010 (2014).
Article ADS Google Scholar
Zhong, S., Zhang, H. T. & Deng, Y. Identification of influential nodes in complex networks: A local degree dimension approach. Inf. Sci. 610, 994–1009 (2022).
Article Google Scholar
Wen, T. & Deng, Y. Identification of influencers in complex networks by local information dimensionality. Inf. Sci. 512, 549–562 (2020).
Article Google Scholar

Download references

Acknowledgements

The research presented in this study was supported by the Scientific and Technological Project in Henan Province of China (Grant Nos. 222102210218, 22102210160, 222102210129 ), The Key Scientific Research Projects of Colleges and Universities in Henan Province of China (Grant No. 23A520051), and the Young Backbone Teachers Training Program of Higher Education Institutions in Henan Province (Grant No. 2019GGJS235).

Author information

Authors and Affiliations

Pingdingshan University, Pingdingshan, 467000, China
Haifeng Hu, Zejun Sun, Feifei Wang, Liwen Zhang & Guan Wang

Authors

Haifeng Hu
View author publications
You can also search for this author in PubMed Google Scholar
Zejun Sun
View author publications
You can also search for this author in PubMed Google Scholar
Feifei Wang
View author publications
You can also search for this author in PubMed Google Scholar
Liwen Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Guan Wang
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

F.H.: conceptualization, methodology, the original draft writing and preparation, funding acquisition; F.W.: formal analysis, investigation, validation; Z.S.: visualization, supervision, project administration; L.Z.: implementation of the computer code and supporting algorithms. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Zejun Sun.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Hu, H., Sun, Z., Wang, F. et al. Exploring influential nodes using global and local information. Sci Rep 12, 22506 (2022). https://doi.org/10.1038/s41598-022-26984-4

Download citation

Received: 24 October 2022
Accepted: 22 December 2022
Published: 29 December 2022
DOI: https://doi.org/10.1038/s41598-022-26984-4

This article is cited by

Excavating important nodes in complex networks based on the heat conduction model
- Haifeng Hu
- Junhui Zheng
- Liugen Wang
Scientific Reports (2024)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

Dynamic identification of important nodes in complex networks based on the KPDN–INCC method

Influential nodes identification using network local structural properties

Integrating local and global information to identify influential nodes in complex networks

Introduction

Basic ideas of proposed GLI method

Contribution of proposed GLI

Related work

Proposed GLI

Definition

Definition 1

Definition 2

Definition 3

Definition 4

Definition 5

Definition 6

Proposed model evaluation

SIR model

SI model

Kendall coefficient

Experimental results

Algorithm process

Example description

Data description

Experimental result

Experimental results comparison with the SIR model

Experimental results comparison with the SI model

Infection capability of the top 15 nodes

Time complexity analysis

Conclusion

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher's note

Supplementary Information

Supplementary Information.

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Excavating important nodes in complex networks based on the heat conduction model

Comments

Search

Quick links