Introduction

In recent years, human production and life have become increasingly inseparable from different types of networks1. In sensor networks, a sensor represents the main device to obtain desired information. Then, through a self-group sensor network, the perceived information can be transmitted to a server for further data processing2,3. Financial networks have brought great convenience to human life. For instance, when people shop on the Internet, they can enjoy the convenience brought by the consumer financial network4. In addition, in social networks, people can make new friends, thus constantly expanding their personal connections and achieving better communication with other people5. Currently, many phenomena can be described using complex networks, such as social activities, smart sensor applications, consumer finance, and consumer finance services6. Complex networks are associated with countless nodes, so accurately exploring and identifying their key nodes can solve many application problems, including fault node positioning, prevention and control of fraud risk, and friend recommendations7,8,9. Therefore, how to design an efficient algorithm to identify key nodes of a complex network accurately is an urgent problem. In an intelligent sensor network, accurately identifying and effectively protecting important sensors can avoid security attacks10. In this way, when a fault occurs in a network, the fault position can be located fast and the fault can be timely addressed11. Further, in a consumer finance business network, the risk of fraud of credit customers can be judged effectively12, financial institutions can be helped to filter inferior users, and the first risk control link can be established to delete fraud personnel13. In addition, when recommending friends in a community network, the identification of key nodes can help users to recommend friends that they have not added yet but may know and send the result to users as a "friend recommendation," which can improve user loyalty14.

The most popular approach to explore influential nodes in complex networks has been to used centrality measures15,16,17,18,19,20,21. From the perspective of global and local information, typical methods based on local information include methods considering the degree centrality (DC)22 and methods considering the eigenvector centrality (EC)23. The DC-based methods calculate the node influence by analyzing the number of edges connected by nodes, which has the advantages of high simplicity and fast calculation speed. However, these methods consider only local information and have low accuracy24. The EC-based method calculates the node influence considering both the number and the information of adjacent nodes. Namely, when the number of adjacent nodes is very large, their influence will be very strong. But The EC method score much prefers to concentrate in a few nodes under common conditions, making it hard to distinguish among the nodes25. In contrast, methods based on global information include the closeness centrality (CC)26, betweenness centrality (BC)27, and K-shell decomposition methods28. The BC and CC methods have high accuracy in identifying critical nodes, but their long computational time limits their application to large networks29. The K-shell decomposition method calculates the node influence based on the node location, but the hierarchical results are coarse-grained, resulting in a low node discrimination30. The aforementioned algorithms consider either global or local information of nodes, which causes certain limitations in practical applications.

The node influence information obtained based on multiple datasets is more accurate than that obtained using only a single attribute31. Therefore, this study uses global information to calculate the global influence of nodes but local information to calculate their own influence and the influence of their adjacent nodes. The nodes with the same global influence are distinguished to improve the differentiation effect of node influence.

Basic ideas of proposed GLI method

The factors should be considered comprehensively from both global and local aspects. For instance, an important project in the real world can usually be decomposed into multiple subprojects, so a team responsible for completing the project needs to be divided into groups, each of which will be responsible for one subproject. Such a team can be regarded as a complex network where team members denote nodes, and the influence of each member depends on his position in the network. The higher the position is, the greater the number of resources will be; for instance, the project team leader is more influential than the group leader, and the group leader is more influential than the ordinary members. At the same time, the self-capability and the help provided by the team members are also important factors that define the influence. If members A and B have many common neighbors, member B is considered to be close to member A, members B and A have a great similarity, and member B has a greater influence on member A. The more similar the other members are with a particular member, the greater their influence on the member is. Accordingly, the proposed GLI method considers the contribution of both global and local information. First, global information is represented through network hierarchy obtained by the K-shell method; next, local information is represented by the degree of self and adjacent nodes, and adjacent nodes’ Ks values. In addition, the similarity coefficient is introduced, and the higher the similarity between the nodes is, the greater the contribution provided by the adjacent nodes will be.

Contribution of proposed GLI

This study provides an innovative research perspective for the identification of key nodes in a complex network. The proposed GLI algorithm’s innovation is mainly reflected in three aspects, which are as follows:

  1. 1.

    An accurate key node identification method, which considers the influence of nodes from both global and local aspects, is proposed. The shortcomings of the existing coarse-grained methods based on global information are addressed, and the accuracy of the key node identification is improved;

  2. 2.

    The proposed GLI method uses the similarity coefficient between nodes in a network, which enhances the differentiation ability of adjacent gives values and improves the identification ability of key nodes;

  3. 3.

    The proposed GLI method considers both global and local information, which makes it highly practical and suitable for large and complex networks.

The rest of this article is organized as follows. Section “Related work” describes related work. Section “Proposed GLI” describes the proposed GLI method and presents its design idea and specific working process. Section “Experimental Results” compares the proposed GLI method and classical algorithm on different datasets and analyzes the comparison results. Finally, Section “Conclusion” summarizes the main contributions of the study.

Related work

Many factors influence the accuracy of key node identification in complex networks. In the following, a very brief survey of the methods relevant to the proposed GLI is provided. The measures considered in these methods are introduced from the global and local perspectives.

  1. (1)

    Local centrality

    • DC: This is a local centrality measure related to the number of edges connected by nodes.

    • EC: This is a local centrality measure related to a node’s degree and influence of its neighbors.

    • PageRank32(PR): Similar to the EC, this measure reflects the importance of a node, which is determined by both the quantity and the quality of adjacent nodes. This is a local centrality measure using the node degree and the PR value of the neighboring nodes.

    • ProfitLeader33(PL): The ProfitLeader algorithm computes the profit a node provides to the other nodes, where the importance of the node is related to the profit. This is a local centrality measure using node profit and sharing probability to its neighbors.

  2. (2)

    Global centrality

    • BC: Global centrality measure uses the number of shortest paths through the node.

    • CC: Global centrality measure denotes the relative shortest path between the pairs of nodes.

    • K-shell: In a network, nodes are decomposed in a layer-by-layer manner according to their degree values, and the node importance is evaluated from the level where the node is located in. This is a global centrality measure based on the node degree.

  3. (3)

    Combination of global and local centrality

    • GIN34: This is a combination of global and local centrality obtained based on the distance, degree, and all other node-related information.

    • KBKNR35: The KBKNR algorithm reflects the influence of adjacent nodes and secondary-adjacent nodes through the influence coefficient, which is then used to distinguish the nodes based on their importance. This is a combination of global and local centrality using the k-shell, distance, and degree values.

    • RLGI36: This is a combination of global and local centrality obtained based on the k-shell and degree.

    • GLS37: Combination of global and local centrality based on the k-shell, distance, degree, and eigenvector centrality.

Proposed GLI

Definition

The research objectives of this paper include undirected and unweighted networks, which can be expressed as Graph = (Vertex, Edge). This section introduces some of the definitions. The following definitions are related to an undirected, unweighted network.

Definition 1

Node degree38: The node degree represents the number of edges that a particular node joins. Assume, a network is defined as A = (aij) N × N; then, the node degree d(vi) can be expressed as follows:

$$ d\left( {vi} \right) = \sum\limits_{j = 1}^{N} {aij} = \sum\limits_{j = 1}^{N} {aji} $$
(1)
$$ \max D = \max \left( {d\left( {vi} \right)} \right) $$
(2)

where maxD indicates the maximal node degree.

Definition 2

K-shell value of a node: The K-shell value is obtained by the K-shell algorithm, and it is calculated by:

$$ Ks\left( {vi} \right) = K - shell\left( G \right) $$
(3)

Definition 3

Global node influence: The K-shell algorithm can calculate the global position value of a node vi, which measures the global influence of vi, Global(vi). The specific calculation formula is as follows:

$$ Global\left( {vi} \right) = Ks\left( {vi} \right) $$
(4)

Definition 4

Adjacent node similarity: A given value of an adjacent node represents the factor affecting the importance of a node, and there is a relationship between the given value size and the similarity between the nodes. Adjacent nodes with different local structures may enhance the differentiation effect of a given influence of adjacent nodes. In GLI method, the higher the node similarity is, the higher the proportion of influence the node can provide, so the proposed algorithm adopts the Jaccard similarity coefficient39J(vi,vj), which reflects the similarity of adjacent nodes, and it is calculated by:

$$ Jacc\left( {vi,vj} \right) = \left| {\frac{{n\left( {vi} \right) \cap n\left( {vj} \right)}}{{n\left( {vi} \right) \cup n\left( {vj} \right)}}} \right| $$
(5)

where n(vi) represents the set of nodes that have common edges with a node \(vi\) and contains this node, n(vj) denotes the set of nodes that have common edges with node \(vj\) and contains node vj.

Definition 5

Local influence: Local influence considers two factors: personal influence of a node and the contribution of its adjacent nodes.

Assume that P(vi) denotes the personal influence of a node; then, P(vi) is determined by d(vi) and calculated by:

$$ P\left( {vi} \right) = d\left( {vi} \right) $$
(6)

Assume that vj is an adjacent node of node vi; then, the given value П(vj) of node vj can be obtained by comprehensively considering d(vi), Ks(vj), and J(vi,vj), which is expressed by:

$$ \prod {\left( {vj} \right)} = d\left( {vj} \right) \times Jacc\left( {vi,vj} \right) + Ks\left( {vj} \right) $$
(7)

The Sum(vi) represents the sum of the given influence for all the adjacent nodes, Sum(vi) calculation formula is as follows:

$$ Sum\left( {vi} \right) = \sum\limits_{{vj \in \Gamma \left( {vi} \right)}} {\prod {\left( {vj} \right)} } $$
(8)

Further, assume that in the considered network, node vi has the most maxD adjacent nodes to provide the given influence. Then, Sum(vi) is divided by maxD to normalize, and the given influence of node vi on its adjacent node is obtained by:

$$ N\left( {vi} \right) = {{Sum\left( {vi} \right)} \mathord{\left/ {\vphantom {{Sum\left( {vi} \right)} {\max D}}} \right. \kern-0pt} {\max D}} $$
(9)

where Local(vi) indicates local influence, and it is defined by:

$$ Local\left( {vi} \right) = P\left( {vi} \right) + N\left( {vi} \right) $$
(10)

Definition 6

Node influence: The node influence I(vi) depends on Global(vi) and Local(vi), and it is calculated as follows:

$$ I\left( {vi} \right) = Local\left( {vi} \right) + Global\left( {vi} \right). $$
(11)

Proposed model evaluation

To verify the effect of the proposed algorithm, two evaluation models, the SIR and SI models, were selected. The reliability of the comparison analysis was validated by experimental results.

SIR model

The SIR model40 is a mathematical model describing disease transmission, which is a general standard for evaluating the accuracy of node identification.

In the SIR model, network nodes are divided into three categories as follows:

  • Susceptible: A susceptible node is a node that is not sick but lacks the immune ability and is vulnerable to the infection after contact with a sick node.

  • Infection: A sick node is an infected node that can infect a susceptible node.

  • Removed: A node removed from a network, recovered (with immunity) or dead; these nodes are no longer involved in the infection process.

Next, assume that at time t, the total node number All(t) is unchanged, and nodes can be in one of the three states: susceptible to infection point Susceptible(t), sick node Infection(t), or removed from node Removed(t), and it holds that All(t) = Susceptible(t) + Infection(t) + Removed(t).

At time t, the number of infected nodes is α * Susceptible(t) * Infection(t).

After the time interval of Δt, changes in the numbers of susceptible, infection, and removed nodes are respectively as follows:

  • Susceptible: Susceptible(Δt) =  − α * Susceptible(t) * Infection(t).

  • Infection: Infection(Δt) = α * Susceptible(t) * Infection(t) − β * Infection(t).

  • Removed: Removed(Δt) = β * Infection( t ).

SI model

Similar to the SIR model, the SI model41 is the simplest disease transmission model. In the SI model, network nodes are divided into two groups: susceptible nodes and infection nodes. At time t, a susceptible node may become an infected node with a probability of α, and this process is irreversible.

In the experiment conducted in this study, network nodes were treated as sick nodes, and the number of infected nodes was calculated using the SIR and SI models. The calculation process included multiple iterations to illustrate the infection influence of nodes. The results of the network obtained by the above two models were used as an evaluation criterion. The results of the proposed GLI algorithm and several related algorithms were compared.

Kendall coefficient

The Kendall coefficient τ is used in this study to determine the similarity between the ranking results of the proposed algorithm and those of the SIR model42 on the same network. Assume nodes vi and vj are selected by the GLI algorithm to obtain the values GLI(vi) and GLI(vj). Then, node vi and vj are processed by the SIR model, and values SIR(vi) and SIR(vj) are obtained. If GLI(vi) > GLI(vj) and SIR(vi) > SIR(vj), or GLI(vi) < GLI(vj) and SIR(vi) < SIR(vj), then the resulting values are considered consistent, and \(\tau\) = 1. If GLI(vi) > GLI(vj) and SIR(vi) < SIR(vj), or GLI(vi) < GLI(vj) and SIR(vi) > SIR(vj), then the resulting values are considered inconsistent, and \(\tau\) = -1. The specific calculation formula is as follows:

$$ \tau (X,Y) = \frac{{2 * \left( {n_{c} - n_{d} } \right)}}{{n * \left( {n - 1} \right)}} $$
(12)

where X and Y represent the evaluated object, nc is the number of consistencies in two sequences, and nd indicates the number of inconsistencies in the two sequences.

Experimental results

Algorithm process

figure a

For detailed procedures, please refer to Supplementary Information.

Example description

In Fig. 1, the proposed algorithm is described in the example of the calculation process of the influence of node v1:

Figure 1
figure 1

A network diagram, where yellow nodes indicate the sample nodes. The K-shell algorithm is used to divide the network into three layers, and node v1 is in the third layer.

  1. (1)

    Node degree

According to the idea of GLI algorithm, the node degree of node v1 and its adjacent nodes is calculated by Eq. (1), and the results are shown in Table 1. The maximum degree is maxD = 6, and it is calculated by Eq. (2).

Table 1 Degree of v1 and adjacent nodes.
  1. (2)

    Ks value of nodes

According to Eq. (3), after the decomposition by the K-shell algorithm, the Ks value of node v1 and its adjacent nodes is obtained, and the results are shown in Table 2.

Table 2 Ks of v1 and adjacent nodes.
  1. (3)

    Global influence of node

According to Eq. (4), it is obtained that: Global (v1) = Ks(v1) = 3.

  1. (4)

    Similarity coefficient of nodes

According to Eq. (5), the similarity coefficient of node v1 and its neighbors, denoted by J(v1,vj), is calculated, and the obtained results are shown in Table 3.

Table 3 The similarity coefficient values of node v1 and its adjacent nodes.
  1. (5)

    Local influence

The given influence of the adjacent nodes of node v1 is calculated by Eq. (7), and the results are shown in Table 4.

Table 4 Local influence values of node v1 and its adjacent nodes.

Second, according to Eq. (6), it holds that: P(v1) = d(v1) = 6.

Finally, according to Eq. (10), we have Local(v1) = 10.304894.

  1. (6)

    Node influence

According to Eq. (11), it can be obtained that: I(v1) = Local(v1) + Global(v1) = 13.304894.

Following the above-presented steps, the influence values of all nodes in Fig. 1 are calculated, as shown in Table 5.

Table 5 Influence of the nodes in the sample network presented in Fig. 1.

Data description

Twelve real representative networks were selected to evaluate the proposed GLI algorithm, and they are as follows:

  1. 1.

    Blogs network43: This is a network of hyperlinks between the homepage of the 2004 US election blog, which includes 1,224 nodes and 19,052 edges.

  2. 2.

    Ca-Astroph network44: This is a collaborative network of scientific collaboration relationships between the astrophysical category author papers; it includes 18,771 nodes and 198,050 edges.

  3. 3.

    Friendship network45: This network represents the connections between users on hamsterster.com and has 1,858 nodes and 12,534 edges.

  4. 4.

    Email EU32430 network46: This network expresses the communication between email users and consists of 32,430 nodes and 54,397 edges.

  5. 5.

    Polbooks network47: A network of online book sales built on the relationships between American political book buyers, with 105 nodes and 441 edges.

  6. 6.

    Jazz network48: A collaboration network of a group of jazz musicians, including 198 nodes and 2,742 edges.

  7. 7.

    Football network33: This network represents a college football league and consists of 115 nodes and 616 edges.

  8. 8.

    Karate network49: This network is a network of Karate Club members, with 34 nodes and 78 edges.

  9. 9.

    Protein network50: This network consists of proteins that interact with each other, and it includes 1,870 nodes and 2,277 edges.

  10. 10.

    USAir2010 network51: The 2010 US network contains 1,574 nodes and 17,215 edges.

  11. 11.

    Reactome network52: This network consists of proteins that interact with each other, and it includes 6,327 nodes and 147,547 edges.

  12. 12.

    Brightkite network53: This network is a social networking, and it includes 58,228 nodes and 214,078 edges.

The relevant property statistics of the experimental datasets are given in Table 6.

Table 6 Characteristic statistics for ten real networks used in the comparison experiment.

Experimental result

To evaluate the applicability of the GLI algorithm, nine typical algorithms were implemented by Python, and the experiments were performed on ten datasets of different sizes. The experimental hardware platform included a Lenovo desktop computer, a CPU: i5-10,100, a memory of 32 GB; the software environment was Spyder (Python 3.7.3).

Experimental results comparison with the SIR model

  1. (1)

    Kendall value analysis

The experimental networks and their characteristic information are shown in Table 6. The range of α in the SIR model was [0.01, 0.1], and the step length was 0.01; this value was selected to avoid the network infection being too slow or too fast. First, the Kendall value results of the BC, CC, DC, EC, k-shell, PR, GIN, PL, KBKNR, RLGI, GLS, and GLI algorithms on different datasets were compared with the infection results of the SIR model. Then, the Kendall values of the proposed GLI algorithm and the other algorithms were compared. The comparison results are shown in Fig. 2.

Figure 2
figure 2

Kendall τ values of different algorithms compared with the SIR model result on 12 networks. Infection probability α varied from 0.01 to 0.1.

The Kendall τ values of the GLI was the highest at all infection probabilities in the Protein network. When calculating the influence coefficient, the KBKNR algorithm takes the number of neighbor nodes as the divisor. While in the protein network, for nodes with a presence degree of zero, the algorithm cannot run correctly. In the six infection networks, the Blogs, Ca-Astroph, Friendships, EmailEU32430, Reactome and USAir2010 networks, the GLI algorithm performed better than the other algorithms. Among the above seven networks, the values of the maximum degree is relatively large, the values of the average degree was relatively small, and the distinction degree of the nodes’ degree values was large. Therefore, GLI has a better performance in these networks.

In the Brightkite network, the GLI, GIN, and KBKNR algorithms were superior to the other algorithms. In the Polbooks and Karate networks, the Kendall τ value of the GLS algorithm was higher than that of the GLI algorithm, but the GLI algorithm performed better than the other algorithms. In Brightkite, Polbooks and Karate networks, the distinction degree of the nodes’ degree values was small. Therefore, the advantages of the GLI algorithm are not obvious.

There are strong relationships between the hierarchical measure, the centrality measure, and the topological properties of the network54. In Jazz and football networks, the connections between local nodes are relatively dense and have an obvious community structure. The distinction between K-shell value and degree value is not high. Therefore, the GLI algorithm does not work well in these two networks.

  1. (2)

    Optimal algorithm under different infection probabilities

As illustrated in Fig. 3, the GLI algorithm achieved a maximum value of 51.67% at different infection probabilities on the 12 networks. The maximum results of the other algorithms were as follows: 16.67% for the GLS algorithm, 10% for the KBKNR algorithm, 12.5% for the EC algorithm, 8.33% for the GIN algorithm, and 0.83% for the PL algorithm. In addition, the GLI algorithm performed well on all networks.

Figure 3
figure 3

The maximum Kendall τ value results of the algorithms for different infection probabilities; the abscissa represents the network datasets, the ordinate indicates the infection probability α. Data presented in Fig. 3 correspond to the algorithm of the maximum Kendall τ values.

  1. (3)

    Top-15 important nodes in different networks

Without a loss of generality, in this experiment, the SIR model infection probability α was set to 0.02, and the recovery probability was set to one. First, the results of different algorithms on the network datasets were obtained and compared with the SIR model. Then, the 15 most influential nodes were extracted from the results. Finally, the algorithms’ performances were analyzed by ranking the nodes. The first 15 nodes of the Karate, Jazz, and Ca-Astroph networks in the large, medium, and small three-type networks were selected and illustrated.

As presented in Table 7, the GLI, EC, GLS and PR algorithms achieved identical results for 14 nodes out of the first 15 nodes of the SIR model. The GLI, GLS and PR algorithms were rank-aligned with the top-three nodes of the SIR model, achieving the best results among all algorithms. The EC algorithm ranked the first two nodes of the SIR correctly and was the second-best performing algorithm, following the GLI, GLS and PR algorithms.

Table 7 Top-15 nodes in the Karate network obtained by different algorithms.

Table 8 shows that the top-15 nodes of the Jazz network were ranked, and the PL algorithms achieved the best results. Fourteen nodes out of 15 nodes were the same as those of the SIR model, and the ranking of the first four nodes was completely consistent with the SIR model. The GLI algorithm ranked 12 nodes out of the 15 nodes the same as in the SIR model, and the first four nodes were identical to the first four nodes of the SIR model. Although the GLI algorithm performed poorly compared with the PL algorithm, it achieved better results than the other algorithms, which indicated that the GLI algorithm was effective.

Table 8 Top-15 nodes in the Jazz network obtained by different algorithms.

As shown in Table 9, for the first 15 nodes of the Ca-Astroph network, 13 nodes of the GLI, DC, and GIN algorithms were the same as those in the SIR model, and the importance of the first 15 nodes was basically the same. However, the GLI and GIN algorithms had the same two nodes as the SIR, GLI, and GIN algorithms, which achieved the best results and were followed by the DC algorithm. For the PR and CC algorithms, 12 nodes and 11 nodes out of 15 nodes were identical to those in the SIR model, respectively. For the Ca-Astroph network, the worst-performing algorithms were the K-shell and KBKNR, having only one node identical to the SIR model nodes.

Table 9 Top-15 nodes in the Ca-Astroph network obtained by different algorithms.

Consequently, different algorithms had different advantages for different networks. However, the proposed GLI algorithm performed generally the best among all algorithms on the above-presented three networks, having the most obvious advantages.

  1. (4)

    Node ranking comparison of 11 algorithms

The above section analyzes the results of the top-15 key nodes identified by different algorithms and the SIR model in the Karate, Jazz, and Ca-Astroph networks. This section analyzes the sorting results of all nodes in each of the twelve networks. First, the infection values of nodes were calculated in the SIR model. Next, the nodes were arranged in descending order according to their importance values obtained by each algorithm. Then, the sequence of infection values was obtained from the node ranking results. It should be noted that if the ranking results of the algorithm were consistent with the results of the SIR model, a curve with a smooth downward trend from left to right would be formed. The results of a single node denoted as a seed node according to its infection value obtained by different algorithms are presented in Fig. 4, where the abscissa represents the number of infected nodes in the network obtained by each of the algorithms, and the ordinate represents the number of nodes infected and recovered at time t.

Figure 4
figure 4

Function F(t) indicates the number of infected nodes in the network at time t. For the purpose of reliability of the results, the average result of many iterations is presented as the infection node number value. Due to the large data size of the Brightkite, EmailEU32430 and Ca-Astroph networks, for these networks, the number of iterations was set to 100, and for the other networks, the number of iterations was set to 1000.

In Fig. 4, the data of the Polbooks, Jazz, Football, and Karate networks, which were small networks, are displayed on the linear scale; for the remaining eight networks, which had a large number of nodes, the data are displayed on the logarithm scale, focusing on the most influential nodes. As shown in Fig. 4, for the Blogs network, the result of the GLI algorithm showed an overall smooth decreasing trend, with the least number of peaks among all the algorithms. For the Ca-Astroph, Friendships, Brightkite, Reactome and USAir2010 networks, the results of the GLI algorithm had a few peaks, indicating that individual nodes were biased, but the proposed GLS algorithm’s results had the best effect among all the algorithms. For the EmailEU32430 network, the GLI, GLS, KBKNR, and K-shell algorithms performed well, but the curve decline of the results of the KBKNR and K-shell algorithms was reduced, the proposed GLS algorithm’s results fluctuated less and had the best effect among all the algorithms. Further, for the Polbooks network, the proposed GLI algorithm’s results fluctuated less and had the best effect among all the algorithms. For the Jazz network, the right part of the curve formed by the GLI algorithm had the least fluctuation and the best effect among all the algorithms. However, for the protein network, the KBKNR algorithm could not run, so its curve is not shown in Fig. 4, and among the remaining algorithms, the GLI algorithm achieved the best results. Therefore, the GLI method performed the best among the ten networks on the Blogs, Ca-Astroph, Friendships, EmailEU32430, Polbooks, Jazz, Protein, USAir2010, Reactomeand and Brightkite networks. The data curves in the stacked map showed a smooth downward trend, which was consistent with the SIR model results. For the Football network, due to the small difference in the degree value between the nodes, the curves of all algorithms showed certain fluctuations. The fluctuations of the KBKNR and EC algorithms were small, and their effect was relatively good. In the Karate network, except for the obvious curve fluctuations of the K-shell, CC, BC, and PR algorithms, the other algorithms showed a smooth downward trend, with a slight difference.

Consequently, the proposed GLI algorithm performed the best among all the algorithms on most networks, having similar results as the SIR model. Thus, the proposed algorithm could accurately identify key nodes in the networks.

Experimental results comparison with the SI model

To analyze the performance of the proposed algorithm further, the SI model was used to evaluate the key nodes identified by different algorithms. Due to limited space, only the Kendall values obtained by the algorithms are presented in this section.

The value of the infectious probability α plays an important role in the experiment. The infected rate is (1/2)θ(here we set θ = 3)55,57.

The Kendall values obtained by different algorithms for different networks are presented in Fig. 5.

Figure 5
figure 5

Kendall τ values of different algorithms compared with the SI model result on 12 networks. The value of the infectious probability a is 0.125.

As shown in Fig. 5, the Kendall τ value of the GLI algorithm was higher among all the algorithms for the Blogs, Friendships, USAir2010, Protein, Brightkite and EmailEU32430 networks. In the Jazz, Football and Karate networks, the Kendall τ values of the GIN algorithm were superior to the other algorithms. In the Polbooks and Reactome networks, the Kendall τ values of the KBKNR algorithm were highest. Only in the Ca-Astroph network, the Kendall τ values of the CC algorithm were highest. Consequently, the proposed GLI algorithm performed the best among all the algorithms on most networks.

Infection capability of the top 15 nodes

In order to validate the effectiveness of the GLI algorithm, we have calculated the infection ability of the top 15 nodes of the GLI and other algorithms in the SIR model. In the experiment, the infection probability α has been set to 0.01, and the recovery probability β has been set to one, the time step has been set from 1 to 30, and the number of iterations has been set as 1000.

As shown in Fig. 6, the number of infected nodes F(t) increased with the increasing time step t, and finally it reached a stable value at time step t = 10. This indicated that the top 15 influential nodes effectively infected other nodes in a short time. In the eight networks, namely the Blogs, Friendships, EmailEU32430, Polbooks, Karate, Protein, USAir2010 and Brightkite networks, the top 15 nodes of the GLI algorithm had the strongest infection ability. In the Ca-Astroph network, the infection ability of the top 15 nodes of GLI algorithm and DC algorithm were similar and better than other algorithms. In the Jazz and Reactome networks, the top 15 nodes of the RGLI algorithm and the PR algorithm had similar infection abilities, with distinct advantages over the GLI algorithm. But the GLI algorithm performed better than the other algorithms. In the football network, the effect of the GLI algorithm was general. Therefore, the top 15 nodes which selected by the GLI algorithm were stronger than other algorithms in the majority of networks. The result further demonstrated the effectiveness and accuracy of the proposed algorithm.

Figure 6
figure 6

The infection capability of top 15 nodes in the ranking of twelve different algorithms. F(t) represents the number of nodes that were infected by these top 15 nodes at time t.

Time complexity analysis

The time complexity analysis was performed considering the procedure performed by the proposed GLI algorithm, including three stages. The temporal complexity analysis results are described below.

First, the network was stored in the logical form of an N \(\times \) N matrix, and the computation degree needed to go through the other (n − 1) nodes, and the time complexity was O(n2). Further, the time complexity of calculating the K-shell value was O(n*logn). Next, the given value of the adjacent node was calculated, and this process involved the adjacent node; the time complexity of this process was O(n*(n − 1)) when the network was a complete network.

The total time complexity was calculated as the maximum of the above three time complexity values, and the time complexity of the proposed GLI algorithm was O(n2).

The source code is available online at: https://github.com/hhf602/GLI-Code.

Conclusion

This paper proposes an efficient and accurate algorithm for key node identification in a complex network named the GLI algorithm. The GLI algorithm first calculates the Ks value of a network node, which is expressed as a global influence. Then, the local influence is obtained considering the node degree, the adjacent node’s node degree value, and the adjacent node’s Ks value and introducing a similarity coefficient between the adjacent nodes. Finally, the node influence is calculated based on the global and local influence results. The proposed algorithm is verified by experiments. It is compared with the other related algorithms using the results of the SIR and SI models as the evaluation index. Based on the experimental results of the nine algorithms on ten networks, the proposed GLI method performs better than the other algorithms.

The main reason why the GLI algorithm achieves better results than the other algorithms is that in the proposed algorithm, the network nodes are ranked by the K-shell algorithm, and then the ranking result of the K-shell algorithm is refined according to the local influence of the nodes. This enhances the differentiation of node influence and solves the problem of hierarchical coarse-graining of the K-shell algorithm.

However, the proposed algorithm considers only information of adjacent nodes but not the information of secondary or tertiary neighbors of a node when calculating the local information influence contribution. Therefore, multiple influencing factors could be considered in further in-depth research.