Introduction

Complex networks refer to intricate systems composed of interconnected elements, such as nodes and edges, where the interactions between these elements exhibit non-trivial patterns1. To comprehensively analyze complex networks, researchers often explore them from four essential perspectives: path analysis, connectivity, community, and centrality. These analytical approaches shed light on the intricate pathways, structural connections, cohesive groups, and influential nodes within the network, enabling a holistic understanding of its dynamics and characteristics2. One of the most essential and challenging research challenges in network science is determining and prioritizing the most important nodes. The process of discovering and ranking the most influential nodes (INs) is critical for gaining a thorough view of a network's structure and operation 3. Several centrality metrics have been presented throughout the years to capture a network's rank based on node degree and importance in the network's structure4,5,6,7. It is considered that the efficiency of a centrality measure in finding key nodes is dependent on its topological significance8. Currently, a website (http://www.centiserver.org)9 has documented that there were approximately 403 centrality indices, providing a comprehensive resource for network analysis. However, despite this vast compilation, the exploration of identifying the most (INs) within complex networks remains an ongoing pursuit.

In latest years, methodologies for locating prominent nodes have gotten more targeted, relying only on global or local data. For example, K-shell decomposition (Ks)10,11 and the Degree Centrality (DC)12 approach is two of the most thoroughly explored interpretations of global and local information, respectively. Because of their simplicity, these two techniques have achieved broad use in networks of all sizes. However, Ks and DC have limits in determining the relative significance of nodes in a network.

In Ks, first and foremost, previous knowledge about the value of k is required, which may not be easily accessible13. Second, since the Ks is based on local connection14, it may not be useful in recognising the hierarchical structure of networks. Finally, it may not correctly represent the underlying structural aspects of the network since it may not capture the relevance of nodes that operate as bridges between various levels. It may be susceptible to the particular technique used to calculate the Ks, making it less accurate for comparing networks15,16,17.

On the other hand, DC is a standard network analysis metric that rates a node's relevance based on the number of edges (links) it has in the network. One issue is that it does not consider how excellent or crucial the relationships are18,19,20. Nodes with a high degree of centrality may have numerous connections, but those connections may be with nodes that are not central or significant in their own right21. Moreover, degree centrality does not take into account a node's structural location in the network22, which might influence its relevance. Lastly, degree centrality is less effective in directed networks with incoming and outward edges that need differentiation between in-degree and out-degree centrality measurements20,22,23. Overall, degree centrality is a valuable metric for detecting INs; however, it should be used in conjunction with other measures that capture other characteristics of a node's significance in a network14,21,22.

Other common centrality methods, such as betweenness centrality (BC)24 and closeness centrality (CC)25, which estimate node impact based on global network information and may give higher ranking results, have a significant processing cost26,27. Geographical information, both local and global, may have a substantial influence on the power of INs in a network.

17,18,28,29,30,31Researchers have focused on local and global network information to solve the problem of identifying INs17,18,28,29,30. Unfortunately, past traditional identification approaches frequently missed critical information, failing to account for global and local network information simultaneously7,31. As a result, the outcomes are often skewed. Information about the neighbour nodes do increase the accuracy and correctness of a method32. Researchers discovered that combining both in a network improved the identification of influential nodes. This integration enhanced detection at both the local (in community or cluster) and global levels7,33,34,35. Taking the coreness and shortest distance between nodes into account might improve the discovery of INs.

Recently, an innovative approach is being introduces which is the Global Structure Model (GSM)29 and its improved version, IGSM18, to identify INs in these networks. These approaches apply local and global information, which are Ks and DC, respectively. Yet, one key weakness of both methodologies is their inability to quantify the significance of individual nodes, leaving a large vacuum in our knowledge of complex networks. As the need for more accurate and extensive network research grows, new approaches that may overcome this constraint and give deeper insights into the structure of complex networks must be developed.

The primary contributions of this paper lie in the development and application of the Hybrid Global Structure Model (H-GSM). The H-GSM algorithm addresses the deficiencies of current techniques by considering both local and global information of each node, resulting in a more comprehensive understanding of the overall structure of complex networks. Specifically, our contributions are as follows:

  1. A)

    Introducing a technique for identifying influential nodesThe H-GSM algorithm combines local information (DC) and global influence (Ks), overcoming the limitations of single-measure centrality approaches. By integrating local and global influences, our method significantly enhances the accuracy of identifying INs, surpassing traditional centrality measures such as DC, BC, PR, CC, GSM, and IGSM.

  2. B)

    High performance of the algorithmThrough extensive experiments conducted on six real complex networks, our technique demonstrates superior performance in identifying INs. The results surpass those obtained using the widely used SIR model and Kendall's correlation coefficient, showcasing the effectiveness and reliability of the H-GSM algorithm.

  3. C)

    Superior scalability compared to other algorithmsThe H-GSM algorithm exhibits excellent scalability, making it suitable for larger networks. Its low computational complexity and straightforward implementation further enhance its applicability, providing a practical and efficient solution for analyzing complex network structures.

Overall, the H-GSM algorithm contributes to the advancement of network analysis by offering a novel approach that combines local and global influences, outperforming existing centrality measures, and providing superior scalability for large-scale network analysis.

The rest of this paper is organised as follows: In the section titled "Method", a brief introduction of numerous baseline approaches and the suggested H-GSM method are explained in detail. Next, a total of six actual networks data from the real-world case study have been adopted and used to validate the proposed method, which are described in the sections titled "Datasets and evaluation criteria" and "Results and discussions" respectively. This study's findings and recommendations for the future work are presented in the final section, titled "Conclusion and Future Recommendations".

Methods

Background analysis

Suppose a network is denoted as \(G = \left( {V,E} \right)\) where V is the set of nodes and E represents the edges. If there is an edge between node i and node j, then \(a_{ij} = 1\) they are directly connected, while if there is no edge, then \(a_{ij} = 0\) they are not directly connected. The total number of nodes in the network is denoted as n. The indices that use in this study are introduced in this section.

Degree centrality (DC)

The number of nodes close to or directly linked to a node is denoted by DC, which is the most basic form of centrality. DC reflects on node information at the most local level, which is straightforward and intuitive. The higher the degree, the bigger the effect of the node. A node's degree centrality formula is as follows:

$${\text{DC}}(i) = \sum\limits_{j = 1}^{n} {a_{ij} }$$
(1)

Betweenness centrality (BC)

The BC of a node is the ratio of the shortest pathways via the node to the total number of quickest routes. BC computes INs based on global data. A node with a high BC value serves an important function in linking various areas of the network. BC stands for

$${\text{BC}}(i) = \sum\limits_{j,k \ne 1}^{{}} {\frac{{g_{jk} (i)}}{{g_{jk} }}}$$
(2)

where \(g_{jk}\) indicates the number of paths and \(g_{jk} (i)\) represents the shortest paths between nodes \(j\) and \(k\) through a node \(i\).

Closeness centrality (CC)

CC also computes prominent nodes based on global data. CC indicates a node's proximity to all other nodes in the network. It uses the shortest distance (\(d_{ij}\)) between each pair of nodes to identify the influence of each node. CC of a node is defined as

$${\text{CC}}(i) = \frac{N - 1}{{\sum\limits_{j \ne 1} {d_{ij} } }}$$
(3)

K-shell decomposition (Ks) method

Ks is one of the global centrality approaches for determining the core location of a network. Ks gives an index to each network node by deleting nodes repeatedly depending on their degree. Nodes with one connection are removed, and the network's degree value is recalculated. Stripping additional degree nodes continues until no more nodes can be stripped. A node with a higher Ks-value is more significant in the network and should be given more attention or consideration when interpreting the model or making choices based on its predictions. The Ks metric indicates that a cluster of nodes will exhibit comparable significance within a network11, yet it falls short in equitably distinguishing the nodes that possess greater influences.

Global structure model (GSM)

GSM considers the node's self-influence and global influence. The influence of node i can be expressed as

$${\text{GSM}}(i) = {\text{SI}}(i){\kern 1pt} \; \times \;{\text{GI}}(i)\,{\kern 1pt} = e^{{{\raise0.7ex\hbox{${Ks(i)}$} \!\mathord{\left/ {\vphantom {{Ks(i)} n}}\right.\kern-0pt} \!\lower0.7ex\hbox{$n$}}}} \,\, \times \,\,\sum\limits_{i \ne j} {\frac{Ks(j)}{{d_{ij} }}}$$
(4)

where \(Ks\left( i \right)\) and \(Ks\left( j \right)\) denote the k-shell of node i and node j, respectively.

Improved global structure model (IGSM)

IGSM is an improved GSM model considering DC instead of the Ks method.

$${\text{IGSM}}(i) = {\text{improved\_SI}}(i){\kern 1pt} \; \times \;{\text{improved\_GI}}(i)\,{\kern 1pt} = e^{{{\raise0.7ex\hbox{${DC(i)}$} \!\mathord{\left/ {\vphantom {{DC(i)} n}}\right.\kern-0pt} \!\lower0.7ex\hbox{$n$}}}} \,\, \times \,\,\sum\limits_{i \ne j} {\frac{DC(j)}{{d_{ij}^{{{\text{ceil}}\left( {{\text{log}}_{2} (ave\_DC)} \right)}} }}}$$
(5)

Proposed method

The approach suggested in this study outperforms the GSM and IGSM methods already in use to identify INs in a network. The algorithm employs two indices: DC and Ks, and the suggested technique takes into account node position information. This is due to the fact that node placement is important in data distribution, and nodes in crucial places may have a stronger effect on the flow of information or resources within the network. The suggested technique offers a more complete approach to identifying INs in a network by combining these measurements and including location information.

To enhance the notion of node influence, we used the GSM's concept of self- and global impact, but applied it in a creative way. By increasing a node's DC by its Ks value, we established enhanced self-influence (iSI). This iSI factor was then used to calculate enhanced global impact (iGI), which takes into account all nodes that are directly or indirectly related to a node. The iGI factor is the sum of the neighbour ratios of the shortest route lengths for directed and undirected nodes with Ks and DC values. Its shortest route length is calculated by the average iSI value and is also referred to as information loss. In assessing node impact, the proposed H-GSM method takes into account both iSI and iGI parameters.

This suggested approach is significant because it can more accurately quantify how nodes in a network impact one another. Our strategy considers a node's local and global impacts, as well as how it affects other nodes in the network and the network as a whole. Consequently, including node position information aids in better understanding of data dispersion throughout the network. The new technique is likely to outperform current methods in terms of node determination, making it an important contribution to the area of network analysis. The complete equation is as follows:

$$iSI\left( i \right) = e^{{\frac{Ks(i)\; \times \;DC(i)}{N}}}$$
(6)
$$iGI\left( i \right) = \sum\limits_{i \ne j} {\frac{iSI(j)}{{d_{ij}^{{cell\left( {\log_{2} \left( {ave\_iSI} \right)} \right)}} }}}$$
(7)
$$\begin{gathered} {\text{H - GSM}}(v_{i} ) = iSI\left( i \right)\; \times \;iGI\left( i \right) \\ = e^{{\frac{Ks(i)\; \times \;DC(i)}{N}}} \; \times \;\sum\limits_{i \ne j} {\frac{{e^{{\frac{Ks(j)\; \times \;DC(j)}{N}}} }}{{d_{ij}^{{cell\left( {\log_{2} \left( {ave\_iSI} \right)} \right)}} }}} \\ \end{gathered}$$
(8)

Computation process

Figure 1 depicts a basic network with 7 nodes and 10 edges segregated by its k-shell territory to further clarify the specific calculation procedure of the H-GSM algorithm. As indicated in the network, we consider the H-GSM approach by using node 3 as an example of the targeted node, hence i = 3. Node 3 is positioned on the third layer, designated by Ks = 3, as are nodes 0, 1, and 2. In terms of DC value, node 3 obviously has five edges attached to it, resulting in DC-value = 5. We begin by computing the Ks, DC, and shortest distance between each node.

Figure 1
figure 1

(a) A network with 7 nodes and 10 edges. (b) Removing of the most outer layer, which is node 4 and 6. (c) Removing of the next outer layer, which is node 5. (d) Summary of the Ks-territory of all the nodes in the network.

Step 1: Determine Ks and DC value.

$$Ks(3) = 3\quad \quad DC(3) = 5$$

Step 2: Calculate improved self-influence (iSI).

$$\begin{gathered} iSI(3) = e^{{\left[ {\frac{Ks(3) \times DC(3)}{n}} \right]}} \\ = e^{{\left[ {\frac{3 \times 5}{7}} \right]}} \\ = 8.523756 \\ \end{gathered}$$

Step 3: Calculate improved global influence (iGI).

$$\begin{gathered} iGI(3) = \sum\limits_{i \ne j} {\frac{kSI(j)}{{d_{ij}^{{cell\left( {\log_{2} \left( {kSI} \right)} \right)}} }}} \\ = \frac{kSI(0)}{{d_{ij}^{{cell\left( {\log_{2} \left( {ave\_kSI} \right)} \right)}} }} + \frac{kSI(1)}{{d_{ij}^{{cell\left( {\log_{2} \left( {ave\_kSI} \right)} \right)}} }} + \frac{kSI(2)}{{d_{ij}^{{cell\left( {\log_{2} \left( {ave\_kSI} \right)} \right)}} }} + \frac{kSI(4)}{{d_{ij}^{{cell\left( {\log_{2} \left( {ave\_kSI} \right)} \right)}} }} + \frac{kSI(5)}{{d_{ij}^{{cell\left( {\log_{2} \left( {ave\_kSI} \right)} \right)}} }} + \frac{kSI(6)}{{d_{ij}^{{cell\left( {\log_{2} \left( {ave\_kSI} \right)} \right)}} }} \\ = \frac{3.617251}{{1_{{}}^{{cell\left( {\log_{2} \left( {3.903478} \right)} \right)}} }} + \frac{5.552708}{{1_{{}}^{{cell\left( {\log_{2} \left( {3.903478} \right)} \right)}} }} + \frac{5.552708}{{1_{{}}^{{cell\left( {\log_{2} \left( {3.903478} \right)} \right)}} }} + \frac{1.153565}{{2_{{}}^{{cell\left( {\log_{2} \left( {3.903478} \right)} \right)}} }} + \frac{1.770795}{{1_{{}}^{{cell\left( {\log_{2} \left( {3.903478} \right)} \right)}} }} + \frac{1.153565}{{1_{ij}^{{cell\left( {\log_{2} \left( {3.903478} \right)} \right)}} }} \\ = 17.93542 \\ \end{gathered}$$

Step 4: Calculate node influence of H-GSM.

$$\begin{gathered} H - GSM(3) = iSI(3) \times iGI(3) \\ = 8.523756 \times 17.93542 \\ = 152.877133 \\ \end{gathered}$$

As illustrated in Fig. 1, Table 1 presents node rankings based on the implementation of the DC, BC, CC, Ks, GSM, IGSM, and H-GSM methodologies. Earlier works such as DC, KS, BC, and CC could only distinguish between six levels. Nonetheless, DC is better at level discrimination than KS and BC. The node rating for CC and GSM is the same. In finding the most INs in the network, both H-GSM and IGSM outperform standard GSM. For example, in GSM, rank 2 cannot tell which node is more significant between nodes 1 and 2. Yet, when it comes to distinguishing nodes, H-GSM surpasses IGSM. For example, given the greater value, the distinction between nodes 2 and 1 in H-GSM is obvious.

Table 1 Comparison results of simple network node influence evaluation indexes.

Datasets and evaluation criteria

Datasets

In this article, we experiment with several unweighted and undirected graphs of varying scales. We examine algorithm performance in terms of running time and influence spread and compare it to that of other algorithms. We apply the Susceptible-Infected-Recovered (SIR) epidemic model as a benchmark simulator over six real networks, including USAir9736, Netscience and its largest component subgraph (Netscience1)37, Email38, Yeast39, and Router40, in order to compare the performance of the proposed H-GSM with the other indexing methods. Table 2 lists some elementary statistics regarding these networks, including their total number of nodes (n), the total number of edges (m), maximum and minimum degree (dmax and dmin ), and maximum core value (coremax).

Table 2 The statistics of six real-world complex networks.

SIR model

SIR is a strategy that divides the population into three categories: susceptible (S), infected (I), and recovered (R). Just one node is chosen to be infected in each implementation, while the other nodes are set as vulnerable at each separate run. The seed node infects its neighbors with varying spreading probability \(\alpha\),and will recover from the infection with the probabilities \(\beta\). Each loop is viewed as a time step t, and F(t) gives the number of nodes infected at time t, which is used to evaluate the first infected node's effect41. When none of the nodes remain diseased, the spreading process comes to an end. The same processes are performed for each node in each network, with 500 iterations.

Kendall coefficient

Kendall's coefficient is used to determine how well-simulated rankings by the SIR model match the true rankings reached by centrality measures42. Kendall’s coefficient compares the similarity and consistency of two sequences. If the list of ranking strategy corresponds more strongly with the list of rated node-spreading abilities in the SIR model, the ranking method is more successful. The more INs has a larger capacity to propagate. Assume a network comprises n vertices, with \(n_{c}\) and \(n_{d}\) representing the number of concordant and discordant pairs, respectively. The formula for calculating Kendall's coefficient is as follows:

$$\tau = \frac{{n_{c} - n_{d} }}{{0.5n\left( {n - 1} \right)}}$$
(9)

The greater the \(\tau\) number, the more precise the ranked list generated by the ranking system. In the optimal scenario, \(\tau = 1\), the approach and the actual spreading process have identical ranking lists. With a large \(\alpha\) value, the spreading would encompass nearly the whole network. In this experiment, the SIR model's spreading probability gradually increases from 0.01 to 0.1.

Results and discussions

Computational complexity

We determine how difficult it is to utilise our strategy in order to demonstrate how well it works before discussing how well it works. Computational complexity, often known as algorithm complexity, is the amount of time or space required by an algorithm for a given input size. As previously stated, the method reveals that H-GSM is composed of three major components. Before executing the iSI formula with an \(O(time)\) complexity, the first stage determines the DC and Ks values. The second step began with the implementation of Dijkstra's shortest path length, signified by complexity, \(O(n^{2} )\) to determine the value of global influence (iGI) based on the values of DC and Ks. After that, the third step is performed to identify INs, which is the multiplication of iSI and iGI. Because our method is an improvement of GSM, we assumed that the overall computing time of H-GSM is also \(O(n^{2} )\).

We demonstrate the computational difficulty of our technique by benchmarking its execution against six networks. In terms of execution time, our approach surpasses DC, BC, CC, GSM, and IGSM, as shown in Table 3. Whilst DC is commonly touted to have the simplest form and be the easiest to calculate, in terms of time execution, H-GSM exceeds DC. The techniques in this work are implemented on a Windows 11 platform 64-bit system; the machine hardware configuration is an Intel® Core i7-8550U CPU @ 2.4 Hz processor, 24 GB of RAM; and Python-Visual Studio Code 1.56.2 is used for programming.

Table 3 Time execution for computational complexity for six networks.

Nodes spreading and discrimination comparison

We use the SIR model, which is extensively used in network epidemic dynamics, to quantify node impact contributions by examining their spread and differences across nodes. SIR decides which nodes may spread out faster and wider over time. In general, the importance of a node is proportionate to its capacity to expand. The node with the greatest spreading capacity has the most power.

Table 4 shows the top 10 nodes in six networks with great spreading capability using the DC, BC, CC, GSM, IGSM, and H-GSM methodologies. As illustrated in Fig. 2, the ten nodes for each strategy were then used as seed nodes to monitor the nodes' convergence in propagation capacities. The number of infected nodes, F(t), increases with time and quickly stabilises. H-GSM has highly effective propagation throughout the majority of the network.

Table 4 Top-10 ranking nodes for: (a) USAir97, (b) Netscience1, (c) Email, (d) Netscience, (e) Yeast and (f) Router network.
Figure 2
figure 2

Propagation influence of top 10 nodes of six networks by six methods.

After that, we analyse if the existence of a node in H-GSM effects its propagation. To analyse H-convergences, GSM's a node from H-GSM that was not existent in GSM is compared with a node from GSM, as shown in Table 4. The following are the findings:

USAir97

All metrics had the same initial affected nodes in terms of node rank. When we looked at GSM, IGSM, and H-GSM, we discovered that their top 10 nodes were identical except for the order. When comparing H-GSM with GSM, the top five nodes are the same, but the sixth varies. As shown in Figure 3a, node 90 (H-GSM) and node 58 (GSM) behave similarly in the beginning, however node 90 had a surge from t = 2 to t = 5. In a while, node 90 may infect others more than node 58 over time.

Figure 3
figure 3

Node's discriminatory comparison of H-GSM and GSM for (a) USAir97, (b) Netscience1, (c) Email, (d) Netscience, (e) Yeast, and (f) Router networks.

Netscience1

The lists produced by the various approaches vary in terms of the rank of each node. The first three lists spanning DC, IGSM, and H-GSM are largely identical, with the main change being the order in which the items occur. When node 58 (H-GSM) is compared to node 14 (GSM), as shown in Figure 3b, we can observe that node 58 performs better since it spreads quicker and further over time.

Email

It seems that the top five GSM, IGSM, and H-GSM nodes all have similar ranks in this network. The same five nodes appear in all three DC, BC, and CC variations. As illustrated in Figure 3, node 15 (H-GSM) has a quicker node effect than node 298 (GSM), despite their long-term behaviour being very comparable (GSM) (c).

Netscience

Node 106 appears in the top five node rank lists for every technology except GSM in this network. Hence, if we look at node 106 in H-GSM, we can see that it eventually outperforms node 54 in GSM in terms of how quickly and far it expands over time. Figure 3 displays the results (d).

Yeast

Similarly, node 35 occurs in the top 10 nodes rated in all methods except GSM. As shown in Figure 3, node 35 (H-GSM) has a greater overall effect than node 596 (GSM) in terms of the pace at which it expands and the total node influence it holds (e).

Router

The patterns seem to be the same for node 139 (H-GSM) and node 369 (GSM), with the exception that node 139 appears to be more important in that it is spreading faster and broader than node 369. After comparing the rankings for various techniques to the list of node ranks in GSM in Figure 3f, it was discovered that only a handful were enrolled in the top-10 most INs.

Kendall's model comparison

The SIR and Kendall models are used to ensure that the relationships discovered between H-GSM and other well-known metrics of centrality are relevant and helpful. Kendall's for the proposed H-GSM and other approaches is shown in Fig. 4. In Kendall's words, H-GSM surpasses state-of-the-art baseline approaches implemented on various network topologies by an alpha of 0.01 to 0.10, proving that it effectively conveys information. The H-GSM surpassed competitors for USAir97 and Netscience 1. H-GSM rates strongly in the Email, Netscience, Yeast, and Router categories while being somewhat inferior to the top two approaches. In comparison to H-GSM and IGSM, the original GSM technique has a low value.

Figure 4
figure 4

Kendall coefficient results comparisons for the six methods for (a) USAir97, (b) Netscience1, (c) Email, (d) Netscience, (e) Yeast, and (f) Router networks.

Conclusion and future recommendations

When H-GSM findings are compared to GSM and other techniques, it is evident that H-GSM is superior. H-GSM is a hybrid strategy that improves on the GSM methodology by taking into account both the network's local and global structure, as represented by the DC and KS approaches, respectively. The capacity of a node to exchange information with all other nodes, as well as the node's own impact, are utilised as indicators to measure the node's influence in the network. The fundamental assumption is that the place and degree of connectedness of nodes are the major drivers of their influence. The more common nodes there are between two nodes, the closer they are, suggesting a better capacity to transfer information. The more significant a neighbour node is, the more it contributes to a node's influence.

To analyse the usefulness of the proposed strategy, we use the SIR model to simulate the propagation process and perform tests on the two major aspects of discrimination and accuracy in diverse real-world networks. First, when compared to other techniques, H-GSM has the lowest average computational complexity in terms of execution time. Second, the proposed technique illustrates that integrating local and global information may successfully decouple a node's value by comparing the top ten most INs. Moreover, the suggested technique in the research outperforms others in ranking correlation, proving its high accuracy. Finally, by integrating the strengths of the DC and Ks indices and incorporating additional self- and global impact metrics, the proposed H-GSM algorithm improves on previous techniques. The addition of location and node data improves its capacity to recognise important nodes in a network. We think that our suggested technique will make a substantial contribution to the area of network analysis and will be beneficial in a variety of applications.

In conclusion, this paper has presented the H-GSM as a novel approach for analyzing complex networks. By considering both local and global information of each node, the H-GSM algorithm addresses the deficiencies of current techniques and offers a more comprehensive understanding of network structures. H-GSM algorithm presents a significant step forward in network analysis, enabling researchers to gain deeper insights into complex network structures and identify INs with improved accuracy and scalability. This is one of our early efforts by using the network's topological connection structure. Additionally, we will continue our study by evaluating other combinations and validation approaches in order to increase the performance of the methodologies offered.