Identifying vital nodes for influence maximization in attributed networks

Wang, Ying; Zheng, Yunan; Liu, Yiguang

doi:10.1038/s41598-022-27145-3

Open access
Published: 31 December 2022


Scientific Reports volume 12, Article number: 22630 (2022)

Abstract

Identifying a set of vital nodes to achieve influence maximization is a topic of general interest in network science. Many algorithms have been proposed to solve the influence maximization problem in complex networks, but most of them use only the topology information of networks to measure node influence. However, node attributes are also an important factor for measuring node influence in attributed networks. To tackle this problem, we first propose an extension of the linear threshold (LT) propagation model to simulate information propagation in attributed networks. Then, we propose a novel community-based method to identify a set of vital nodes for influence maximization in attributed networks. The proposed method considers both the topology influence and the attribute influence of nodes, which makes it more suitable for identifying vital nodes in attributed networks. A series of experiments is carried out on five real-world networks and a large-scale synthetic network. Compared with the CELF, IMM, CoFIM, HGD, NCVoteRank and K-Shell methods, experimental results based on different propagation models show that the proposed method improves the influence spread by $-2.28\% \, \textrm{to} \, 4.76\%$, $-2.50\% \, \textrm{to} \, 16.97\%$, $0.18\% \, \textrm{to} \, 16.07\%$, $0.22\% \, \textrm{to} \, 41.82\%$, $0.23\% \, \textrm{to} \, 11.24\%$ and $10.78\% \, \textrm{to} \, 75.22\%$, respectively.

Introduction

Complex networks are common in the real world and can be used to represent complex systems in many fields. More and more complex networks carry attributes on their nodes; such networks are called attributed networks¹. They contain not only topology structures but also rich node attribute information, such as text descriptions of nodes and comments related to nodes. Influence maximization (IM) is a classic optimization problem in network science that aims to find a set of vital nodes such that diffusion originating from these nodes causes the maximum influence spread in the network. Vital nodes identification for IM has been widely used in applications such as viral marketing², information propagation³ and rumor analysis⁴.

Many IM algorithms have been proposed for complex networks, including diffusion-based algorithms^5,6,7 and heuristic-based algorithms^8,9,10,11,12. Diffusion-based algorithms provide a performance guarantee relative to the optimal solution, at the cost of enormous computation. Heuristic-based methods improve efficiency to some extent, but they either ignore the propagation model or do not optimize a global influence function. Recently, community-based methods^13,14,15 have come to play an important role in the IM problem. A community is a group of nodes with dense internal connections and relatively sparse connections to the rest of the network, and communities effectively represent the organization and structure of the network¹⁶. Because different communities are sparsely connected, selecting seed nodes from different communities effectively reduces the propagation overlap between them.

Due to these benefits, many previous studies have focused on community-based influence maximization algorithms in complex networks. The first and foremost step of a community-based algorithm is community detection. Numerous community detection methods based on matrix factorization^17,18, label propagation^19,20, percolation²¹ and random walks^22,23 have been proposed, each with certain limitations and scalability issues. Moreover, these methods use only information related to the graph topology and fail to correlate node features with the community structure²⁴. Recently, graph-embedding based community detection methods^25,26 have attracted tremendous attention, since they learn for each node a representation that embeds the topology into the attributes. Given the good performance of graph-embedding methods in community detection, we apply them to the influence maximization problem.

Although many community-based methods have been proposed for the IM problem, few of them are suitable for attributed networks. Almost none of the graph clustering or community detection methods for attributed networks address influence maximization, since there are no suitable information propagation models for attributed networks. Moreover, community-based influence maximization algorithms avoid propagation overlap between seed nodes selected from different communities, but overlap between seed nodes selected from the same community may still exist and reduce the influence spread. To solve these problems, we propose an information propagation model and a novel community-based influence maximization algorithm for attributed networks. The main contributions are summarized as follows:

We propose LTPlus, an extension of the classic linear threshold (LT) information propagation model that considers not only the topology structure of networks but also the attributes of nodes.
To solve the influence maximization problem in attributed networks, we propose a community-based influence maximization algorithm using graph-embedding. To the best of our knowledge, this is the first time a graph-embedding based community detection method has been applied to the influence maximization problem.
The proposed method alleviates the propagation overlap between seed nodes selected from the same community by recalculating the influence of seed nodes’ predecessors during the seed nodes selection process.
Extensive experiments on six datasets show that the proposed method performs well.

Related work

The IM algorithms related to this paper fall into three categories: diffusion-based methods, heuristic-based methods and community-based methods. They are discussed in more detail below.

Kempe et al.⁵ proposed the diffusion-based Greedy method, which provides a $(1-1/e-\varepsilon )$ approximation guarantee with respect to the optimal solution. However, its computational cost is high, since it must run Monte-Carlo simulations for the combination of the current seed set with every remaining node. Leskovec et al.⁶ proposed the CELF algorithm, which exploits the principle of diminishing marginal utility to avoid many Monte-Carlo simulations. It significantly reduces the running time but is still not scalable to large-scale networks.

To improve efficiency, heuristic centrality measures such as degree centrality²⁷, K-Shell⁹, betweenness centrality²⁸ and closeness centrality²⁹ have been proposed to evaluate node influence. Li et al.^3,30 proposed identifying influential nodes with novel gravity models. LENC¹² identifies influential nodes by the entropy of each node, computed from the weight distribution of the edges connected to it. However, these methods may suffer from the rich-club effect when solving the IM problem. VoteRank³¹ was proposed to reduce the rich-club effect by selecting seed nodes through a voting scheme in which every node has the same voting ability and receives votes from its neighbors. NCVoteRank³² argued that the voting ability of each node should differ and depend on its topological position. LMP³³, a fast and accurate IM algorithm, labels nodes through local traversal based on their influence power; it achieves linear time complexity while performing well. HGD³⁴ is a heuristic group discovery method that reduces influence overlap by clustering nodes with K-Shell and degree centrality. However, HGD is a locally optimal clustering algorithm and cannot guarantee globally optimal performance. Overall, heuristic-based methods are relatively time-efficient but may lack performance guarantees on some networks.

Since community detection is an appropriate approach for understanding the structure and hidden information of complex networks³⁵, many community-based IM methods have been proposed. Li et al.³⁶ pointed out that higher community diversity can reduce the risk of marketing campaigns and prolong their effect in future promotion. OASNET³⁷ uses the Clauset-Newman-Moore community detection method, selects candidate nodes from each community with a classic greedy algorithm, and then selects seed nodes from the candidates by dynamic programming. However, the efficiency of this method still needs to be improved. FIP³³, a fast overlapping community-based IM method, removes insignificant communities to shrink the search space for seed nodes, which makes it time-efficient, and it considers the probability coefficient of global diffusion to improve seed node selection. CoFIM³⁸ uses the Louvain algorithm³⁹ for community detection and defines node expansion and intra-community propagation under the weighted cascade model, thereby avoiding thousands of Monte-Carlo simulations. It performs well on many large-scale datasets and is highly time-efficient.

However, the aforementioned methods focus only on network topology and fail to account for node attributes in attributed networks, although attributes are as essential an indicator as topology. Some studies^40,41 dealt with node attributes and studied the target-aware IM problem, but their optimization objectives differ from that of traditional IM. Moreover, the continued growth of network scale and of high-dimensional node attributes places higher demands on the efficiency and scalability of community detection algorithms for attributed networks. Inspired by significant progress in graph-embedding⁴², graph-embedding based community detection has emerged in recent years. AANE⁴³ computes the attribute similarity matrix between nodes, learns vector representations associated with structural information, and performs the joint learning process in a distributed manner. He et al.⁴⁴ cast MRFasGCN as an encoder for unsupervised community detection in attributed networks. AGC⁴⁵, an adaptive graph convolution method, exploits high-order graph convolution to capture global cluster structure and adaptively selects the appropriate order for different networks. These graph-embedding methods only perform community detection and do not solve the IM problem. Therefore, vital nodes identification for IM in attributed networks remains a challenging problem.

Preliminaries

Attributed networks

Consider a directed attributed network $G=(V,E,X)$, where $V=\{v_1,v_2,\ldots ,v_N\}$ is the set of nodes and $|V|=N$. E is the set of edges, represented by an adjacency matrix $A=\{a_{ij}\}\in {\mathbb {R}}^{N\times N}$ with $a_{ij}=1$ if node $v_{i}$ connects to node $v_{j}$ and $a_{ij}=0$ otherwise. $X=[x_1,x_2,\ldots ,x_N]^{T}$ is the attribute matrix of all nodes, where $x_i\in {\mathbb {R}}^d$ is the real-valued attribute vector of node $v_i$ and d is the attribute dimension.

Linear threshold (LT) model

The LT model⁵ is a widely used information diffusion model. In the LT model, nodes are divided into two states: active and inactive. In a directed network, the activation of node $v_i$ depends on its in-neighbors $N_{in}(v_i)$. If $v_j\in N_{in}(v_i)$ is active, it has an influence on $v_i$, denoted as $b_{v_j,v_i}$. In the LT model, $b_{v_j,v_i}$ is set as:

$$\begin{aligned} b_{v_j,v_i} = \frac{1}{k_{in}(v_i)} , \end{aligned}$$

(1)

where $k_{in}(v_i)$ is the in-degree of node $v_i$. Each node in $N_{in}(v_i)$ exerts an influence on $v_i$, and these influence values must sum to at most 1, that is, $\sum _{v_j\in N_{in}(v_i)}b_{v_j,v_i}\le 1$. Each node $v_i$ has an activation threshold $\theta _{v_i}$ between 0 and 1, and $v_i$ becomes active once the total influence of its active in-neighbors reaches this threshold, i.e., $\sum _{v_j\in N_{in}(v_i),\, v_j\ \text {active}}b_{v_j,v_i}\ge \theta _{v_i}$. The diffusion process ends when no more nodes can be activated.
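To make the activation rule concrete, the following Python sketch runs one LT diffusion on a small directed graph stored as in-neighbor lists. The data layout, the names `simulate_lt` and `lt_weight`, and drawing thresholds uniformly from (0, 1] are illustrative assumptions, not the authors' implementation.

```python
import random


def simulate_lt(in_neighbors, seeds, weight, rng=None):
    """One run of the linear threshold (LT) model; returns the final active set.

    in_neighbors: dict mapping each node to a list of its in-neighbors.
    weight(u, v): influence b_{u,v} of in-neighbor u on node v.
    """
    if rng is None:
        rng = random.Random()
    # Assumption: theta_v drawn uniformly from (0, 1]
    thresholds = {v: 1.0 - rng.random() for v in in_neighbors}
    active = set(seeds)
    changed = True
    while changed:  # sweep until no further node can be activated
        changed = False
        for v, preds in in_neighbors.items():
            if v in active:
                continue
            # only *active* in-neighbors contribute to the sum
            total = sum(weight(u, v) for u in preds if u in active)
            if total >= thresholds[v]:
                active.add(v)
                changed = True
    return active


def lt_weight(in_neighbors):
    """Classic LT weight of Eq. (1): b_{u,v} = 1 / k_in(v)."""
    return lambda u, v: 1.0 / len(in_neighbors[v])
```

Because the LT process is monotone, for fixed thresholds the final active set does not depend on the order in which nodes are swept.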

Independent cascade (IC) model

Another well-known information diffusion model is the IC model⁴⁶. In the IC model, each edge has a probability p that measures its social influence. Nodes are again either active or inactive. When a node $v_i$ becomes active, it has a single chance, succeeding with probability p, to activate each of its inactive out-neighbors $v_j$ in a directed network.
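A comparable sketch of one IC run, assuming out-neighbor lists and a single uniform edge probability `p`; `simulate_ic` is a hypothetical helper name.

```python
import random
from collections import deque


def simulate_ic(out_neighbors, seeds, p, rng=None):
    """One run of the independent cascade (IC) model; returns the active set."""
    if rng is None:
        rng = random.Random()
    active = set(seeds)
    frontier = deque(active)  # nodes whose activation attempts are pending
    while frontier:
        u = frontier.popleft()
        for v in out_neighbors.get(u, []):
            # each newly activated node gets one attempt per outgoing edge
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active
```

With `p = 1` the cascade reaches every node reachable from the seeds; with `p = 0` only the seeds are active.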

Influence maximization

Influence maximization⁴⁷ aims to find a node subset $S\subseteq V$ with $|S|=m$ such that the expected influence spread is maximal:

$$\begin{aligned} S^* = \mathop {\arg \max }\limits _{S\subseteq V,\, |S|=m} \phi (S), \end{aligned}$$

(2)

where $\phi (S)$ is an objective function used to evaluate the expected number of active nodes after the diffusion process.
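As a worked illustration of Eq. (2), a brute-force solver enumerates every size-m subset and keeps the one with the largest φ(S). This is only a sketch: `phi` is any caller-supplied spread estimator (for example, a Monte-Carlo average), and because the number of candidate sets, `math.comb(N, m)`, grows combinatorially, exhaustive search is feasible only on toy graphs. This is why greedy and heuristic methods are used in practice.

```python
from itertools import combinations


def exact_im(nodes, m, phi):
    """Brute-force S* = argmax over all S with |S| = m of phi(S).

    Enumerates every size-m subset, so it only works on tiny graphs.
    """
    best = max(combinations(nodes, m), key=lambda S: phi(set(S)))
    return set(best)
```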

Well-known state-of-the-art methods

Six state-of-the-art IM methods are used as baselines in this paper. These algorithms have been shown^48,49 to perform well on many datasets.

CELF⁶: a much faster greedy algorithm based on the submodularity of the spread function. By exploiting the principle of diminishing marginal utility, CELF achieves up to a 700-fold speedup in running time while maintaining practical performance similar to that of the simple greedy algorithm. However, the running time of CELF is still prohibitive on large-scale datasets, which makes it impractical for real applications. Thus, we do not run it on the Synthetic dataset in this paper.
IMM⁵⁰: a martingale-based algorithm built on reverse influence sampling⁵¹. It computes a lower bound on the maximum expected spread of m nodes and derives from it the number of random reverse reachable (RR) sets that need to be sampled. Seed nodes are then selected by greedy maximum coverage: in each step, the node that appears in the largest number of not-yet-covered RR sets is chosen, until m seeds are selected.
CoFIM³⁸: a community-based framework for influence maximization assuming that influence propagates from seed nodes to their neighbors and then from these neighbors to other nodes within the same community. Based on this assumption, an incremental greedy algorithm is developed to select seed nodes. In contrast to other community-based algorithms, CoFIM has high time efficiency.
HGD³⁴: a heuristic group discovery algorithm that uses centrality metrics and the strong community rule to cluster cohesive nodes into groups. Compared with other heuristic-based algorithms, HGD is more efficient and performs well, especially when m is small, since it is a locally optimal algorithm.
NCVoteRank³²: a neighborhood coreness based voting approach designed to find spreaders by taking the coreness value of neighbors into consideration for the voting of node influence. NCVoteRank is also a heuristic-based algorithm, which outperforms many existing popular algorithms and is competitive in time complexity.
K-Shell⁹: nodes located in the core of the network, as identified by K-Shell decomposition analysis, are considered more important. The top m nodes with the largest K-Shell values are selected as seeds.
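Two of these baselines are easy to sketch. The block below gives a CELF-style lazy greedy selector, which re-evaluates a node's marginal gain only when its possibly stale gain reaches the top of a priority queue (valid because the spread function is submodular), and a K-Shell decomposition by iterative peeling followed by top-m selection. Both are minimal illustrations under stated assumptions: `phi` is a caller-supplied monotone submodular spread estimate, nodes must be mutually comparable so that heap ties can be broken, and the K-Shell graph is an undirected adjacency dictionary. For networkx graphs, `nx.core_number` computes the same K-Shell indices.

```python
import heapq


def celf(nodes, m, phi):
    """Lazy-forward greedy (CELF-style) selection of m seeds.

    phi(S) returns the (estimated) spread of seed set S; it is assumed
    monotone and submodular, so stale marginal gains are upper bounds.
    """
    seeds, current = [], 0.0
    # Max-heap via negated gains: (-gain, node, seed-set size when computed)
    heap = [(-phi({v}), v, 0) for v in nodes]
    heapq.heapify(heap)
    while len(seeds) < m and heap:
        neg_gain, v, computed_at = heapq.heappop(heap)
        if computed_at == len(seeds):
            # Gain is current and no stale upper bound exceeds it: accept v
            seeds.append(v)
            current += -neg_gain
        else:
            # Stale gain: recompute it for the current seed set and push back
            gain = phi(set(seeds) | {v}) - current
            heapq.heappush(heap, (-gain, v, len(seeds)))
    return seeds


def k_shell(adj):
    """K-Shell index of every node by iterative peeling (undirected adj)."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining, shell, k = set(adj), {}, 0
    while remaining:
        peel = [v for v in remaining if deg[v] <= k]
        if not peel:
            k += 1  # no node with degree <= k is left: move to the next shell
            continue
        for v in peel:
            shell[v] = k
            remaining.discard(v)
            for u in adj[v]:
                if u in remaining:
                    deg[u] -= 1
    return shell


def kshell_seeds(adj, m):
    """Top-m nodes by K-Shell value (ties broken by peeling order)."""
    shell = k_shell(adj)
    return sorted(shell, key=shell.get, reverse=True)[:m]
```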

Methods

The proposed LTPlus propagation model

For a given directed attributed network G, the LTPlus model considers both the topology influence and the attribute influence between nodes. To allow a fair comparison with the LT model, we keep the topology influence of the classical LT model unchanged. Thus, the incoming topology influence of $v_i$ is the same as Eq. (1) and is denoted here as $TI_{in}(v_j,v_i)$:

$$\begin{aligned} TI_{in}(v_j,v_i)=\frac{1}{k_{in}(v_i)}, \end{aligned}$$

(3)

where $v_j$ is the in-neighbour of $v_i$.

Since node attributes represent common characteristics among nodes and play an essential role in information diffusion, the incoming influence from in-neighbors in the LTPlus model is jointly determined by the incoming topology influence and the incoming attribute influence. Similar attribute vectors indicate that nodes are homogeneous, so information propagates more easily between them; that is, the attribute influence is greater when the attribute vectors of two nodes are similar. We use the cosine similarity⁵² to measure the similarity of attribute vectors:

$$\begin{aligned} s_a(v_j,v_i) = \frac{x_i\cdot x_j}{\Vert x_i\Vert \cdot \Vert x_j\Vert }. \end{aligned}$$

(4)

To bring the topology influence and the attribute influence to the same order of magnitude, we normalize $s_a(v_j,v_i)$ over the in-neighbors of each node, following the edge-softmax⁵³ approach, and obtain the incoming attribute influence of $v_i$:

$$\begin{aligned} AI_{in}(v_j,v_i) = \frac{s_a(v_j,v_i)}{\sum _{v_l \in N_{in}(v_i)} s_a(v_l,v_i)}, \end{aligned}$$

(5)

where $v_j$ is the in-neighbour of $v_i$, and $N_{in}(v_i)$ represents the in-neighbors set of $v_i$.

To sum up, the incoming influence of node $v_i$ from its in-neighbour $v_j$ is calculated as the linear combination of the incoming topology influence $TI_{in}(v_j,v_i)$ and the incoming attribute influence $AI_{in}(v_j,v_i)$. Thus, the incoming influence ${\hat{b}}_{v_j,v_i}$ in LTPlus model is defined as:

$$\begin{aligned} {\hat{b}}_{v_j,v_i} = \alpha _1 \cdot TI_{in}(v_j,v_i) + \alpha _2 \cdot AI_{in}(v_j,v_i), \end{aligned}$$

(6)

where $\alpha _1$ and $\alpha _2$ indicate the weight coefficients of topology and attribute influence, $\alpha _1, \alpha _2 \in (0,1)$ and $\alpha _1 + \alpha _2 = 1$.

The LTPlus propagation model thus takes into account both the topology structure and the attribute similarity between nodes. In addition, it reflects the fact that different in-neighbors contribute different attribute influence, which is more in line with real information propagation. Because both $TI_{in}(v_j,v_i)$ and $AI_{in}(v_j,v_i)$ sum to 1 over $v_j\in N_{in}(v_i)$ and $\alpha _1 + \alpha _2 = 1$, the incoming influences ${\hat{b}}_{v_j,v_i}$ of each node also sum to 1, so the LT constraint $\sum _{v_j\in N_{in}(v_i)}{\hat{b}}_{v_j,v_i}\le 1$ still holds. When $\alpha _1 = 1$, the LTPlus model degenerates into the LT model, while $\alpha _1 = 0$ means that only node attributes are considered in the diffusion process. In general, we treat topology and attribute influence equally and set $\alpha _1 = \alpha _2 = 0.5$.
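The following NumPy sketch assembles the LTPlus weights of Eqs. (3) to (6) from an adjacency matrix A (with `A[j, i] = 1` for an edge from v_j to v_i, as in the Preliminaries) and an attribute matrix X. It assumes that X has no all-zero rows and that the cosine similarities from each node's in-neighbors have a positive sum, which holds for non-negative attributes such as TF/IDF or 0/1 word vectors; `ltplus_weights` is a hypothetical name.

```python
import numpy as np


def ltplus_weights(A, X, alpha1=0.5):
    """Return B with B[j, i] = hat-b_{v_j, v_i} (Eq. 6) for every edge j -> i."""
    alpha2 = 1.0 - alpha1
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # assumes no zero rows
    S = Xn @ Xn.T                                      # cosine similarity, Eq. (4)
    B = np.zeros_like(A, dtype=float)
    for i in range(A.shape[0]):
        preds = np.nonzero(A[:, i])[0]                 # in-neighbors of v_i
        if preds.size == 0:
            continue
        ti = 1.0 / preds.size                          # topology influence, Eq. (3)
        sims = S[preds, i]
        ai = sims / sims.sum()                         # attribute influence, Eq. (5)
        B[preds, i] = alpha1 * ti + alpha2 * ai        # Eq. (6)
    return B
```

Each non-empty column of B sums to 1, and with `alpha1=1.0` every in-neighbor receives weight 1/k_in(v_i), recovering the LT model.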

The graph-embedding based community detection method

The goal of graph-embedding based community detection is to partition nodes in the network G into l clusters $C=\{C_1,C_2,\ldots ,C_l\}$. As mentioned above, an adaptive graph convolution (AGC) method⁴⁵ is used in this paper as the community detection method. A low-pass graph filter F⁴⁵ is designed in AGC:

$$\begin{aligned} F = I - \frac{1}{2}L_s, \end{aligned}$$

(7)

where $L_s = I-D^{-\frac{1}{2}}AD^{-\frac{1}{2}}$ is the symmetrically normalized graph Laplacian operator, I is the identity matrix and D is the degree matrix. To capture global graph structures and facilitate clustering, AGC defined k-order graph convolution⁴⁵ as:

$$\begin{aligned} {\bar{X}}=(I-\frac{1}{2}L_s)^k X, \end{aligned}$$

(8)

where k is a positive integer. After convolution, AGC employed the linear kernel $K={\bar{X}}{\bar{X}}^T$ to learn pairwise similarity between nodes and then performed spectral clustering on $W=\frac{1}{2}(|K|+|K^T|)$ to obtain clustering results.
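A minimal NumPy sketch of this filtering step, assuming a dense symmetric adjacency matrix: it builds L_s and F, applies the k-order convolution of Eq. (8), and returns the affinity W used for spectral clustering. `agc_affinity` is a hypothetical helper; the clustering itself could then be run, for example, with scikit-learn's `SpectralClustering(affinity="precomputed")` on W. Giving isolated nodes a zero entry in D^{-1/2} is an assumption the text does not address.

```python
import numpy as np


def agc_affinity(A, X, k):
    """k-order graph convolution (Eqs. 7-8) and AGC affinity W, symmetric A."""
    N = A.shape[0]
    d = A.sum(axis=1).astype(float)
    d_inv_sqrt = np.zeros(N)
    d_inv_sqrt[d > 0] = d[d > 0] ** -0.5          # isolated nodes get 0
    Ls = np.eye(N) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    F = np.eye(N) - 0.5 * Ls                      # low-pass filter, Eq. (7)
    Xbar = np.linalg.matrix_power(F, k) @ X       # k-order convolution, Eq. (8)
    K = Xbar @ Xbar.T                             # linear kernel
    W = 0.5 * (np.abs(K) + np.abs(K.T))           # symmetric affinity matrix
    return Xbar, W
```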

The k-order graph convolution produces smoother attributes as k increases, but too large a k may lead to over-smoothing, i.e., the attributes of nodes in different clusters become mixed and indistinguishable. To select the order k adaptively, the intra-cluster distance intra(C)⁴⁵ is computed to measure clustering performance:

$$\begin{aligned} intra(C)= \frac{1}{|C|}\sum _{C_i\in C} \frac{1}{|C_i|(|C_i|-1)}\sum _{v_i,v_j\in C_i,v_i\ne v_j}\Vert \bar{x_i}-\bar{x_j}\Vert , \end{aligned}$$

(9)

where |C| is the number of communities and $|C_i|$ is the number of nodes in community $C_i$. The order k is increased iteratively, and the search stops once intra(C) stops decreasing, i.e., at its first local minimum.
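Eq. (9) translates directly into NumPy. In this sketch, `labels` holds one community index per node, and singleton communities, for which Eq. (9) is undefined, are assumed to contribute zero distance; the function name `intra` is ours.

```python
import numpy as np


def intra(Xbar, labels):
    """Average intra-cluster distance of Eq. (9).

    Xbar: (N, d) convolved attributes; labels: (N,) community index per node.
    """
    clusters = np.unique(labels)
    total = 0.0
    for c in clusters:
        Z = Xbar[labels == c]
        n = Z.shape[0]
        if n < 2:
            continue  # assumption: singleton communities contribute 0
        # all ordered pairwise Euclidean distances; the diagonal is 0
        D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
        total += D.sum() / (n * (n - 1))
    return total / len(clusters)
```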

However, AGC is designed for undirected networks. The symmetric operator $L_s$ cannot be directly used for directed networks, since adjacency matrices of directed networks are asymmetric. A simple but effective method is to construct a symmetric matrix $A_s$⁵⁴:

$$\begin{aligned} A_s=A+A^T. \end{aligned}$$

(10)

Then, a degree matrix $D_s$ is built from $A_s$ and the Laplacian operator is $L_{sd} = I-D_s^{-\frac{1}{2}}A_sD_s^{-\frac{1}{2}}$. That is, the graph Laplacian operator $L_s$ in AGC is replaced by $L_{sd}$ in this paper. For the convenience of notation, the improved AGC method applicable for directed networks is noted as DAGC.
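The symmetrized operator L_sd can be sketched in a few NumPy lines. As above, zero-degree nodes are handled by setting their D_s^{-1/2} entry to zero, an assumption not stated in the text; note also that a reciprocal pair of edges yields an entry of 2 in A_s. `directed_laplacian` is a hypothetical name.

```python
import numpy as np


def directed_laplacian(A):
    """L_sd = I - D_s^{-1/2} A_s D_s^{-1/2} with A_s = A + A^T (Eq. 10)."""
    As = (A + A.T).astype(float)                  # symmetrized adjacency
    d = As.sum(axis=1)                            # diagonal of D_s
    d_inv_sqrt = np.zeros_like(d)
    d_inv_sqrt[d > 0] = d[d > 0] ** -0.5          # zero-degree nodes get 0
    return np.eye(A.shape[0]) - d_inv_sqrt[:, None] * As * d_inv_sqrt[None, :]
```

The result is symmetric with eigenvalues in [0, 2], so it can be substituted for L_s in the AGC filter.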

The seed nodes selection method

After community detection, nodes with strong influence are selected from different communities by measuring both topology and attribute influence. The seed nodes selection phase raises two key issues: (1) how many nodes should be selected from each community, and (2) how to select the seed nodes.

To address the first problem, we empirically find that communities of different sizes should not be treated the same, since placing seed nodes in a large community could trigger more nodes than in a small community. According to this, a quota-based approach is adopted and $m_{C_i}$ nodes are selected from each community:

$$\begin{aligned} m_{C_i} = round(m \times \frac{|C_i|}{N}), \end{aligned}$$

(11)

where round() rounds its argument to the nearest integer and m is the total number of seed nodes. Thus, $m_{C_i}$ nodes are selected from community $C_i$ and appended to the seed node sequence. Once the sequence length reaches m, the iteration stops. Conversely, if the sequence is still shorter than m after all communities have been processed, the nodes with the maximum influence in the current network are selected as seed nodes until m seeds are obtained.
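A short sketch of the quota rule in Eq. (11). One detail matters in Python: the built-in `round()` rounds halves to the nearest even integer (`round(0.5) == 0`), so the sketch uses `floor(x + 0.5)` to round halves up, which is our assumption about the intended rounding. `community_quotas` is a hypothetical helper.

```python
import math


def community_quotas(sizes, m):
    """m_{C_i} = round(m * |C_i| / N) for each community size in `sizes`."""
    N = sum(sizes)
    # floor(x + 0.5) rounds halves up, unlike Python's round-half-to-even
    return [math.floor(m * s / N + 0.5) for s in sizes]
```

For example, `community_quotas([30000, 35000, 40000], 30)` gives `[9, 10, 11]`, while `community_quotas([1, 1, 2], 2)` gives `[1, 1, 1]`: the quotas can sum to more than m, which is why the selection loop stops once m seeds have been chosen.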

For the second key problem, when selecting influential nodes in directed networks, we pay more attention to how many nodes can be affected by one node. The more nodes it points to, the more nodes it can affect. Thus, the out-degree of each node is used to measure its topology influence, which can be formulated by:

$$\begin{aligned} TI_{out}(v_i)=k_{out}(v_i). \end{aligned}$$

(12)

The more similar the attributes of two nodes, the more likely information propagates successfully between them. Thus, the attribute influence of a node is measured by its attribute similarities to its out-neighbors. The convolved attributes ${\bar{X}}$ are used to compute cosine similarities, since they integrate topology and attributes well. Note that, unlike Eq. (4), the attribute similarity after convolution, denoted $\overline{s_a}(v_i,v_k)$, is calculated between node $v_i$ and its out-neighbor $v_k$:

$$\begin{aligned} \overline{s_a}(v_i,v_k) = \frac{\bar{x_i}\cdot \bar{x_k}}{\Vert \bar{x_i}\Vert \cdot \Vert \bar{x_k}\Vert }. \end{aligned}$$

(13)

The attribute influence of a node is the sum of its attribute similarities to all of its out-neighbors:

$$\begin{aligned} AI_{out}(v_i) = \sum _{v_k \in N_{out}(v_i)} \overline{s_a}(v_i,v_k), \end{aligned}$$

(14)

where $N_{out}(v_i)$ is the out-neighbors set of node $v_i$.

To ensure that the influence of each node is in the range of [0, 1], the topology and attribute influence of each node are normalized by Min-Max scaling normalization method. The normalization of $TI_{out}(v_i)$ and $AI_{out}(v_i)$ noted as $NTI(v_i)$ and $NAI(v_i)$ respectively are calculated as follows:

$$\begin{aligned} \left\{ \begin{aligned} NTI(v_i)= & {} \frac{TI_{out}(v_i)-min(TI_{out})}{max(TI_{out})-min(TI_{out})} \\ NAI(v_i)= & {} \frac{AI_{out}(v_i)-min(AI_{out})}{max(AI_{out})-min(AI_{out})}, \end{aligned} \right. \end{aligned}$$

(15)

where $max(TI_{out})$ and $min(TI_{out})$ are the maximum and minimum topology influence over all nodes, and $max(AI_{out})$ and $min(AI_{out})$ are the maximum and minimum attribute influence. The topology influence and the attribute influence are treated equally, so the total outgoing influence of each node is:

$$\begin{aligned} INF(v_i) = NTI(v_i) + NAI(v_i). \end{aligned}$$

(16)

For each community with $m_{C_i}>0$, the INF values of its nodes are calculated and the node with the maximum INF value is selected as a seed node. To reduce the propagation overlap between seed nodes selected from the same community, a node is removed from the network once it is selected as a seed node, and the influence of its in-neighbors is weakened. Suppose node $v_j$ is an in-neighbor of node $v_i$; if $v_i$ is selected as a seed node, the topology and attribute influence of $v_j$ are reduced. The updated topology influence $TI_{out}^{'}(v_j)$ and attribute influence $AI_{out}^{'}(v_j)$ are:

$$\begin{aligned} \left\{ \begin{aligned} TI_{out}^{'}(v_j)&= TI_{out}(v_j)-1 \\ AI_{out}^{'}(v_j)&= AI_{out}(v_j)- \overline{s_a}(v_j,v_i). \end{aligned} \right. \end{aligned}$$

(17)

The normalized topology and attribute influence of $v_j$ are then updated by substituting Eq. (17) into Eq. (15), and $INF(v_j)$ is updated by recalculating Eq. (16). In each iteration, the node with the maximum INF is selected as the seed node. The proposed seed nodes selection method is summarized in Algorithm 1.
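The selection procedure of Eqs. (11) to (17) can be sketched end to end in Python with NumPy. This is not the authors' Algorithm 1 but an illustration under assumptions: nodes are labeled 0..N-1 and index the rows of the convolved attributes `Xbar`, the min-max normalization of Eq. (15) is recomputed over the nodes still in the network before every pick, a zero range is replaced by 1 to avoid division by zero, and any shortfall after the community quotas is filled from the whole remaining network. `select_seeds` is a hypothetical name.

```python
import numpy as np


def select_seeds(out_nbrs, Xbar, communities, m):
    """Quota-based seed selection with in-neighbor weakening (Eqs. 11-17).

    out_nbrs: dict node -> list of out-neighbors, nodes labeled 0..N-1.
    Xbar: (N, D) convolved attributes (assumed to have no all-zero rows).
    communities: list of lists of nodes.
    """
    N = len(out_nbrs)
    Z = Xbar / np.linalg.norm(Xbar, axis=1, keepdims=True)

    def sim(i, k):  # cosine similarity of convolved attributes, Eq. (13)
        return float(Z[i] @ Z[k])

    in_nbrs = {v: [] for v in out_nbrs}
    for u, vs in out_nbrs.items():
        for v in vs:
            in_nbrs[v].append(u)
    TI = {v: float(len(vs)) for v, vs in out_nbrs.items()}              # Eq. (12)
    AI = {v: sum(sim(v, k) for k in vs) for v, vs in out_nbrs.items()}  # Eq. (14)
    remaining = set(out_nbrs)

    def minmax(score):  # Eq. (15), over the nodes still in the network
        lo = min(score[v] for v in remaining)
        hi = max(score[v] for v in remaining)
        span = (hi - lo) or 1.0  # guard against division by zero
        return {v: (score[v] - lo) / span for v in remaining}

    def pick(candidates):
        nti, nai = minmax(TI), minmax(AI)
        best = max(candidates, key=lambda v: nti[v] + nai[v])  # INF, Eq. (16)
        remaining.discard(best)  # remove the seed from the network
        for u in in_nbrs[best]:  # weaken its in-neighbors, Eq. (17)
            if u in remaining:
                TI[u] -= 1.0
                AI[u] -= sim(u, best)
        return best

    seeds = []
    for C in communities:
        quota = int(np.floor(m * len(C) / N + 0.5))  # Eq. (11), halves up
        for _ in range(quota):
            cands = [v for v in C if v in remaining]
            if len(seeds) >= m or not cands:
                break
            seeds.append(pick(cands))
    while len(seeds) < m and remaining:  # top up from the whole network
        seeds.append(pick(list(remaining)))
    return seeds
```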

Complexity analysis

We also analyze the time complexity of the proposed algorithm. First, if DAGC iterates t times, the time complexity of DAGC community detection is $O(N^2dt+ndt^2)$, where N is the number of nodes, d is the number of attributes and n is the number of nonzero entries of the adjacency matrix A⁴⁵. Second, in the seed nodes selection phase, influence values are computed for the nodes of every community with $m_{C_i}>0$ (rows 3 to 9 of Algorithm 1), which costs $O(l\cdot m_{C_i}\cdot |C_i|)$. Since $|C_i|$ can be approximated by the average community size $\frac{N}{l}$ and $m_{C_i}$ is a constant, $O(l\cdot m_{C_i}\cdot |C_i|)\approx O(N)$. Recalculating the influence of the selected node's in-neighbors (row 12 of Algorithm 1) costs $O(l\cdot m_{C_i}\cdot |N_{in}(v_i^*)|)$. Since $|N_{in}(v_i^*)|\ll |C_i|$, this term is much smaller than O(N), so the seed nodes selection method runs in O(N). Overall, the total complexity of the proposed influence maximization algorithm is $O(N^2dt+ndt^2+N)$.

Results

Data description

We evaluate the performance of the proposed algorithm on five real-world datasets and a large-scale synthetic dataset; details are given in Table 1. The five real-world datasets are Pubmed, Cora, Cornell, Texas and Washington. The Pubmed dataset consists of 19,717 scientific publications on diabetes from the PubMed database, each classified into one of three classes, and its citation network contains 44,338 links. Each publication is described by a TF/IDF-weighted word vector over a dictionary of 500 unique words. The Cora dataset consists of 2708 scientific publications and 5429 links. Each publication is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from a dictionary of 1433 unique words. The Cornell, Texas and Washington datasets are gathered from three different universities. Each line of these datasets contains two webpage IDs: the first is the ID of the webpage being cited, and the second is the ID of the webpage that contains the citation.

The large synthetic dataset, named ‘Synthetic’, has 105,000 nodes and 830,159 edges and is generated with the $random\_partition\_graph()$ function of the Python networkx package. Specifically, the number of communities is set to 3 and the community sizes are $[3\times 10^4, 3.5\times 10^4, 4\times 10^4]$. Nodes in the same community are connected with probability $2.5\times 10^{-4}$, and nodes in different communities are connected with probability $1\times 10^{-4}$. The attribute of each node is a vector of size 100. Initially, each bit of the vector is randomly assigned 0 or 1. Then, once all neighbors of a node have attributes, the node's attribute vector is set to the rounded average of its neighbors' attribute vectors.
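A scaled-down sketch of how such a dataset can be generated with networkx's `random_partition_graph(sizes, p_in, p_out, seed)`, which returns an undirected graph by default. The attribute step is our reading of the description: a single pass in which each node with at least one neighbor takes the element-wise rounded mean of its neighbors' initial random bits (`np.rint` rounds exact halves to even). The helper `expected_edges`, also hypothetical, checks the reported size: with the paper's parameters it gives about 830,612 expected edges, close to the 830,159 reported.

```python
import networkx as nx
import numpy as np


def expected_edges(sizes, p_in, p_out):
    """Expected number of undirected edges of a random partition graph."""
    intra = sum(s * (s - 1) / 2 for s in sizes)
    total_pairs = sum(sizes) * (sum(sizes) - 1) / 2
    return p_in * intra + p_out * (total_pairs - intra)


def make_synthetic(sizes, p_in, p_out, dim=100, seed=0):
    """Random partition graph with 0/1 attributes smoothed over neighbors."""
    G = nx.random_partition_graph(sizes, p_in, p_out, seed=seed)
    rng = np.random.default_rng(seed)
    X0 = rng.integers(0, 2, size=(G.number_of_nodes(), dim))  # random bits
    X = X0.copy()
    for v in G.nodes:
        nbrs = list(G.neighbors(v))
        if nbrs:
            # element-wise rounded mean of the neighbors' initial bits
            X[v] = np.rint(X0[nbrs].mean(axis=0)).astype(int)
    return G, X
```

The full-size call would be `make_synthetic([30000, 35000, 40000], 2.5e-4, 1e-4)`; smaller sizes and larger probabilities are convenient for quick checks.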

Table 1 Details of six datasets used in this paper.


Performance metrics

Three metrics are employed to evaluate the performance of the proposed algorithm in this paper:

Influence spread $\sigma (S)$: for a given seed set S, the expected number of active nodes once the diffusion under the propagation model reaches a steady state is denoted as $\phi (S)$. In the following experiments, $\phi (S)$ is estimated as the average over 1000 Monte-Carlo simulations. To facilitate comparison across datasets of different scales, the influence spread is defined as the ratio of $\phi (S)$ to the total number of nodes in the dataset:
$$\begin{aligned} \sigma (S)=\frac{\phi (S)}{N}. \end{aligned}$$
(18)
Influence spread is used to evaluate the effectiveness of an influence maximization algorithm. Higher $\sigma (S)$ value indicates that the algorithm is more effective.
Running time: the running time is defined as the time needed to select m seed nodes. The previous community-based influence maximization study³⁸ considered only the time of the seed nodes selection phase. To analyze the running time of the whole influence maximization algorithm in more detail, we report the running times of community detection, attribute similarity calculation (or K-Shell calculation for HGD and NCVoteRank) and seed nodes selection separately, as shown in Table . The running time is measured in seconds.
Speedup: the speedup is measured for influence spread of the proposed method over baseline methods with $m=30$, 40 and 50 seed nodes. The speedup⁵⁵ is computed as:
$$\begin{aligned} \text {speedup}=((A-B)/A)\times 100, \end{aligned}$$
(19)
where A and B are the influence spread of two compared methods. For example, if the influence spread of Ours and K-Shell methods are 0.4475 and 0.2328, respectively, the speedup of Ours compared to K-Shell is calculated as: $\text {speedup}_{Ours\rightarrow K-Shell}=((0.4475-0.2328)\div 0.4475\times 100)=47.98$. Similarly, the speedup of K-Shell compared to ours is calculated as: $\text {speedup}_{K-Shell\rightarrow Ours}=((0.2328-0.4475)\div 0.2328\times 100)=-92.23$.
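The two formulas above are simple enough to state as code. In this sketch, `simulate` stands for any function that runs one diffusion from the seed set and returns the set of activated nodes (a hypothetical interface), and `speedup` implements Eq. (19) with A as the first argument.

```python
def influence_spread(simulate, seeds, num_nodes, runs=1000):
    """sigma(S) of Eq. (18): mean number of active nodes over `runs`
    Monte-Carlo simulations, divided by the number of nodes N."""
    total = sum(len(simulate(seeds)) for _ in range(runs))
    return total / runs / num_nodes


def speedup(a, b):
    """Speedup of Eq. (19) of a method with spread a over one with spread b."""
    return (a - b) / a * 100
```

For instance, `round(speedup(0.4475, 0.2328), 2)` returns 47.98 and `round(speedup(0.2328, 0.4475), 2)` returns -92.23, reproducing the worked example.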

Experimental results

On the above networks, the benchmark algorithms CELF⁶, IMM⁵⁰, CoFIM³⁸, HGD³⁴, NCVoteRank³² and K-Shell⁹ are compared with our proposed method. To evaluate effectiveness, we compare the influence spread $\sigma (S)$ of the algorithms for different numbers of seed nodes m on the LTPlus model, with the activation threshold of each node sampled at random. Results on the six datasets are shown in Fig. 1, where the x-axis is the number of seed nodes m and the y-axis is the influence spread $\sigma (S)$. The results show that our method outperforms the community-based method (CoFIM) and the heuristic-based methods (HGD, NCVoteRank and K-Shell) on all datasets. Our method also surpasses CELF on the Pubmed dataset in some scenarios. CELF and IMM achieve similar influence spread on the datasets where both are run. On the four small datasets (Fig. 1b–e), our method performs similarly to CELF and IMM, which have theoretical guarantees. CELF cannot be executed on the Synthetic dataset because its running time is prohibitive. Methods without theoretical guarantees may perform well on some datasets but poorly on others; for example, NCVoteRank and CoFIM perform well on Pubmed and Synthetic but poorly on Washington. Because Ours considers both topology and attribute influence during seed nodes selection, it is more stable than the other methods without theoretical guarantees. Overall, the influence spread results on the six datasets show that the proposed algorithm is effective and robust in finding influential seed nodes and achieving influence maximization.

Since the Independent Cascade (IC) model is also a classic propagation model, we additionally evaluate the proposed method under the IC model. In the IC model, a uniform probability p is assigned to every edge of the graph, and each newly activated node $v_i$ has a single chance to activate each of its inactive out-neighbors, succeeding with probability p. Following the previous study⁵, p is set to 0.1, and the number of seeds m ranges from 5 to 50. Fig. 2 shows that the proposed method still performs well in most cases. Moreover, because our seed selection does not depend on the propagation model, the seeds do not need to be re-selected when the propagation model changes. This supports the universality of our method.
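
A minimal Monte Carlo sketch of the IC process described above, using the uniform edge probability p = 0.1 from the experiments as the default. Each node enters the frontier exactly once, when it becomes active, so every edge receives exactly one activation attempt. The function name `ic_spread`, the number of runs and the normalization by the node count are assumptions for illustration.

```python
import random
from collections import deque
from typing import Dict, Hashable, Iterable, List


def ic_spread(out_nbrs: Dict[Hashable, List[Hashable]], seeds: Iterable[Hashable],
              p: float = 0.1, runs: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of sigma(S) under the IC model with uniform edge probability p.

    Returns the average fraction of nodes that end up active.
    """
    rng = random.Random(seed)
    seed_set = set(seeds)
    nodes = set(out_nbrs) | seed_set
    for vs in out_nbrs.values():
        nodes.update(vs)
    total = 0
    for _ in range(runs):
        active = set(seed_set)
        frontier = deque(seed_set)
        while frontier:
            u = frontier.popleft()
            # u is newly active: it gets one attempt on each still-inactive out-neighbour.
            for v in out_nbrs.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    frontier.append(v)
        total += len(active)
    return total / (runs * len(nodes))


chain = {0: [1], 1: [2], 3: []}
print(ic_spread(chain, [0], p=1.0))  # 0.75
```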

The speedup results under the LTPlus and IC models are shown in Tables 3 and 4, respectively, for three seed-set sizes: 30, 40 and 50. Table 3 shows that the proposed method achieves a positive speedup over CoFIM, HGD, NCVoteRank and K-Shell on all datasets, and over CELF and IMM on the Pubmed and Washington datasets. Although its speedup relative to CELF and IMM is negative on the Cornell and Texas datasets, the absolute values are very small, meaning that the difference in influence spread between the methods is small. In Table 4, the proposed method achieves a positive speedup over the baseline methods on almost all datasets. These results demonstrate the effectiveness of the proposed method.

In the seed node selection phase, we recalculate the current influence (INF) of the seed nodes' in-neighbors (row 12 of Algorithm 1) to reduce the propagation overlap between seed nodes selected from the same community. To verify the effectiveness of this step, we compare the influence spread of the proposed algorithm with and without this recalculation. In Table 2, the first row for each dataset is the influence spread of Ours under the LTPlus model, and the second row is the influence spread of our method without recalculating the INF of seed nodes' in-neighbors, that is, without row 12 of Algorithm 1. Compared with the variant without recalculation, Ours improves the influence spread to some extent. The gap is especially large on the Washington network when $m=5$, where the value in the first row is significantly higher than that in the second. A likely explanation is that the nodes of this network are concentrated in one community and the number of initial seed nodes is small, so most seed nodes are selected from the same community and may be connected to each other. Such seed nodes share many common neighbors, which ultimately leads to a small influence spread. It is therefore necessary to recalculate the influence of seed nodes' in-neighbors during seed node selection.
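
The effect of row 12 can be illustrated with a deliberately simplified, hypothetical sketch. Here INF(u) is taken to be the total weight of edges from u to out-neighbors that are not yet seeds; this is not the paper's INF, and community structure and attribute influence are ignored. Under this definition, adding a seed only changes the scores of that seed's in-neighbors, which is why only they need recalculating. `inf_score` and `select_seeds` are illustrative names.

```python
from collections import defaultdict
from typing import Dict, List, Set

WGraph = Dict[int, Dict[int, float]]  # u -> {v: weight of edge u -> v}


def inf_score(graph: WGraph, u: int, seeds: Set[int]) -> float:
    """Hypothetical INF: total weight from u to out-neighbours that are not yet seeds."""
    return sum(w for v, w in graph.get(u, {}).items() if v not in seeds)


def select_seeds(graph: WGraph, m: int, recalc_in_neighbors: bool = True) -> List[int]:
    nodes = set(graph)
    for nbrs in graph.values():
        nodes.update(nbrs)
    in_nbrs: Dict[int, Set[int]] = defaultdict(set)
    for u, nbrs in graph.items():
        for v in nbrs:
            in_nbrs[v].add(u)
    seeds: List[int] = []
    chosen: Set[int] = set()
    score = {u: inf_score(graph, u, chosen) for u in nodes}
    for _ in range(min(m, len(nodes))):
        # Highest current score wins; ties go to the smallest node id.
        best = max((u for u in nodes if u not in chosen), key=lambda u: (score[u], -u))
        seeds.append(best)
        chosen.add(best)
        if recalc_in_neighbors:
            # An in-neighbour of the new seed gains nothing by activating it,
            # so its score is recomputed without that edge.
            for u in in_nbrs[best] - chosen:
                score[u] = inf_score(graph, u, chosen)
    return seeds


G = {0: {1: 0.5, 2: 0.5, 3: 0.5, 4: 0.5},
     1: {0: 0.9, 2: 0.5, 3: 0.5},
     5: {6: 0.6, 7: 0.6, 8: 0.6}}
print(select_seeds(G, 2, recalc_in_neighbors=False))  # [0, 1]
print(select_seeds(G, 2))                             # [0, 5]
```

Without recalculation, node 1 is picked because its static score still counts its edge to node 0, which is already a seed, and it overlaps with node 0 on nodes 2 and 3. With recalculation its score drops to 1.0, and node 5 from the separate group {5, 6, 7, 8} is chosen instead, mirroring the overlap argument above.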

Table 2 Ablation experiments that analyze the impact of recalculating the INF of seed nodes’ in-neighbors.


Table 3 Speedup % (in terms of influence spread) for Ours versus other baseline methods on six datasets. The propagation is simulated on the LTPlus model.


Table 4 Speedup % (in terms of influence spread) for Ours versus other baseline methods on six datasets. The propagation is simulated on the IC model.


Time efficiency is a key concern for many researchers, so we analyze the running time of the proposed algorithm and the baseline algorithms stage by stage. Experiments are carried out on a computer with a 2.30 GHz Intel i7-10875H CPU and 32 GB of memory. Table 5 shows the running time of the various algorithms on the six datasets; the seed node selection time is the time needed to select 25 seed nodes. The table shows that our method is very competitive in running time during the seed node selection phase. Although CELF performs well in influence spread, its running time is too long. IMM is highly time-efficient on all datasets. However, both CELF and IMM select seeds based on the propagation model, so they must re-select seeds whenever the propagation model changes. CoFIM is relatively time-efficient during seed node selection on large-scale datasets. K-Shell runs quickly, but its influence spread is unsatisfactory. HGD and NCVoteRank are time-efficient on some datasets but slow on others, and their influence spread is also unstable.
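
To see why CELF remains expensive even with its lazy-evaluation trick, here is a minimal sketch of the lazy-forward greedy selection of Leskovec et al.⁶. The `spread` argument is an assumption: in the experiments it would be a Monte Carlo estimate of $\sigma (S)$ under the chosen propagation model (LTPlus or IC), where each call costs many simulations, and the seeds depend on that model. For a self-contained demonstration, a deterministic set-coverage function stands in for it.

```python
import heapq
from typing import Callable, List, Sequence, Tuple


def celf(candidates: Sequence[int], spread: Callable[[List[int]], float],
         k: int) -> Tuple[List[int], float, int]:
    """Lazy-forward greedy (CELF) seed selection.

    `spread(S)` must be a monotone submodular estimate of sigma(S).
    Returns (seeds, estimated spread, number of spread evaluations).
    """
    seeds: List[int] = []
    current = 0.0
    evaluations = len(candidates)
    # Heap entries: (-marginal gain, node, |S| at the time the gain was computed).
    heap = [(-spread([v]), v, 0) for v in candidates]
    heapq.heapify(heap)
    while len(seeds) < k and heap:
        neg_gain, v, computed_at = heapq.heappop(heap)
        if computed_at == len(seeds):
            # Gain is up to date; by submodularity no stale entry can beat it.
            seeds.append(v)
            current += -neg_gain
        else:
            # Stale gain: recompute against the current seed set and push back.
            gain = spread(seeds + [v]) - current
            evaluations += 1
            heapq.heappush(heap, (-gain, v, len(seeds)))
    return seeds, current, evaluations


covers = {0: {1, 2, 3}, 1: {3, 4}, 2: {5}, 3: {1, 2}}


def cover(S: List[int]) -> float:
    return float(len(set().union(*(covers[v] for v in S))))


print(celf(list(covers), cover, 2))  # ([0, 1], 4.0, 6)
```

Plain greedy would evaluate every remaining candidate in every round; CELF skips candidates whose stale upper bound is already beaten, but with a simulation-based `spread` each of the remaining evaluations is still costly.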

Table 5 Running time (in seconds) for different algorithms on six datasets.


In addition to the seed node selection time, we also report the community detection time of Ours and CoFIM. Compared with CoFIM, the graph-embedding based community detection method used in Ours needs more time to find proper communities. Although this phase appears time-consuming, it only needs to be performed once per dataset, regardless of how many experiments are run on that dataset. We also report the time needed to compute attribute similarities, used by CELF and Ours under the LTPlus model, and the time needed to compute K-Shell values, used by HGD and NCVoteRank. Attribute similarities and K-Shell values are computed and saved in advance for convenience across repeated experiments; that is, they are computed only once per dataset.
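
K-Shell values⁹ are k-core indices, so the precomputation mentioned above amounts to one core decomposition per dataset. A minimal peeling sketch follows; the function name `k_shell` is ours, the graph is assumed to be undirected (symmetric adjacency sets, no self-loops), and library routines such as NetworkX's `core_number` compute the same quantity.

```python
from typing import Dict, Hashable, Set


def k_shell(adj: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, int]:
    """K-shell index of every node of an undirected graph, by iterative peeling."""
    deg = {u: len(nbrs) for u, nbrs in adj.items()}
    remaining = set(adj)
    shell: Dict[Hashable, int] = {}
    k = 0
    while remaining:
        peel = [u for u in remaining if deg[u] <= k]
        if not peel:
            k += 1  # no node left with residual degree <= k: move to the next shell
            continue
        while peel:
            u = peel.pop()
            if u not in remaining:
                continue  # already removed via an earlier push
            remaining.remove(u)
            shell[u] = k
            for v in adj[u]:
                if v in remaining:
                    deg[v] -= 1
                    if deg[v] <= k:
                        peel.append(v)
    return shell


# Triangle 0-1-2 with a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(dict(sorted(k_shell(adj).items())))  # {0: 2, 1: 2, 2: 2, 3: 1}
```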

Discussion

In summary, we propose LTPlus, an extension of the LT information propagation model that considers both node topology and node attributes in propagation simulations, making it better suited to attributed networks than previous information propagation models. We also propose a novel community-based method to identify a set of vital nodes for influence maximization in attributed networks. To the best of our knowledge, this is the first method to combine influence maximization with graph-embedding based community detection. Compared with well-known state-of-the-art methods, empirical analyses on five real-world networks and a large-scale synthetic network under the LTPlus model show that our method consistently performs very competitively (Fig. 1), and the results in Fig. 2 show that it also generalizes to the IC model. We hope this work can shed some light on future studies of the influence maximization problem. For example, the graph-embedding community detection method could be further improved for directed attributed networks, and an end-to-end method that takes the properties of the propagation model into account could be explored in future work.

Data availability

All relevant real world datasets can be downloaded from https://github.com/yingwang926/attributed_datasets.

References

1. Chunaev, P. Community detection in node-attributed social networks: A survey. Comput. Sci. Rev. 37, 100286. https://doi.org/10.1016/j.cosrev.2020.100286 (2020).
2. Chen, W., Wang, C. & Wang, Y. Scalable influence maximization for prevalent viral marketing in large-scale social networks. in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1029–1038. https://doi.org/10.1145/1835804.1835934 (2010).
3. Li, Z. & Huang, X. Identifying influential spreaders by gravity model considering multi-characteristics of nodes. Sci. Rep. 12, 9879. https://doi.org/10.1038/s41598-022-14005-3 (2022).
4. Vega-Oliveros, D. A., da Fontoura Costa, L. & Rodrigues, F. A. Influence maximization by rumor spreading on correlated networks through community identification. Commun. Nonlinear Sci. Numer. Simul. 83, 105094. https://doi.org/10.1016/j.cnsns.2019.105094 (2020).
5. Kempe, D., Kleinberg, J. & Tardos, É. Maximizing the spread of influence through a social network. in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 137–146. https://doi.org/10.1145/956750.956769 (2003).
6. Leskovec, J. et al. Cost-effective outbreak detection in networks. in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 420–429. https://doi.org/10.1145/1281192.1281239 (2007).
7. Goyal, A., Lu, W. & Lakshmanan, L. V. CELF++: Optimizing the greedy algorithm for influence maximization in social networks. in International Conference Companion on World Wide Web. 47–48. https://doi.org/10.1145/1963192.1963217 (2011).
8. Liu, D., Jing, Y., Zhao, J., Wang, W. & Song, G. A fast and efficient algorithm for mining top-k nodes in complex networks. Sci. Rep. 7, 43330. https://doi.org/10.1038/srep43330 (2017).
9. Kitsak, M. et al. Identification of influential spreaders in complex networks. Nat. Phys. 6, 888–893. https://doi.org/10.1038/nphys1746 (2010).
10. Ullah, A. et al. Identification of nodes influence based on global structure model in complex networks. Sci. Rep. 11, 6173. https://doi.org/10.1038/s41598-021-84684-x (2021).
11. Yang, P.-L., Xu, G.-Q., Yu, Q. & Guo, J.-W. An adaptive heuristic clustering algorithm for influence maximization in complex networks. Chaos Interdiscip. J. Nonlinear Sci. 30, 093106. https://doi.org/10.1063/1.5140646 (2020).
12. Wang, B., Zhang, J., Dai, J. & Sheng, J. Influential nodes identification using network local structural properties. Sci. Rep. 12, 1833. https://doi.org/10.1038/s41598-022-05564-6 (2022).
13. Samir, A. M., Rady, S. & Gharib, T. F. LKG: A fast scalable community-based approach for influence maximization problem in social networks. Physica A Stat. Mech. Appl. 582, 126258. https://doi.org/10.1016/j.physa.2021.126258 (2021).
14. Chen, Y.-C., Zhu, W.-Y., Peng, W.-C., Lee, W.-C. & Lee, S.-Y. CIM: Community-based influence maximization in social networks. ACM Trans. Intell. Syst. Technol. 5, 1–31. https://doi.org/10.1145/2532549 (2014).
15. Bozorgi, A., Samet, S., Kwisthout, J. & Wareham, T. Community-based influence maximization in social networks under a competitive linear threshold model. Knowl.-Based Syst. 134, 149–158. https://doi.org/10.1016/j.knosys.2017.07.029 (2017).
16. Newman, M. E. & Girvan, M. Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113. https://doi.org/10.1103/PhysRevE.69.026113 (2004).
17. Luo, D. et al. Local community detection in multiple networks. in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 266–274. https://doi.org/10.1145/3394486.3403069 (2020).
18. Xu, H. Gromov–Wasserstein factorization models for graph clustering. AAAI Conf. Artif. Intell. 34, 6478–6485 (2020).
19. Wang, T., Chen, S., Wang, X. & Wang, J. Label propagation algorithm based on node importance. Physica A Stat. Mech. Appl. 551, 124137. https://doi.org/10.1016/j.physa.2020.124137 (2020).
20. Garza, S. E. & Schaeffer, S. E. Community detection with the label propagation algorithm: A survey. Physica A Stat. Mech. Appl. 534, 122058. https://doi.org/10.1016/j.physa.2019.122058 (2019).
21. Morone, F. & Makse, H. A. Influence maximization in complex networks through optimal percolation. Nature 524, 65–68. https://doi.org/10.1038/nature14604 (2015).
22. Yan, Y., Bian, Y., Luo, D., Lee, D. & Zhang, X. Constrained local graph clustering by colored random walk. in The World Wide Web Conference. 2137–2146. https://doi.org/10.1145/3308558.3313719 (2019).
23. Torghabeh, R. P. & Santhanam, N. P. Modeling community detection using slow mixing random walks. in IEEE International Conference on Big Data. 2205–2211. https://doi.org/10.1109/BigData.2015.7364008 (2015).
24. Alinezhad, E., Teimourpour, B., Sepehri, M. M. & Kargari, M. Community detection in attributed networks considering both structural and attribute similarities: Two mathematical programming approaches. Neural Comput. Appl. 32, 3203–3220. https://doi.org/10.1007/s00521-019-04064-5 (2020).
25. Bandyopadhyay, S., Lokesh, N. & Murty, M. N. Outlier aware network embedding for attributed networks. AAAI Conf. Artif. Intell. 33, 12–19 (2019).
26. Liu, F. et al. Deep learning for community detection: Progress, challenges and opportunities. in International Joint Conference on Artificial Intelligence. 4981–4987. https://doi.org/10.24963/ijcai.2020/693 (2020).
27. Bonacich, P. Factoring and weighting approaches to status scores and clique identification. J. Math. Sociol. 2, 113–120. https://doi.org/10.1080/0022250X.1972.9989806 (1972).
28. Freeman, L. C. A set of measures of centrality based on betweenness. Sociometry. 35–41. https://doi.org/10.2307/3033543 (1977).
29. Bavelas, A. Communication patterns in task-oriented groups. J. Acoust. Soc. Am. 22, 725–730. https://doi.org/10.1121/1.1906679 (1950).
30. Li, Z. & Huang, X. Identifying influential spreaders in complex networks by an improved gravity model. Sci. Rep. 11, 22194. https://doi.org/10.1038/s41598-021-01218-1 (2021).
31. Zhang, J.-X., Chen, D.-B., Dong, Q. & Zhao, Z.-D. Identifying a set of influential spreaders in complex networks. Sci. Rep. 6, 27823. https://doi.org/10.1038/srep27823 (2016).
32. Kumar, S. & Panda, B. Identifying influential nodes in social networks: Neighborhood coreness based voting approach. Physica A Stat. Mech. Appl. 553, 124215. https://doi.org/10.1016/j.physa.2020.124215 (2020).
33. Bouyer, A. & Beni, H. A. Influence maximization problem by leveraging the local traveling and node labeling method for discovering most influential nodes in social networks. Physica A Stat. Mech. Appl. 592, 126841. https://doi.org/10.1016/j.physa.2021.126841 (2022).
34. Jiang, L., Zhao, X., Ge, B., Xiao, W. & Ruan, Y. An efficient algorithm for mining a set of influential spreaders in complex networks. Physica A Stat. Mech. Appl. 516, 58–65. https://doi.org/10.1016/j.physa.2018.10.011 (2019).
35. Bouyer, A. & Roghani, H. LSMD: A fast and robust local community detection starting from low degree nodes in social networks. Future Gener. Comput. Syst. 113, 41–57. https://doi.org/10.1016/j.future.2020.07.011 (2020).
36. Li, J. et al. Community-diversified influence maximization in social networks. Inf. Syst. 92, 101522. https://doi.org/10.1016/j.is.2020.101522 (2020).
37. Cao, T., Wu, X., Wang, S. & Hu, X. OASNET: An optimal allocation approach to influence maximization in modular social networks. in ACM Symposium on Applied Computing. 1088–1094. https://doi.org/10.1145/1774088.1774314 (2010).
38. Shang, J., Zhou, S., Li, X., Liu, L. & Wu, H. CoFIM: A community-based framework for influence maximization on large-scale networks. Knowl.-Based Syst. 117, 88–100. https://doi.org/10.1016/j.knosys.2016.09.029 (2017).
39. Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, P10008. https://doi.org/10.1088/1742-5468/2008/10/p10008 (2008).
40. Li, Y., Zhang, D. & Tan, K.-L. Real-time targeted influence maximization for online advertisements. in International Conference on Very Large Data Bases Endowment. Vol. 8. 1070–1081. https://doi.org/10.14778/2794367.2794376 (2015).
41. Cai, T. et al. Target-aware holistic influence maximization in spatial social networks. IEEE Trans. Knowl. Data Eng. 34, 1993–2007. https://doi.org/10.1109/TKDE.2020.3003047 (2020).
42. Yu, E.-Y., Fu, Y., Chen, X., Xie, M. & Chen, D.-B. Identifying critical nodes in temporal networks by network embedding. Sci. Rep. 10, 12494. https://doi.org/10.1038/s41598-020-69379-z (2020).
43. Huang, X., Li, J. & Hu, X. Accelerated attributed network embedding. in SIAM International Conference on Data Mining. 633–641. https://doi.org/10.1137/1.9781611974973.71 (SIAM, 2017).
44. He, D. et al. Community-centric graph convolutional network for unsupervised community detection. in International Joint Conference on Artificial Intelligence. 551–556. https://doi.org/10.24963/ijcai.2020/486 (2020).
45. Zhang, X., Liu, H., Li, Q. & Wu, X.-M. Attributed graph clustering via adaptive graph convolution. in International Joint Conference on Artificial Intelligence. 4327–4333. https://doi.org/10.24963/ijcai.2019/601 (2019).
46. Goldenberg, J., Libai, B. & Muller, E. Talk of the network: A complex systems look at the underlying process of word-of-mouth. Market. Lett. 12, 211–223. https://doi.org/10.1023/A:1011122126881 (2001).
47. Domingos, P. & Richardson, M. Mining the network value of customers. in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 57–66. https://doi.org/10.1145/502512.502525 (2001).
48. Azaouzi, M., Mnasri, W. & Romdhane, L. B. New trends in influence maximization models. Comput. Sci. Rev. 40, 100393. https://doi.org/10.1016/j.cosrev.2021.100393 (2021).
49. Li, Y., Fan, J., Wang, Y. & Tan, K.-L. Influence maximization on social graphs: A survey. IEEE Trans. Knowl. Data Eng. 30, 1852–1872. https://doi.org/10.1109/TKDE.2018.2807843 (2018).
50. Tang, Y., Shi, Y. & Xiao, X. Influence maximization in near-linear time: A martingale approach. in ACM SIGMOD International Conference on Management of Data. 1539–1554. https://doi.org/10.1145/2723372.2723734 (2015).
51. Borgs, C., Brautbar, M., Chayes, J. & Lucier, B. Maximizing social influence in nearly optimal time. in Proceedings of Annual ACM-SIAM Symposium on Discrete Algorithms. 946–957. https://doi.org/10.1137/1.9781611973402.70 (SIAM, 2014).
52. Han, J., Kamber, M. & Pei, J. 2—Getting to know your data. in Data Mining (Third Edition) (Han, J., Kamber, M. & Pei, J. eds.). 39–82. https://doi.org/10.1016/B978-0-12-381479-1.00002-2 (2012).
53. Goodfellow, I., Bengio, Y. & Courville, A. 6.2.2.3 Softmax units for multinoulli output distributions. in Deep Learning. 180–184 (2016).
54. Kamhoua, B. F. et al. GRACE: A general graph convolution framework for attributed graph clustering. ACM J. ACM (JACM). 1–30. https://doi.org/10.1145/3544977 (2022).
55. Bouyer, A., Beni, H. A., Arasteh, B., Aghaee, Z. & Ghanbarzadeh, R. FIP: A fast overlapping community-based influence maximization algorithm using probability coefficient of global diffusion in social networks. Exp. Syst. Appl. 213, 118869. https://doi.org/10.1016/j.eswa.2022.118869 (2023).


Acknowledgements

This work is supported by NSFC under grants 61860206007 and U19A2071, as well as the funding from Sichuan University under grant 2020SCUNG205.

Author information

Authors and Affiliations

College of Computer Science, Sichuan University, Chengdu, 610065, Sichuan, China
Ying Wang, Yunan Zheng & Yiguang Liu


Contributions

Y.W.: Writing-original draft, Methodology. Y.Z.: Writing-review & editing. Y.L.: Supervision, Funding acquisition, Writing-review & editing.

Corresponding author

Correspondence to Yunan Zheng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Wang, Y., Zheng, Y. & Liu, Y. Identifying vital nodes for influence maximization in attributed networks. Sci Rep 12, 22630 (2022). https://doi.org/10.1038/s41598-022-27145-3


Received: 26 September 2022
Accepted: 27 December 2022
Published: 31 December 2022
DOI: https://doi.org/10.1038/s41598-022-27145-3
