Iterative Neighbour-Information Gathering for Ranking Nodes in Complex Networks

Designing node influence ranking algorithms can provide insights into network dynamics, functions and structures. Increasing evidence reveals that a node's spreading ability largely depends on its neighbours. We introduce an iterative neighbour-information gathering (Ing) process with three parameters: a transformation matrix, a priori information and an iteration time. The Ing process iteratively combines a priori information from neighbours via the transformation matrix, and iteratively assigns an Ing score to each node to evaluate its influence. The algorithm is applicable to any type of network, and includes some traditional centralities as special cases, such as degree, semi-local centrality and LeaderRank. The Ing process converges in strongly connected networks, with a speed that depends on the two largest eigenvalues of the transformation matrix. Interestingly, the eigenvector centrality corresponds to a limit case of the algorithm. In comparison with eight renowned centralities, simulations of the susceptible-infected-removed (SIR) model on real-world networks reveal that the Ing process can offer more accurate rankings, even without a priori information. We also observe that an optimal iteration time always exists that best characterizes node influence. The proposed algorithm bridges the gaps among some existing measures, and may have potential applications in infectious disease control and the design of optimal information spreading strategies.


Examples of the Ing process in toy networks
To understand the Ing process, we apply it to some toy networks, as shown in Fig. S1. To illustrate how the Ing scores are calculated, for the first toy network, shown in the first panel of Fig. S1, we set L = A and c = Degree. We can use the matrix format to obtain the Ing scores of all nodes at once:

$$y^{(1)} = A s^{(0)}, \qquad s^{(1)} = \frac{y^{(1)}}{\max(y^{(1)})}. \qquad (1)$$

If the network is too large to be expressed as a matrix, we prefer the node-wise format

$$y_i^{(1)} = \sum_{j \in \Gamma(i)} s_j^{(0)},$$

where Γ(i) is the neighbour set of node i. For three nodes of the first toy network, this sum evaluates to 2 + 3 + 3 + 2 + 5 = 15 (2), 5 + 5 = 10 (3) and 5 + 3 + 5 = 13 (4). The maximum is $\max(y^{(1)}) = 15$, so the 1-order Ing score is $s^{(1)} = y^{(1)}/15$. Higher-order Ing scores can be obtained similarly with either of the two calculation methods. The evolution of the Ing score for this network is shown in the second panel of Fig. S1; the detailed calculations are omitted. Note that the Ing score vector converges at n = 5. For the other networks, we find that self-loops, disconnection, direction and weights do not prevent the process from converging.
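The two calculation formats above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `ing_scores` and the 4-node graph are assumptions made for the example (the graph is not the network of Fig. S1), and the sketch assumes that the gathered vector always has a positive maximum.

```python
def ing_scores(L, c, n):
    """Return the n-order Ing score vector s(L, c, n) as a list of floats.

    L is a square transformation matrix given as a list of rows, c is the
    a priori information (used as s^(0)), and n is the iteration time.
    """
    s = [float(x) for x in c]
    for _ in range(n):
        # Gather step: y_i = sum_j L[i][j] * s_j
        y = [sum(l_ij * s_j for l_ij, s_j in zip(row, s)) for row in L]
        # Normalise by the largest entry (assumed positive in this sketch)
        peak = max(y)
        s = [y_i / peak for y_i in y]
    return s


# Hypothetical undirected graph with edges 0-1, 0-2, 0-3 and 1-2
A = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
degree = [sum(row) for row in A]  # c = Degree, here [3, 2, 2, 1]

s1 = ing_scores(A, degree, 1)  # gathered sums are [5, 5, 5, 3], maximum 5
print(s1)                      # [1.0, 1.0, 1.0, 0.6]
```

For large sparse networks, the same gather step would typically iterate over neighbour lists instead of matrix rows, which corresponds to the node-wise format.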

Fig. 4 and Fig. 6 with error bars
Curves in Fig. 4 and Fig. 6 are averaged over 1000 independent simulation runs. The plots with error bars are shown in Fig. S2 and Fig. S3. The conclusions still hold when the error bars are taken into account.

Relationship of the Ing score and the traditional centralities
Some existing centralities can be viewed as special cases of the Ing process, such as the degree, the semi-local centrality, the eigenvector centrality, the LeaderRank and the iterative resource allocation (IRA). The equivalence between the Ing process and the degree, the semi-local centrality and the eigenvector centrality is discussed in the main text. The settings of the LeaderRank and the IRA are slightly more complicated, so we first introduce the workflows of these two algorithms.

Given a complex network G(V, E), where V and E are the node and edge sets respectively, |V| = v and |E| = m denote the numbers of nodes and edges. Its adjacency matrix is $A = (a_{ij})$, where $a_{ij} = 1$ if node i points to node j and 0 otherwise. First, the LeaderRank adds a new node, which connects with all nodes via bidirectional edges, to make the network strongly connected. The new node is called the ground node and the others are called ordinary nodes. The adjacency matrix of the new network is

$$\tilde{A} = \begin{pmatrix} A & \mathbf{1} \\ \mathbf{1}^{T} & 0 \end{pmatrix},$$

where $\mathbf{1}$ is a v × 1 vector whose elements are all 1. Then the LeaderRank assigns a score to each node and updates the scores iteratively,

$$s_i^{(n)} = \sum_{j} \frac{\tilde{a}_{ji}}{k_j^{out}} s_j^{(n-1)}, \qquad (10)$$

where $\tilde{a}_{ji}$ are the entries of $\tilde{A}$ and $k_j^{out}$ is the out-degree of node j in the new network. We can define the matrix of the Ing process as $Z = (z_{ij})$, where $z_{ij} = \tilde{a}_{ji} / k_j^{out}$. Hence, the LeaderRank is equivalent to s(Z, r, ∞), where r is a random vector whose elements are not all zero. Note that the initial Ing score can be chosen randomly because it does not affect the limit state, whereas the original LeaderRank sets it to $(\mathbf{1}^{T}, 0)^{T}$, that is, 1 for every ordinary node and 0 for the ground node. Because the network is augmented with the ground node, the Ing score vector is (v + 1)-dimensional, while we only focus on the first v elements.

Figure S3 (axis label: τ; panel title: Closeness): Evolution of the correlation coefficient between spreading range and A-Ing score with four kinds of a priori information.
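The correspondence between the LeaderRank and the Ing process with matrix Z can be checked numerically on a small example. The sketch below is an illustration under stated assumptions: the helper names `leaderrank_matrix` and `iterate`, the 3-node directed graph and the fixed starting vectors are all hypothetical, and the limit n → ∞ is approximated by 200 iterations.

```python
def leaderrank_matrix(A):
    """Build Z = (z_ij) with z_ij = a~_ji / k_j^out on the augmented network."""
    v = len(A)
    # Ground node (index v) is linked bidirectionally to every ordinary node
    A_aug = [list(row) + [1] for row in A] + [[1] * v + [0]]
    # Every out-degree is at least 1 because each node points to the ground node
    k_out = [sum(row) for row in A_aug]
    size = v + 1
    return [[A_aug[j][i] / k_out[j] for j in range(size)] for i in range(size)]


def iterate(Z, s, steps):
    """Apply the gather-and-normalise step of the Ing process `steps` times."""
    for _ in range(steps):
        y = [sum(z_ij * s_j for z_ij, s_j in zip(row, s)) for row in Z]
        peak = max(y)
        s = [y_i / peak for y_i in y]
    return s


# Hypothetical directed graph with edges 0->1, 0->2 and 1->2
A = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
Z = leaderrank_matrix(A)

# LeaderRank-style start (1 for ordinary nodes, 0 for the ground node)
from_leaderrank_start = iterate(Z, [1.0, 1.0, 1.0, 0.0], 200)
# An arbitrary non-zero start, standing in for the random vector r
from_arbitrary_start = iterate(Z, [0.3, 0.7, 0.1, 0.9], 200)

# Only the first v entries (the ordinary nodes) are of interest
ordinary_scores = from_leaderrank_start[:3]
```

In this example both starting vectors lead to the same normalised limit, in which node 2 ranks above node 1 and node 1 above node 0. This is consistent with the remark that the initial score does not affect the limit state.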

The IRA also iteratively assigns a score to each node, but its update rule differs from that of the LeaderRank. Its initial scores are all set to 1 and its linear transformation matrix is $X = (x_{ij})$, defined in terms of $\theta_i$, the prior information of node i, which is often taken to be the degree, coreness, closeness, betweenness and so on. Hence, the IRA is equivalent to s(X, r, ∞).

The W -Ing process
When a priori information is absent, the algorithm still works well. We choose W as the linear transformation and a random vector as the a priori information. From Fig. S4, we find that the W-Ing process also improves accuracy remarkably from n = 0 to 6 and achieves a good result at n = n*. To see the evolution of the W-Ing score, we select four kinds of representative a priori information and draw Fig. S5, from which we can draw the following conclusions. τ first increases and then decreases, so there is always a peak value of τ, which corresponds to the optimal iteration time n*. Moreover, τ tends to be stable when n is sufficiently large.

Definition of the Ing process on directed networks
The original Ing algorithm is designed mainly for undirected networks, but it can be generalized to directed ones. Let G(V, E) be a directed network with adjacency matrix $A = (a_{ij})$, where $a_{ij} = 1$ when i points to j and 0 otherwise. Since edges are directional in directed networks, a node i has two types of neighbours, in-neighbours and out-neighbours. The former are the nodes that point to i and the latter are the nodes that i points to. When a disease spreads via directional edges, only out-neighbours contribute to the spreading range of node i; hence, the Ing process simply collects the information of out-neighbours. Denoting by $s^{(n-1)}$ the (n − 1)-order Ing score vector, we have

$$y^{(n)} = A s^{(n-1)}, \qquad s^{(n)} = \frac{y^{(n)}}{\max(y^{(n)})}, \qquad (12)$$

where the collection matrix A can be replaced by other well-defined ones, for example, W = A + I. Note that if the adjacency matrix $A = (a_{ij})$ is instead defined with $a_{ij} = 1$ when j points to i, Eq. (12) should be changed to $y^{(n)} = A^{T} s^{(n-1)}$.
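The directed gather step, the W = A + I variant and the remark on the transposed convention can be illustrated as follows. This is a sketch under assumptions: the helper names `gather`, `transpose` and `add_identity` and the 3-node directed graph are hypothetical, the out-degree is used as a priori information, and the normalisation by the maximum follows the toy example of the undirected case.

```python
def gather(M, s):
    """One Ing step: y = M s, followed by normalisation by max(y)."""
    y = [sum(m_ij * s_j for m_ij, s_j in zip(row, s)) for row in M]
    peak = max(y)
    return [y_i / peak for y_i in y]


def transpose(M):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


def add_identity(M):
    """Return M + I, so that each node also keeps its own information."""
    return [[m + (1 if i == j else 0) for j, m in enumerate(row)]
            for i, row in enumerate(M)]


# Hypothetical directed graph: A[i][j] = 1 when i points to j
# Edges: 0->1, 0->2, 1->2, 2->0
A = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
out_degree = [sum(row) for row in A]  # [2, 1, 1]

s1_A = gather(A, out_degree)                # [1.0, 0.5, 1.0]
s1_W = gather(add_identity(A), out_degree)  # [1.0, 0.5, 0.75]

# Opposite convention: B[i][j] = 1 when j points to i, so B is A transposed
B = transpose(A)
s1_B = gather(transpose(B), out_degree)     # same result as s1_A
```

With A, node 2 scores as high as node 0 because its only out-neighbour is the high-degree node 0; with W, each node's own out-degree is also counted, which separates the two.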

Datasets description for some directed networks
To verify the effectiveness of the Ing process on directed networks, we select six representative directed networks.
1. Advogato 1 is an online community platform for developers of free software launched in the year 1999. Nodes are users of Advogato and the directed edges represent trust relationships.
2. Anybeat 2 is an online community, a public gathering place where one can interact with people from one's neighbourhood or across the world.
3. RockLake 3 is the food web of Little Rock Lake, Wisconsin in the United States of America. Nodes in this network are autotrophs, herbivores, carnivores and decomposers; links represent food sources.
4. SpaBook 4 reflects word adjacency relationships of a Spanish book. Nodes in the network are words and an edge denotes that two words occurred one after another in the book. The network is directed, i.e., the edge (u, v) denotes that word u was followed by word v. Since a word can occur twice in a row, the network contains loops.
5. USairport 6 is a directed network of flights between US airports in the year 2010. Each edge represents a connection from one airport to another, and the weight of an edge shows the number of flights on that connection in the given direction.
6. UCsocial 5 contains messages sent between the users of an online community of students from the University of California, Irvine. A node represents a user, and a directed edge represents a sent message.

Effectiveness of the Ing process in directed networks
We choose out-degree, out-Hindex, out-coreness, LeaderRank (LR), Weighted LeaderRank (WLR) and ClusterRank (CR) as a priori information and employ the SIR model to quantify the spreading influence of nodes. The Kendall τ correlation coefficients between the centralities and the spreading range are shown in Tab. S2. We again conclude that the Ing score outperforms the other measures.
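The evaluation described above, computing Kendall's τ between a score vector and the spreading range and then picking the iteration time with the greatest τ, can be sketched as follows. The text does not state which variant of Kendall's τ is used or how ties are handled, so this sketch assumes the simple τ-a form, in which tied pairs count as neither concordant nor discordant. The function names and all numbers are made up for illustration and are not results from the paper.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 means a tie in x or y; it is ignored (tau-a assumption)
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


def best_order(scores_by_order, spread):
    """Return (n*, tau) for the order whose scores agree best with spread."""
    taus = [kendall_tau(scores, spread) for scores in scores_by_order]
    n_star = max(range(len(taus)), key=taus.__getitem__)
    return n_star, taus[n_star]


# Made-up spreading ranges and scores for four nodes (illustration only)
spread = [40.0, 25.0, 31.0, 12.0]
scores_by_order = [
    [3, 2, 2, 1],          # n = 0: a priori information with one tie
    [1.0, 0.7, 0.9, 0.4],  # n = 1: the tie is resolved
]

n_star, tau_star = best_order(scores_by_order, spread)
print(n_star, tau_star)  # 1 1.0
```

In this toy input the tie between the second and third node at n = 0 limits τ to 5/6, while the scores at n = 1 order all pairs correctly.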

The difference between Ing process and PageRank
Table S2: Kendall τ correlation coefficients between centralities and spread range, where k denotes out-degree, h denotes out-Hindex and $k_s$ denotes out-coreness. Each kind of a priori information corresponds to three columns: the first column is the a priori information itself, and the second and third columns are the Ing scores at n = n* with L = A and L = W, respectively. The integers in parentheses are the corresponding optimal n* with the greatest τ.

The well-known Google ranking algorithm, PageRank, has been applied to various problems. PageRank mimics the behaviour of a web surfer, who randomly opens a link on the current web page and, with a small probability, turns to some other web page. In detail, PageRank is iterative, just like our Ing process; its update rule contains a parameter q, which is usually set to 0.15, and initially we set s(0) = 1. Table S3 reports the prediction accuracy of PageRank and the A-Ing process. It shows that the Ing process outperforms PageRank in both undirected and directed networks.
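For comparison with the gather step of the Ing process, a PageRank iteration can be sketched as below. The exact update rule is not recoverable from the text above, so the sketch assumes one common formulation, $s_i \leftarrow q + (1 - q) \sum_j a_{ji} s_j / k_j^{out}$, with q = 0.15 and all initial scores equal to 1. The function names and the 3-node graph are hypothetical, and every node in the example has at least one out-link, so dangling nodes are not handled.

```python
def pagerank_step(A, s, q=0.15):
    """One update s_i <- q + (1 - q) * sum_j a_ji * s_j / k_j^out (assumed form)."""
    v = len(A)
    k_out = [sum(row) for row in A]
    return [
        q + (1 - q) * sum(A[j][i] * s[j] / k_out[j]
                          for j in range(v) if k_out[j] > 0)
        for i in range(v)
    ]


def pagerank(A, q=0.15, steps=50):
    """Iterate the assumed update rule from the all-ones initial vector."""
    s = [1.0] * len(A)
    for _ in range(steps):
        s = pagerank_step(A, s, q)
    return s


# Hypothetical directed graph with edges 0->1, 0->2, 1->2 and 2->0
A = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]

first = pagerank_step(A, [1.0, 1.0, 1.0])  # approximately [1.0, 0.575, 1.425]
scores = pagerank(A)
```

Under this assumed form, each node divides its score among its out-neighbours and a share q is redistributed uniformly, whereas the Ing gather step applies the transformation matrix directly to the neighbours' scores and then normalises by the maximum.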