A new model to identify node importance in complex networks based on DEMATEL method

Identifying node importance in complex networks is still a hot research topic, and many methods have recently been proposed to deal with this problem. However, most of these methods focus only on local or path information and do not combine local and global information well. In this paper, a new model to identify node importance based on the Decision-making Trial and Evaluation Laboratory (DEMATEL) method is presented. The DEMATEL method is based on graph theory and takes global information fully into consideration, so it can effectively identify the importance of an element in a whole complex system. Experiments based on the susceptible-infected (SI) model are used to compare the new model with other methods, and applications in three different networks illustrate the effectiveness of the new model.

The DEMATEL method is often used to calculate the importance of an element in the entire system. From the logical relationships between the elements of a system, DEMATEL constructs the direct relation matrix. Through matrix operations it then calculates each element's degree of influence and degree of being influenced, which together give the importance of the element in the whole system. More importantly, the DEMATEL method handles not only direct influence but also indirect influence, which exists only when there are three or more elements. Compare nodes to planets and assume there are three planets α, β and θ, with the gravity between two planets regarded as direct influence. Besides the gravity between α and β and the gravity between β and θ, there is also a transitive gravity between α and θ through β; this transitive gravity is the indirect influence. Indirect influence exists in networks in the same way. In this paper, a new model to identify node importance based on the DEMATEL method is proposed. The interaction between two nodes calculated by the Gravity model is taken as the direct influence: the larger the degrees of the two nodes and the shorter the distance between them, the stronger their interaction. Building on the DEMATEL method, the model then takes global information fully into account.
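The planet analogy can be made concrete with a tiny matrix. In the sketch below, the chain α-β-θ and its 0/1 weights are illustrative assumptions, not data from the paper: α and θ have no direct entry in the direct influence matrix, yet squaring the matrix produces a non-zero α-θ entry through β. DEMATEL accumulates exactly such powers of the (normalized) direct matrix to capture indirect influence.

```python
# Direct influence matrix for the chain alpha - beta - theta (illustrative weights):
# alpha and theta have no direct link, only links through beta.
labels = ["alpha", "beta", "theta"]
D = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]

def matmul(A, B):
    """Plain-Python matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

D2 = matmul(D, D)  # entry (i, j) counts two-step chains i -> k -> j

print("direct alpha->theta:", D[0][2])     # 0
print("two-step alpha->theta:", D2[0][2])  # 1, via beta
```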
The paper is organized as follows. The "Preliminaries" section briefly introduces the preliminaries: it reviews node centrality measures such as DC and BC and presents the computing steps of the DEMATEL method. The new model based on the DEMATEL method is introduced in the "Proposed method" section. In the "Applications in real networks" section, the new model is applied to three real networks to test its effectiveness. The "Conclusion" section concludes the whole work.

Preliminaries
Centrality measures. A network can be abstracted as $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges. Some node centrality measures are introduced as follows.

Definition 1 Degree centrality (DC).
The degree centrality of node $i$, denoted $d_i$, can be obtained as follows 8:
$$d_i = \sum_{j=1}^{N} x_{ij},$$
where $N$ is the number of nodes in the network and $x_{ij}$ represents the connection between node $i$ and node $j$: $x_{ij} = 1$ if node $i$ and node $j$ are connected, and $x_{ij} = 0$ otherwise.

Definition 2 Betweenness centrality (BC).
The betweenness centrality of node $i$, denoted $b_i$, can be calculated as follows 10,11:
$$b_i = \sum_{\substack{1 \le m < n \le N \\ m \neq i,\ n \neq i}} \frac{p_{mn}(i)}{p_{mn}},$$
where $N$ is the number of nodes, $p_{mn}$ is the number of shortest paths between node $m$ and node $n$, and $p_{mn}(i)$ is the number of those shortest paths that pass through node $i$.
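Both definitions can be computed for an unweighted, undirected graph with breadth-first search. The sketch below is a plain-Python illustration, not the authors' code; it assumes an adjacency-set representation (the helper `build_adj` is hypothetical) and counts each unordered pair {m, n} once in $b_i$, without any further normalization.

```python
from collections import deque

def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree_centrality(adj):
    """d_i = sum_j x_ij, i.e. the number of neighbours of i."""
    return {i: len(nbrs) for i, nbrs in adj.items()}

def bfs_counts(adj, s):
    """Distances and numbers of shortest paths from source s."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:              # v reached for the first time
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:     # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma

def betweenness_centrality(adj):
    """b_i = sum over unordered pairs m < n (both != i) of p_mn(i) / p_mn."""
    nodes = sorted(adj)
    info = {s: bfs_counts(adj, s) for s in nodes}
    bc = {i: 0.0 for i in nodes}
    for a, m in enumerate(nodes):
        dist_m, sigma_m = info[m]
        for n in nodes[a + 1:]:
            if n not in dist_m:            # m and n are disconnected
                continue
            for i in nodes:
                if i == m or i == n or i not in dist_m:
                    continue
                dist_i, sigma_i = info[i]
                # i lies on a shortest m-n path iff d(m,i) + d(i,n) = d(m,n);
                # the number of such paths is p_mi * p_in.
                if dist_m[i] + dist_i[n] == dist_m[n]:
                    bc[i] += sigma_m[i] * sigma_i[n] / sigma_m[n]
    return bc

square = build_adj([(0, 1), (1, 2), (2, 3), (3, 0)])
print(degree_centrality(square))       # every node has degree 2
print(betweenness_centrality(square))  # every node gets 0.5
```

The all-pairs BFS keeps the code short at O(N^3) cost; for large networks, Brandes' accumulation algorithm computes the same values faster.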

Definition 3 Closeness centrality (CC).
The closeness centrality of node $i$, denoted $CC(i)$, is defined as follows 10:
$$CC(i) = \frac{1}{\sum_{j=1}^{N} d_{ij}},$$
where $d_{ij}$ is the distance (shortest-path length) between node $i$ and node $j$.
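Closeness centrality needs only BFS distances. A minimal sketch, assuming a connected, unweighted, undirected graph stored as adjacency sets; on a disconnected graph it simply sums over the nodes reachable from $i$, which is only one of several possible conventions.

```python
from collections import deque

def bfs_distances(adj, s):
    """Shortest-path distances (hop counts) from s to every reachable node."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness_centrality(adj):
    """CC(i) = 1 / sum_j d_ij, summing over nodes reachable from i."""
    cc = {}
    for i in adj:
        total = sum(bfs_distances(adj, i).values())  # d_ii = 0 adds nothing
        cc[i] = 1.0 / total if total > 0 else 0.0
    return cc

# Path graph 0 - 1 - 2 as adjacency sets
path = {0: {1}, 1: {0, 2}, 2: {1}}
print(closeness_centrality(path))  # middle node: 1/2, end nodes: 1/3
```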

Definition 4 Eigenvector centrality (EC).
The eigenvector centrality of node $i$, denoted $EC(i)$, is defined as follows 12:
$$EC(i) = e_i = \frac{1}{\lambda_{\max}} \sum_{j=1}^{N} m_{ij} e_j,$$
where $M = [m_{ij}]$ is the $N \times N$ adjacency matrix of network $G$, $\lambda_{\max}$ is the largest eigenvalue of $M$, and $e_i$ is the $i$th element of the normalized eigenvector corresponding to $\lambda_{\max}$.

Definition 5 Gravity model.
The centrality of node $i$, denoted $CG(i)$, is defined as follows 18:
$$CG(i) = \sum_{j \neq i} \frac{k_i k_j}{d_{ij}^2},$$
where $k_i$ is the degree of node $i$ and $d_{ij}$ is the distance between node $i$ and node $j$. The model borrows the formula for gravitational force, with mass replaced by node degree: just as the gravity between two celestial bodies is inversely proportional to the square of their distance, the interaction between two nodes decays with the square of their distance. The Gravity model therefore considers both local information (degree) and path information (distance). In the weighted gravity model, $e_i$, the $i$th element of the normalized eigenvector, is additionally taken as the weight of node $i$.
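Definitions 4 and 5 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not a reference implementation: eigenvector centrality is computed by power iteration on $M + I$ (same eigenvectors as $M$, but the iteration does not oscillate on bipartite graphs) and scaled to unit Euclidean norm, which is one possible reading of "normalized"; the gravity sum runs over all nodes reachable from $i$, without the truncation radius that some versions of the gravity model use.

```python
import math
from collections import deque

def bfs_distances(adj, s):
    """Shortest-path distances (hop counts) from s to every reachable node."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def eigenvector_centrality(adj, max_iter=1000, tol=1e-12):
    """Power iteration for the principal eigenvector, scaled to unit norm."""
    nodes = sorted(adj)
    e = {i: 1.0 for i in nodes}
    for _ in range(max_iter):
        # Multiply by (M + I): same eigenvectors as M, no bipartite oscillation.
        new = {i: e[i] + sum(e[j] for j in adj[i]) for i in nodes}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {i: v / norm for i, v in new.items()}
        if max(abs(new[i] - e[i]) for i in nodes) < tol:
            return new
        e = new
    return e

def gravity_centrality(adj):
    """CG(i) = sum_{j != i} k_i * k_j / d_ij^2 over nodes reachable from i."""
    k = {i: len(adj[i]) for i in adj}
    cg = {}
    for i in adj:
        dist = bfs_distances(adj, i)
        cg[i] = sum(k[i] * k[j] / dist[j] ** 2 for j in dist if j != i)
    return cg

path = {0: {1}, 1: {0, 2}, 2: {1}}
print(eigenvector_centrality(path))  # approximately {0: 0.5, 1: 0.7071, 2: 0.5}
print(gravity_centrality(path))      # {0: 2.25, 1: 4.0, 2: 2.25}
```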
DEMATEL method. The DEMATEL method 23,24, which is based on graph theory, was proposed by the Battelle Memorial Institute. It is usually applied to calculate the importance of an element in the entire system. Its main advantage is that it considers not only direct influence but also indirect influence. The computing steps of DEMATEL are as follows.
Step 1 Construct the direct relation matrix. The influencing factors of the system need to be determined in advance. For a system with $n$ factors, the direct relation matrix is
$$D = [d_{ij}]_{n \times n},$$
where $d_{ij}$ represents the direct influence of factor $i$ on factor $j$.
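Step 2 Normalize the direct relation matrix:
$$\tilde{D} = \frac{D}{s}, \qquad s = \max\left(\max_{1 \le i \le n} \sum_{j=1}^{n} d_{ij},\ \max_{1 \le j \le n} \sum_{i=1}^{n} d_{ij}\right).$$
Step 3 Obtain the total relation matrix from the normalized relation matrix. The total relation matrix $T$ is obtained as follows:
$$T = \tilde{D} + \tilde{D}^2 + \tilde{D}^3 + \cdots = \lim_{k \to \infty} \sum_{m=1}^{k} \tilde{D}^m = \tilde{D}(I - \tilde{D})^{-1},$$
where $I$ is the identity matrix. This step is the most significant part of the DEMATEL method: the normalized relation matrix is repeatedly multiplied by itself so that the indirect influence between all elements is added, with $\tilde{D}^m$ carrying the influence transmitted through chains of $m$ direct links.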
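Step 4 Obtain the causal parameters $R$ and $C$ from the total relation matrix $T = [t_{ij}]_{n \times n}$:
$$r_i = \sum_{j=1}^{n} t_{ij}, \qquad c_i = \sum_{j=1}^{n} t_{ji},$$
where $R = [r_i]$ represents the influence exerted by each element and $C = [c_i]$ represents the degree to which each element is influenced.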
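Step 5 Calculate the importance. The importance of element $i$ in the whole system is the sum of the influence it exerts and the influence it receives, $r_i + c_i$.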
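Steps 2 to 5 can be sketched in plain Python as below. The sketch evaluates the series of Step 3 term by term, mirroring the "keep multiplying" description, and stops once the newest power is negligible; the normalization constant is the larger of the maximum row sum and the maximum column sum, as in Step 2. The 3 × 3 matrix `D` is made up for illustration. The series converges only if the spectral radius of $\tilde{D}$ is below 1, so the function raises an error if it has not converged within `max_iter` terms.

```python
def matmul(A, B):
    """Plain-Python matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def dematel(D, tol=1e-12, max_iter=10_000):
    """DEMATEL Steps 2-5 for a non-negative direct relation matrix D."""
    n = len(D)
    # Step 2: s = larger of the maximum row sum and the maximum column sum
    s = max(max(sum(row) for row in D),
            max(sum(D[i][j] for i in range(n)) for j in range(n)))
    Dn = [[d / s for d in row] for row in D]   # normalized matrix D~
    # Step 3: T = Dn + Dn^2 + Dn^3 + ..., stopped once the newest power is negligible
    T = [[0.0] * n for _ in range(n)]
    P = Dn
    for _ in range(max_iter):
        T = [[T[i][j] + P[i][j] for j in range(n)] for i in range(n)]
        if max(abs(v) for row in P for v in row) < tol:
            break
        P = matmul(P, Dn)
    else:
        raise ValueError("series did not converge: spectral radius of D~ is not below 1")
    # Step 4: r_i = row sums (influence exerted), c_i = column sums (influence received)
    R = [sum(T[i]) for i in range(n)]
    C = [sum(T[j][i] for j in range(n)) for i in range(n)]
    # Step 5: importance of element i
    importance = [R[i] + C[i] for i in range(n)]
    return T, R, C, importance

# Illustrative 3-factor system (made-up influence scores)
D = [[0, 3, 0],
     [1, 0, 2],
     [0, 1, 0]]
T, R, C, importance = dematel(D)
print([round(v, 3) for v in importance])
```

Although `D[0][2]` is 0, the returned `T[0][2]` is positive: factor 0 influences factor 2 indirectly through factor 1.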

Proposed method
A wide variety of node centrality measures have been proposed in recent years, each with its own advantages and disadvantages. Whether local and global information are fully utilized is the key point in judging the effectiveness of a method. The Gravity model considers both local and path information. In this article, the interaction between two nodes calculated by the Gravity model is taken as the direct influence. Based on the DEMATEL method, the indirect influence in the network is also well addressed, which makes the new model more global than the Gravity model. The specific steps of the new model are as follows:
Step 1 Construct the network. Abstract a real-world complex system into a network $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges.
Step 2 Calculate the degrees of the nodes and the distances between them. In the Gravity model, node degree represents local information and distance represents path information. The degree of node $i$ is obtained as follows:
$$k_i = \sum_{j=1}^{N} x_{ij},$$
where $x_{ij}$ is the connection between node $i$ and node $j$. The distance $d_{ij}$ is the length of the shortest path between node $i$ and node $j$.
www.nature.com/scientificreports/
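Step 3 Generate the direct relation matrix $D = [g_{ij}]_{N \times N}$, where $g_{ij}$ is the gravity interaction between node $i$ and node $j$:
$$g_{ij} = \begin{cases} \dfrac{k_i k_j}{d_{ij}^2}, & i \neq j, \\ 0, & i = j. \end{cases}$$
Step 4 Normalize the direct relation matrix:
$$\tilde{D} = \frac{D}{s}, \qquad s = \max\left(\max_{1 \le i \le N} \sum_{j=1}^{N} g_{ij},\ \max_{1 \le j \le N} \sum_{i=1}^{N} g_{ij}\right).$$
Step 5 Calculate the total relation matrix. Given the normalized matrix, the total relation matrix is obtained as follows:
$$T = \tilde{D} + \tilde{D}^2 + \tilde{D}^3 + \cdots = \tilde{D}(I - \tilde{D})^{-1},$$
where $I$ is the $N \times N$ identity matrix. Repeated multiplication of the normalized relation matrix adds the indirect influence between all nodes in the complex network.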
Step 6 Calculate the causal parameters $R$ and $C$ from the total relation matrix $T = [t_{ij}]_{N \times N}$:
$$r_i = \sum_{j=1}^{N} t_{ij}, \qquad c_i = \sum_{j=1}^{N} t_{ji}.$$
Step 7 Calculate node importance. The influence of a node on the other nodes and the degree to which it is influenced are added to give the importance of the node in the whole network, $r_i + c_i$. The flow chart of the new model is illustrated in Fig. 1.
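Steps 2 to 7 can be put together in a short plain-Python sketch. It is an illustration, not the authors' implementation, and assumes an unweighted, connected, undirected network stored as adjacency sets; the helper names are hypothetical, and Step 5 uses the closed form $\tilde{D}(I - \tilde{D})^{-1}$ via Gauss-Jordan elimination instead of a truncated series. Two consequences of the definitions are worth noting. Because $D$ is symmetric for an undirected network, $T$ is symmetric too, so $r_i = c_i$ and the score equals $2r_i$. And when every row of $D$ has the same sum (for example a cycle or a complete graph), $\tilde{D}$ has spectral radius 1 and $I - \tilde{D}$ is singular; the sketch tries to detect this through a near-zero pivot.

```python
from collections import deque

def bfs_distances(adj, s):
    """Shortest-path distances (hop counts) from s to every reachable node."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def inverse_of_identity_minus(Dn):
    """(I - Dn)^{-1} by Gauss-Jordan elimination with partial pivoting."""
    n = len(Dn)
    aug = [[(1.0 if i == j else 0.0) - Dn[i][j] for j in range(n)]
           + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("I - D~ is singular; the series for T diverges")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def dematel_node_importance(adj):
    nodes = sorted(adj)
    pos = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    k = [len(adj[v]) for v in nodes]                    # Step 2: degrees k_i
    D = [[0.0] * n for _ in range(n)]                   # Step 3: g_ij = k_i k_j / d_ij^2
    for v in nodes:
        for u, d in bfs_distances(adj, v).items():
            if u != v:
                D[pos[v]][pos[u]] = k[pos[v]] * k[pos[u]] / d ** 2
    s = max(max(sum(row) for row in D),                 # Step 4: normalize
            max(sum(D[i][j] for i in range(n)) for j in range(n)))
    Dn = [[g / s for g in row] for row in D]
    inv = inverse_of_identity_minus(Dn)                 # Step 5: T = D~ (I - D~)^{-1}
    T = [[sum(Dn[i][m] * inv[m][j] for m in range(n)) for j in range(n)]
         for i in range(n)]
    R = [sum(T[i]) for i in range(n)]                   # Step 6: r_i and c_i
    C = [sum(T[j][i] for j in range(n)) for i in range(n)]
    return {v: R[pos[v]] + C[pos[v]] for v in nodes}    # Step 7: r_i + c_i

# Hypothetical toy network: a triangle 0-1-2 with a pendant path 2-3-4
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)
scores = dematel_node_importance(adj)
print(sorted(scores, key=scores.get, reverse=True))  # nodes ranked by importance
```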
The new model improves the gravity model from a more global perspective. Its novelty is that it uses the DEMATEL method to account for indirect influence among nodes and takes this as global information to enhance effectiveness. The recently proposed weighted gravity model also improves the gravity model from a global perspective, but it obtains its global information from the eigenvector of the adjacency matrix, so the ideas behind the two methods are completely different. The weighted gravity model faces a problem in networks with a delocalized principal eigenvector, where it is difficult to assign centrality weights to the nodes; the new model does not have this problem.

Applications in real networks
The new model is applied to three real networks: Jazz, NS and USAir. Jazz is the network of jazz musicians in the USA: each node is a musician, and an edge connects two musicians who have played jazz together. NS is a network of scientists in the USA, where an edge represents cooperation between two scientists. USAir describes the US air transportation network: each node is an airport and each edge is an air route between two airports. The data can be downloaded from http://pajek.imfm.si/doku.php?id=data:pajek:vlado&s[]=air. The basic characteristics of the networks are shown in Table 1, where n is the number of nodes, m is the number of edges, and ⟨k⟩ and ⟨d⟩ are the average degree and average distance, respectively. The network diagrams are shown in Fig. 2. In addition, the degree distribution of each network is analyzed in two parts, the degree of each node and the degree probability distribution of the network, as shown in Figs. 3, 4 and 5.
Centrality scores calculated by different methods. In this experiment, the new model is applied to find the top 10 most important nodes in each of the three networks. For comparison, the centrality measures introduced above, such as BC and EC, are applied to the same networks. Nodes are arranged in descending order of their centrality scores, so the top 10 nodes selected represent the nodes with high influence in the network. The results are shown in Tables 2, 3.

Compare the effectiveness based on SI model. In the SI model, the more nodes an infected source node infects within the same time, the more important that node is. In this experiment, the average number of infected nodes is used to test the effectiveness of the new model. First, the nodes in the network are sorted by the centrality calculated by each method, and the top 50 nodes are taken as sources of infection. Each source node spreads the infection n times with probability β within the simulation time. We record the total number of nodes infected from each source and divide it by the number of experiments to obtain that node's average number of infected nodes. Finally, we sum the average infected nodes of the top 50 nodes. In this experiment, the probability of node infection was set to 0.

The abscissa of the figure is time and the ordinate is the average number of infected nodes; the larger the slope of a curve, the greater the influence of the selected nodes. In network Jazz, the average number of infected nodes increases with time and eventually stabilizes: the growth rate decreases over time and finally approaches 0. The new model performs slightly better than the Gravity model and much better than the weighted gravity model, because its average number of infected nodes stays above those of the Gravity model and the weighted gravity model until the curves level off. This means that the top 50 nodes calculated by the new model are more important in the network.
In network NS, the growth rate increases with time, and the new model is clearly superior to the Gravity model and the weighted gravity model. In network USAir97, the new model is slightly worse than the Gravity model but much better than the weighted gravity model. In general, the new model is consistent with the Gravity model and clearly superior to the weighted gravity model, which demonstrates its advantage in this experiment.
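A minimal sketch of the SI procedure described above, under assumptions the text does not fix: time is discrete, in every step each infected node independently infects each susceptible neighbour with probability β, and the count at each step includes the source itself. The helper names `si_spread` and `average_infected` are hypothetical, and the paper's values for β, the number of runs and the number of steps are not reproduced here.

```python
import random

def si_spread(adj, source, beta, steps, rng):
    """One SI run from `source`; returns the number of infected nodes after each step."""
    infected = {source}
    counts = []
    for _ in range(steps):
        newly = set()
        for u in infected:
            for v in adj[u]:
                # each infected neighbour gives v one independent chance per step
                if v not in infected and rng.random() < beta:
                    newly.add(v)
        infected |= newly
        counts.append(len(infected))
    return counts

def average_infected(adj, sources, beta, steps, runs, seed=0):
    """Per time step: sum over sources of the infected count averaged over `runs` runs."""
    rng = random.Random(seed)
    totals = [0.0] * steps
    for s in sources:
        for _ in range(runs):
            for t, c in enumerate(si_spread(adj, s, beta, steps, rng)):
                totals[t] += c / runs
    return totals

path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(si_spread(path, 0, 1.0, 5, random.Random(1)))  # [2, 3, 4, 4, 4]
```

To follow the experiment, one would rank nodes by a centrality score, pass the top 50 (for example `sorted(scores, key=scores.get, reverse=True)[:50]`) as `sources`, and plot the returned list against time.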
Compare the correlation with SI model. Before the experiment, Kendall's tau 28 is introduced. Kendall's tau measures the consistency of two sequences of equal length: the more consistent the two sequences, the greater the value of tau. Assume there are two sequences X and Y of length n, where $x_i$ and $y_i$ are the $i$th elements of X and Y, respectively. For two element pairs $(x_i, y_i)$ and $(x_j, y_j)$, if $x_i > x_j$ and $y_i > y_j$, or $x_i < x_j$ and $y_i < y_j$, the pairs are regarded as positive. If $x_i > x_j$ and $y_i < y_j$, or $x_i < x_j$ and $y_i > y_j$, they are regarded as negative. If $x_i = x_j$ or $y_i = y_j$, the pairs are neither positive nor negative.
Kendall's tau is defined as
$$\tau = \frac{n_{+} - n_{-}}{\frac{1}{2}n(n-1)},$$
where $n_{+}$ and $n_{-}$ are the numbers of positive and negative pairs, respectively. These graphs describe the difference between the node sequence obtained from the SI model and the sequence obtained from node centrality as β increases. The greater the value of tau, the stronger the correlation between the two sequences. In network Jazz, the new model performs better when β is in the interval 0.55-0.8, and the Gravity model shows the worst performance. In network NS, the new model performs better in the early stage, and the three methods perform nearly the same in the later stage. In network USAir97, when β > 0.5, the tau of the new model is below 0, which means that the number of negative pairs is larger than the number of positive pairs.
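The formula translates directly into an O(n^2) loop over all pairs. In this sketch, pairs tied in either sequence are counted as neither positive nor negative, as in the usual tau-a; that tie handling is an assumption of the sketch. The sequences in the usage line are made up.

```python
def kendall_tau(x, y):
    """Kendall's tau-a: (n_plus - n_minus) / (n(n-1)/2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n_plus = n_minus = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:       # both increase or both decrease: positive pair
                n_plus += 1
            elif sign < 0:     # opposite directions: negative pair
                n_minus += 1
            # sign == 0: a tie in x or y, counted as neither
    return (n_plus - n_minus) / (0.5 * n * (n - 1))

# e.g. SI-based spreading ability vs. centrality scores of the same three nodes
print(kendall_tau([1, 2, 3], [1, 3, 2]))  # 1/3
```

For larger experiments, SciPy's `scipy.stats.kendalltau` computes tau-b by default, which adjusts for ties and coincides with tau-a when there are none.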

Conclusion
Recently, how to identify node importance in networks has become a hot issue. Whether local and global information are fully combined is the key point in testing the effectiveness of a method. This article proposed a new model based on the DEMATEL method, in which the interaction between two nodes obtained by the Gravity model is taken as the direct influence. Based on the DEMATEL method, the indirect influence is fully considered, which makes the model more global than the Gravity model. A limitation is that, although each experiment is repeated many times, the results of the experiments based on the SI model still contain some randomness. In general, the results demonstrate the superiority of our method.