Extracting h-Backbone as a Core Structure in Weighted Networks

Determining the core structure of complex network systems allows us to simplify them. Using h-bridge and h-strength measurements in a weighted network, we extract the h-backbone core structure. We find that focusing on the h-backbone in a network allows greater simplification because it has fewer edges and thus fewer adjacent nodes. We examine three practical applications: the co-citation network in an information system, the open flight network in a social system, and coauthorship in network science publications.

The contemporary study of complex networks began with Watts & Strogatz 1 and Barabási & Albert 2, and the resulting complex network science is now widely used in research on social, information, biological, and technological networks [3][4][5][6][7][8][9]. Although extracting the network backbone is an important task in network analysis [10][11][12], it is difficult to extract the interactions between nodes or edges and the unique core structure. The numerous attempts to extract the backbone of a complex network have used different quantities (e.g., the degree distribution or the edge-betweenness centrality distribution 13) in an effort to preserve backbone information. Other approaches have focused on network type (e.g., economic systems 14 or online recommendation networks 15). Another key issue is that these backbones are not unique, and some parameters must be set artificially.
Using the h-index 16, a metric now commonly used in recommendation networks and other network applications 17, we previously introduced the h-degree and h-strength and extracted the h-core and h-subnet of a weighted network [18][19][20]. Although in that work we were able to use the h-degree and h-strength to extract functionally significant core information, both h-factors overlook nodes and edges that have a relatively low weight, and these are often the very nodes and edges that are vital in transporting the flow of information. Moreover, according to weak tie theory 21,22, some weak links can be structurally important in networks.
Since the 1970s, different types of centralities have been defined to quantify the importance of each node and edge in a given network [23][24][25][26]. When extracting important network information, ranking edge centrality is more effective than ranking node centrality, because nodes can exist in isolation but edges always connect two nodes. Edge weights are naturally generated in a network, better represent interaction levels between nodes, and thus provide an index that quantifies the importance of network functions. At the same time, edge betweenness reveals the structural characteristics of a network. In some of the literature 13,27 edge betweenness is used to extract the structural skeleton of a network. Thus combining edge weight and edge betweenness can provide important information about both network function and structure.
In our research we combine the h-bridge and h-strength to capture the structurally important interactions of edges with adjacent nodes. After extracting the structural h-bridge and the functional h-strength in a weighted network, we synthesize an h-backbone that combines both structural and functional interactions.

Data
We use three sets of data in our research.
(1) Co-citation network: From the ISI Web of Science (WoS) on 18 May 2017 we obtained the top 100 most-cited articles that cited Hirsch's original paper that defined the h-index ("An index to quantify an individual's scientific research output"). We examined the references that occurred more than five times, set up a co-citation network, and then deleted Hirsch's original paper. Allowing it to remain would have affected the edge betweenness because it was connected to all the other references. We assign the network weights as described in Newman's work 26.
These three data sets represent two typical networks. The first and the last are information networks, and the second a social (transportation) network. Table 1 shows the main features of these weighted networks.

Results
We run experiments to test our method of identifying the h-backbone in a weighted network. Figure 1 shows the procedure for identifying the h-backbone in a co-citation network. The left side shows the original network and the right its h-backbone. Figure 1 shows both highly cited papers, such as Egghe's paper in 2006 and Ball's in 2005, and bridge papers that connect related research topics, such as Brin's article in Computer Networks & ISDN Systems, which provides a foundation for many later articles that combine web search engine design and h-index research. Table 2 provides structural information and lists all of the nodes in the h-backbone that form the core of the weighted network. The percentages of edges and nodes in the h-backbone of the co-citation network vs. the total are 0.08% and 2.47%, respectively.
Figure 2 shows the h-backbone of the open flight network. On the left side is the image of the original network and on the right is its h-backbone.
In the original open flight network, a node is an airport labeled by its IATA code. To clarify the information, we add the name of the city to the IATA code.
Using the h-backbone network we identify the airports that are structurally and functionally most important, e.g., "Chicago-ORD," which is one of the world's biggest passenger airports, and "Anchorage-ANC," which is one of the world's busiest cargo airports. We evaluate airport performance in terms of passengers, cargo (freight and mail), and aircraft movement (Table 3). Here the importance of an airport is determined by combining its cargo and passenger business and its aircraft movements. Thus the h-backbone quantifies its importance.
Figure 3 shows the h-backbone of coauthorship in network science publishing. On the left side is an image of the original network 24 in which only the largest component of the resulting network is shown. On the right is the h-backbone of the entire network. The blue triangles on the left are the nodes in the h-backbone. Note that these h-backbone nodes are important in the original network. The percentages of edges and nodes in the h-backbone vs. the total are 0.9% and 0.5%, respectively.
These three cases show that we can identify an h-backbone in a weighted network, and that with fewer than 1% of the edges and 3% of the nodes the h-backbone is a core structure in the weighted network. This approach effectively locates and extracts the structurally and functionally important edges with adjacent nodes in weighted networks.

Discussion
Unlike that found in other backbone approaches [10][11][12], the structure of the h-backbone is unique in each network. In the Serrano approach, the edges adjacent to some nodes are assumed to be more significant and are therefore assigned to the backbone. This "significance" is determined using a "disparity filter" with a variable α that strongly affects how many edges or nodes remain in the backbone. In the h-backbone algorithm, the number of edges remaining in the h-backbone is determined solely by network characteristics, i.e., edge weight (h-strength) and network structure (h-bridge). In addition, the h-backbone algorithm is highly efficient, and it preserves the small number of edges and nodes that carry important information. Because the h-backbone focuses on edges rather than nodes, it also retains more structural characteristics. As a result, there are no isolated nodes in the h-backbone, and every node is connected to at least one other node. Figure 4 shows a comparative example. Table 4 shows a computed numerical comparison of the h-backbone and the Serrano backbone in three real-world networks.
In Table 4, each number is the count of nodes or edges in the corresponding network. The number in parentheses is the percentage of nodes or edges that overlap with the h-backbone, i.e., the number of nodes or edges that appear in both the Serrano backbone and the h-backbone divided by the number of nodes or edges in the Serrano backbone.
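The overlap percentage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `undirected` and `overlap_percentage` and the toy edge lists are hypothetical and are not taken from Table 4; edges are treated as undirected, so (A, B) and (B, A) count as the same edge.

```python
def undirected(edges):
    """Normalise edge tuples so that (u, v) and (v, u) compare equal."""
    return {frozenset(edge) for edge in edges}


def overlap_percentage(serrano_items, h_backbone_items):
    """Share of the Serrano backbone (nodes or edges) also found in the h-backbone.

    Computes |Serrano AND h-backbone| / |Serrano|, expressed as a percentage.
    """
    serrano = set(serrano_items)
    if not serrano:
        raise ValueError("the Serrano backbone is empty")
    shared = serrano & set(h_backbone_items)
    return 100.0 * len(shared) / len(serrano)


# Hypothetical backbones, for illustration only.
serrano_edges = undirected([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")])
h_edges = undirected([("B", "A"), ("C", "D")])
edge_overlap = overlap_percentage(serrano_edges, h_edges)  # 2 of 4 edges are shared
```

The same function applies to node sets, since only set membership is used.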
Note that the Serrano backbone requires the artificial parameter α. When this parameter changes, the number of nodes and edges in the backbone changes drastically. When α = 0.01, the similarity between the two backbones exceeds 30%, and in one case there is a complete 100% overlap (the co-citation network). When α = 0.05, the similarity is lower, in part because the number of edges preserved by the h-backbone is smaller than the number preserved by the Serrano backbone.
Unlike the methods in the current literature, the h-backbone needs no parameter to adjust the size of the resulting backbone, and thus the h-backbone of each network is uniquely determined. Using the h-backbone method eliminates artificial interference in the process of backbone extraction.
Whether an h-backbone is connected or unconnected is determined by the original structure of the network. In our examples, the h-backbone of the co-citation network is connected and the h-backbone of the open flight network is unconnected.
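To make the role of α concrete, the following sketch shows how a disparity-filter backbone can change with the threshold. It assumes the commonly cited closed form of the disparity filter, in which an edge of weight w at a node of degree k and strength s receives the score (1 - w/s)^(k-1) and is kept when that score is below α for at least one endpoint. The text above does not spell out this formula, so it is an assumption here, and the function name `disparity_backbone` and the toy star network are hypothetical.

```python
from collections import defaultdict


def disparity_backbone(weights, alpha):
    """Return the edges kept by a disparity filter at significance level alpha.

    weights: {(u, v): weight} for an undirected weighted network.
    Assumed score per endpoint: (1 - w / strength) ** (degree - 1).
    """
    strength = defaultdict(float)
    degree = defaultdict(int)
    for (u, v), w in weights.items():
        for node in (u, v):
            strength[node] += w
            degree[node] += 1

    kept = set()
    for (u, v), w in weights.items():
        for node in (u, v):
            k = degree[node]
            # Degree-1 endpoints carry no information about heterogeneity.
            if k > 1 and (1.0 - w / strength[node]) ** (k - 1) < alpha:
                kept.add((u, v))
                break
    return kept


# Hypothetical star network: hub H with one heavier edge.
toy = {("H", "A"): 6, ("H", "B"): 1, ("H", "C"): 1, ("H", "D"): 1, ("H", "E"): 1}
loose = disparity_backbone(toy, alpha=0.05)   # score of H-A is 0.4**4 = 0.0256, so it is kept
strict = disparity_backbone(toy, alpha=0.01)  # the same edge no longer passes
```

In this toy case the backbone goes from one edge to none purely because α changes, which is the kind of parameter sensitivity discussed above.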
In general, if we assume that the h-backbone has m edges and n nodes, and that the h-bridge and h-strength are $h_b$ and $h_s$, respectively, then the number of edges in the h-backbone is at most $h_b + h_s$ and the number of nodes in the h-backbone is at most $2(h_b + h_s)$. Because one edge links two nodes, m < n. Thus $m < n \le 2(h_b + h_s)$.
The structure of h-backbones varies from network to network, and because of this complexity we have not attempted to provide a mathematical proof for the h-backbone, which limits our efforts. Recent research 28, however, has demonstrated the relation between the h-index and coreness. The h-backbone combines the structural importance of the h-bridge with the functional importance of the h-strength, and thus it retains both structural and functional core interactions.

Conclusion
We have introduced a method of finding the h-backbone, which is a core structure in weighted networks. This core network structure of edges and adjacent nodes is important both structurally and functionally, and our method can be used to simplify complex weighted networks. Because the h-backbone integrates core edges with adjacent nodes, the important information of the weighted network is retained. Unlike previous backbones, the h-backbone is a unique core network structure.
The h-backbone methodology can be generalized to other weighted networks. Currently, our case study addresses only undirected weighted information networks, leaving directed weighted and heterogeneous and multilayer weighted networks 29 for future research. Dynamic issues are also left for future study.

Method
A network (graph) consists of nodes (vertices) and edges (links) 30,31 . When nodes and edges represent information-related and society-related objects, we designate the two systems information and social networks, respectively.
Betweenness centrality is a measure of centrality in a graph based on shortest paths. There are two variants, node betweenness and edge betweenness, and we focus on edge betweenness because it quantifies the number of times an edge acts as a bridge on the shortest path between two nodes. As introduced by Linton Freeman 27, the betweenness centrality of a node is the number of shortest paths between pairs of nodes that pass through it. The edge betweenness of an edge can be similarly defined 28.
In a given network G = (V, E), the edge betweenness of an edge v is defined as
$$B(v) = \sum_{s \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}},$$
where $\sigma_{st}$ is the total number of shortest paths from node s to node t and $\sigma_{st}(v)$ is the number of those paths that pass through edge v.
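As an illustration of this definition, the sketch below computes edge betweenness with a breadth-first accumulation in the style of Brandes' algorithm. It rests on assumptions not stated in the text: the graph is undirected, shortest paths are measured in hops (edge weights are ignored), and the sum runs over ordered pairs (s, t), so each unordered pair contributes twice. The function name `edge_betweenness` is illustrative.

```python
from collections import defaultdict, deque


def edge_betweenness(nodes, edges):
    """Return {frozenset({u, v}): betweenness} for an undirected, unweighted graph.

    For every edge e, sums sigma_st(e) / sigma_st over ordered pairs s != t.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    betweenness = {frozenset(edge): 0.0 for edge in edges}
    for s in nodes:
        # Breadth-first search from s: count shortest paths and record predecessors.
        sigma = {n: 0 for n in nodes}
        dist = {n: -1 for n in nodes}
        preds = {n: [] for n in nodes}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)

        # Walk back from the farthest nodes, sharing path counts among edges.
        delta = {n: 0.0 for n in nodes}
        for w in reversed(order):
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1.0 + delta[w])
                betweenness[frozenset((u, w))] += share
                delta[u] += share
    return betweenness


# Path a - b - c: each edge lies on 4 of the 6 ordered shortest paths.
path_eb = edge_betweenness(["a", "b", "c"], [("a", "b"), ("b", "c")])
```

If weighted shortest paths are intended, the breadth-first search would need to be replaced by a weighted shortest-path search; the text does not say which convention is used.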
Edge betweenness quantifies the structural importance of a network edge. An edge with a higher edge betweenness often acts as a bridge to transmit information. Note that, by definition, in a network of N nodes the maximum edge betweenness of a given edge is N × (N-1), so the greater the number of nodes in a network, the larger the edge betweenness of most of the edges. Thus we introduce a new measurement, the bridge, which we obtain by dividing the edge betweenness by the number of all nodes N,
$$\mathrm{bridge}(v) = \frac{B(v)}{N},$$
where $B(v)$ is the edge betweenness of edge v.
After we calculate the bridge for all edges, we rank them using an h-index approach. Because the h-bridge quantifies the structurally important edges connecting the network, and the h-strength characterizes the core edges of a network in terms of link strengths, we can obtain the core backbone structure by combining them.
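The ranking step can be sketched as follows. The sketch assumes that the h-bridge is the largest h such that at least h edges have a bridge of at least h (the usual h-index rule applied to bridge values), and that the h-strength is obtained in the same way from edge weights. The function names and the toy values are hypothetical.

```python
def h_value(values):
    """Largest h such that at least h of the values are >= h (h-index rule)."""
    ranked = sorted(values, reverse=True)
    h = 0
    for rank, value in enumerate(ranked, start=1):
        if value >= rank:
            h = rank
        else:
            break
    return h


def bridge_values(edge_betweenness, n_nodes):
    """Divide each edge betweenness by the number of nodes N."""
    return {edge: b / n_nodes for edge, b in edge_betweenness.items()}


# Hypothetical edge betweenness values for a network with N = 5 nodes.
toy_betweenness = {"e1": 20, "e2": 15, "e3": 10, "e4": 5, "e5": 5}
toy_bridge = bridge_values(toy_betweenness, n_nodes=5)  # 4, 3, 2, 1, 1
h_bridge = h_value(toy_bridge.values())                  # two edges have bridge >= 2

# Hypothetical edge weights (strengths) for the same five edges.
toy_weights = [5, 4, 3, 1, 1]
h_strength = h_value(toy_weights)                        # three edges have weight >= 3
```

Because bridge values are generally not integers, the comparison is made against the integer rank, which is why the function works on real-valued input as well.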

Definition 3. The h-backbone.
An h-backbone of a network is a core sub-network consisting of all edges whose bridge is larger than or equal to the h-bridge or whose strength is larger than or equal to the h-strength in the network, together with their adjacent nodes.
In a weighted network the algorithm for extracting the h-backbone has three steps (Fig. 5).
Step 1: Find the edges with a bridge higher than or equal to the h-bridge;
Step 2: Find the edges with a weight higher than or equal to the h-strength;
Step 3: Identify the h-backbone by merging the edges of Steps 1 and 2 and adding their adjacent nodes.
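The three steps can be sketched in Python as below. This is a minimal sketch, not the authors' implementation: it assumes the h-bridge and h-strength follow the h-index rule (the largest h with at least h values of at least h), takes the bridge values as given input, and uses a hypothetical function name `h_backbone` and hypothetical toy values that were not computed from a real network.

```python
def h_value(values):
    """Largest h such that at least h of the values are >= h (h-index rule)."""
    h = 0
    for rank, value in enumerate(sorted(values, reverse=True), start=1):
        if value >= rank:
            h = rank
        else:
            break
    return h


def h_backbone(weights, bridge):
    """Return (edges, nodes) of the h-backbone.

    weights, bridge: {(u, v): value} defined over the same undirected edges.
    """
    h_b = h_value(bridge.values())
    h_s = h_value(weights.values())
    by_bridge = {e for e, b in bridge.items() if b >= h_b}      # Step 1
    by_strength = {e for e, w in weights.items() if w >= h_s}   # Step 2
    edges = by_bridge | by_strength                             # Step 3: merge
    nodes = {node for edge in edges for node in edge}           # adjacent nodes
    return edges, nodes


# Hypothetical weights and bridge values for a seven-edge toy network.
toy_weights = {("A", "B"): 5, ("B", "C"): 4, ("E", "F"): 3, ("B", "D"): 2,
               ("C", "D"): 1, ("D", "E"): 1, ("A", "C"): 1}
toy_bridge = {("C", "D"): 4.0, ("D", "E"): 3.5, ("B", "C"): 1.8, ("E", "F"): 1.5,
              ("A", "B"): 1.0, ("B", "D"): 1.0, ("A", "C"): 0.5}

backbone_edges, backbone_nodes = h_backbone(toy_weights, toy_bridge)
```

In this toy case the h-strength is 3 and the h-bridge is 2, so Step 2 contributes the three heaviest edges and Step 1 the two edges with the largest bridge. If either h-value is 0, the threshold admits every edge; how that case should be treated is not stated in the text.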