AST: Activity-Security-Trust driven modeling of time varying networks

Network modeling is a flexible mathematical framework for identifying the statistical regularities and structural principles hidden in complex systems. Most recent models of complex networks are driven by activity: an activity potential, defined as a time-invariant function, is introduced to characterize agents’ interactions and to construct an activity-driven model. However, the evolution of emerging networks is deeply coupled not only with explicit factors (e.g. activity) but also with implicit considerations (e.g. security and trust), so more of the intrinsic driving forces should be integrated into the modeling of time varying networks. When generating a new connection, agents seek a time-dependent trade-off among activity, security, and trust. We therefore propose the Activity-Security-Trust (AST) driven model, which jointly considers the explicit and implicit driving forces (activity, security, and trust) underlying the decision process. The AST-driven model captures highly dynamical network behaviors and the complex evolution process more accurately, allows a deeper understanding of how security and trust drive network evolution, and reduces the biases induced by using only activity representations when analyzing dynamical processes.

relationships, learning, and sharing basic network quantities become rewards and incentives 24. In theory, a network should at any time accelerate interpersonal synergy by encouraging the creation of new connections and/or improving existing links. In practice, this means continually providing potential activity and/or initiating collaborations between nodes with little previous contact. Normal evolution in networks would converge toward a scale-free structure, since some participants are natural hubs that other members find attractive because of an appealing security level and/or a favorable trust extent. The caution arising from security and trust considerations may dampen the eagerness to connect that a model based only on activity would predict. We therefore treat the activity, security, and trust of and between nodes jointly as three driving forces of network evolution. Additionally, most previous network models have two potential drawbacks. On the one hand, some abstract the physical elements into conceptual models by simplifying and/or ignoring the ubiquitous constraints of actual systems; this yields fruitful results that are nevertheless difficult to apply in practice. On the other hand, others impose too many assumptions for the models to be easily understood and popularized, and fail to provide a suitable mapping between the physical elements and the virtual concepts. Therefore, network modeling should properly balance generalization and applicability.
The essence of network evolution is to determine when, where, and how to create new links and update existing ones. To overcome the aforementioned drawbacks, we encode the security and trust of nodes rather than depending only on activity, and establish a one-to-one mapping between the characteristics of real networks and the parameters of the network model. To this end, we propose the Activity-Security-Trust (AST) driven model, in which the set of active nodes reflects the nodes that are likely to join the network, the activity rate quantifies the probability that a node initiates a connection, the security level indicates the probability that a node receives connections, and the trust extent captures the opportunity for two nodes to build a connection. The AST-driven model thus helps characterize the evolution of the target network objectively and accurately.

Results
We focus on analyzing three large-scale, time-resolved network datasets. The first dataset is composed of border routers, and an undirected connection indicates that at least one packet has been exchanged between the corresponding endpoint routers. The second dataset contains undirected links connecting two Wikipedia users if one votes for or against the other in admin elections. The third network is obtained by drawing an undirected edge between any two employees of a mid-sized manufacturing company who send e-mails to each other. These datasets represent different types of networks. We define two measurable quantities for each node, the activity potential and the security level, and assign to each ordered pair of nodes a measurable quantity, the trust extent. We find that the system-level dynamics can be revealed by the activity potential distribution function, from which the appropriate interaction rate among nodes can be derived; by the security-level distribution function, from which the ability to resist malicious attacks can be deduced; and by the computed trust extent, from which the effect of mutual trust on network evolution can be inferred. Based on the empirically measured activity potential distribution, the security-level distribution, and the computed trust extent, we propose a process model for the generation and evolution of time varying networks, named the Activity-Security-Trust (AST) driven model. The AST model regulates the network structure over time and traces the origin of hubs to the heterogeneous activity, the asymmetrical security, and the coupled trust of and among the network elements. To assess the validity of the AST model, we compare the topological characteristics of the three real datasets with those of the AST model. The results show that the AST model is capable of objectively reflecting the evolution process of real networks.
The activity potential. Perra et al. 23 defined the activity potential and accordingly proposed the activity-driven network model. Similarly, we consider activity as an explicit driving force and adopt the concept of activity potential in the AST model. Activity here means the individual activity that a node carries out through various forms of cooperation with others. Ample evidence for the role of activity in network modeling can be readily observed in the collaboration network of scientific authors 25. We investigate three dataset networks in which the individual activity can be measured, i.e., traffic flow exchanged among Autonomous Systems (AS) collected from the University of Oregon Route Views Project - Online data and reports, voting actions for or against each other in admin elections of English Wikipedia, and e-mail delivery from one employee to another. For each dataset, we quantify the individual activity of each node and define the activity potential $x_i$ of node $i$ as the number of interactions $I_i(\Delta t)$ that agent $i$ performs in a characteristic time window $\Delta t$, divided by the total number of interactions $U(\Delta t)$ of all agents during the same time window. $x_i$ is expressed by 23:

$$x_i = \frac{I_i(\Delta t)}{U(\Delta t)} \qquad (1)$$

The activity potential $x_i$ is an inherent property representing how willing a node is to collaborate with others, much like introversion and extroversion in human beings. The value of $x_i$ does not change after node $i$ is born. The larger the value of $x_i$, the more actively the node connects to others. The probability distribution $F(x)$ that a given element $i$ has activity potential $x_i$ statistically captures the interaction dynamics, as expressed by:

$$F(x) \propto x^{-\gamma} \qquad (2)$$

where $\gamma$ is a factor, $1 < \gamma < 3$, which depends only on the type of network. $F(x)$ may be chosen arbitrarily or fitted to empirical data. We impose a lower cut-off $\varepsilon$ on $x$ in order to avoid a possible divergence of $F(x)$ close to the origin, i.e. $\varepsilon \le x \le 1$.
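As an illustration of the two quantities defined above, the following minimal sketch computes activity potentials from interaction counts and draws synthetic potentials from a truncated power law. It assumes the heavy-tailed form $F(x) \propto x^{-\gamma}$ on $[\varepsilon, 1]$ that is customary in activity-driven models; the function names and the cut-off value 1e-3 are hypothetical and not taken from the paper, while $\gamma = 1.7$ is the value quoted for the AS dataset.

```python
import random

def activity_potentials(interactions):
    """x_i = I_i(dt) / U(dt), where U(dt) is the total over all agents."""
    total = sum(interactions.values())
    if total == 0:
        raise ValueError("no interactions recorded in the time window")
    return {node: count / total for node, count in interactions.items()}

def sample_power_law(gamma, eps, rng):
    """Inverse-transform sample from a density ~ x**(-gamma) on [eps, 1].

    Assumes gamma != 1 and 0 < eps < 1 (the lower cut-off).
    """
    a = 1.0 - gamma
    u = rng.random()
    return (eps ** a + u * (1.0 - eps ** a)) ** (1.0 / a)

if __name__ == "__main__":
    # Hypothetical interaction counts for three agents in one window.
    print(activity_potentials({"a": 3, "b": 1, "c": 4}))
    rng = random.Random(42)
    draws = [sample_power_law(gamma=1.7, eps=1e-3, rng=rng) for _ in range(5)]
    print(draws)
```

Inverse-transform sampling is used here only because the truncated power law has a closed-form cumulative distribution; fitting $F(x)$ to empirical data, as the text allows, would replace `sample_power_law`.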
The term $a_i$ indicates the activity rate of node $i$, and is defined as the probability per unit time of creating new links or interactions with others. The value of $a_i(t)$ is time-dependent and affected by $x_i$, and should gradually climb toward a stable point as the degree of node $i$ increases. The definition of $a_i(t)$ is given by Eq. (3), where $\eta$ is a rescaling factor, $\eta > 0$, $k_i(t)$ is the degree of node $i$ at time $t$, and $\phi$ restricts the allowable maximum value of $a_i(t)$.
The security level. Security is a specialized field consisting of the provisions and policies that prevent unauthorized access, misuse, tampering, and denial of a computer network and network-accessible resources, and that ensure their availability through proper procedures 26. In the AST model, the security level emphasizes the ability to withstand malicious elements. Like the activity potential $x_i$, the security level $y_i$ is an intrinsic quantity of node $i$, and generally stays fixed unless the node itself takes the initiative to strengthen it. A substantial literature 27-29 on quantifying the security level in various networks is available, from which we can specify a security-level quantity $y_i$ for each node and formalize the security-level probability distribution function $L(y)$, from which the ability to resist malicious attacks is deduced, as expressed by:

$$L(y) = N(\mu, \sigma^2) \qquad (4)$$

where $N(\mu, \sigma^2)$ is a normal distribution with expectation $\mu$ and variance $\sigma^2$. The values of $\mu$ and $\sigma$ are determined by the type of network served. Network connections introduce the possibility of cascading failures due to an exogenous or endogenous attack 30, which implies that a more active node is more prone to attacks by malicious nodes because it has numerous contacts. We define the threat $z_i$ to represent the amount and intensity of the attacks suffered by node $i$. The value of the threat $z_i$ should gradually grow toward a saturation point as the degree of node $i$ increases, as expressed by Eq. (5), where $0 \le z_i \le 1$, $n$ indicates the number of potential nodes that may become active in the subsequent evolution, and $\delta$ is a factor, $10 \le \delta \le 20$, which is determined by the target network type. We employ the robustness $s_i(t)$ to quantify the probability that node $i$ is not infected by malicious nodes at time $t$, as expressed by Eq. (6). The stronger the security level of a node, the more the threat is alleviated, the higher the robustness, and the less likely the node is to be infected by malicious nodes.
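The assignment of security levels from a normal distribution can be sketched as follows. This is only an illustration: $\mu = 0.5$ is the value quoted for the AS dataset, whereas the value of $\sigma$ used here (0.1) and the function name are hypothetical, and no clipping of $y_i$ to a fixed range is applied because the text does not specify one.

```python
import random

def sample_security_levels(n, mu, sigma, rng):
    """Draw one security level y_i ~ N(mu, sigma^2) per potential node."""
    return [rng.gauss(mu, sigma) for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(7)
    # sigma=0.1 is an assumed placeholder, not a value from the paper.
    levels = sample_security_levels(n=5, mu=0.5, sigma=0.1, rng=rng)
    print(levels)
```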
The trust extent. In social science, trust is considered an asymmetrical dependency relationship and constitutes a cornerstone of network evolution, so we quantify the mutual trust extent between nodes in the AST model. The extent to which one agent trusts another is a measure of belief in the honesty, fairness, or benevolence of the other party. Trust is an elemental consideration in approving the construction of a connection, and the trust extent captures the opportunity for two nodes to build a connection. Essentially, the trust extent can be shaped in two ways: through a node's own ability to win its partners' trust, and/or through frequent contact with others. The contact frequency can be quantified by the number of times two nodes interact during a time interval. Therefore, the trust extent $b_{ij}(t)$ of node $i$ in node $j$ at time $t$ is defined by Eq. (7), where $\rho$ is a factor weighting the contributions of the number of connections and of the security level to the trust extent. The larger the value of $\rho$, the more significant the effect of the connections on the trust extent. $\lambda$ restricts the maximum number of connections that can contribute to the trust extent. $\omega_{ij}(t)$ is the total number of connections between node $i$ and node $j$ before time $t$. In its defining expression, $g_0$ represents the initial network, $\langle i, j \rangle$ is an edge connecting node $i$ and node $j$, $g_t$ represents the instantaneous network at time $t$, and $\Delta t$ is the time span of generating the instantaneous network.
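The connection count $\omega_{ij}(t)$ can be accumulated from the sequence of instantaneous networks, as in the sketch below. It assumes that each instantaneous network is given as a list of node pairs, that edges are undirected, and that an edge contributes at most once per instantaneous network; these representation choices and the function name are illustrative assumptions, not the paper's definition.

```python
from collections import Counter

def edge_weights(snapshots):
    """Count, for each undirected pair, the snapshots in which it is linked.

    snapshots: iterable of edge lists [(i, j), ...], ordered g_0, ..., g_t.
    Self-links are ignored; (i, j) and (j, i) are the same edge.
    """
    weights = Counter()
    for edges in snapshots:
        present = {tuple(sorted(edge)) for edge in edges if edge[0] != edge[1]}
        weights.update(present)
    return weights

if __name__ == "__main__":
    history = [[(1, 2), (2, 1), (2, 3)], [(1, 2)], [(3, 3)]]
    print(edge_weights(history))
```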
Activity-Security-Trust driven network model. We show the dynamic network generation process (see Fig. 1).
Step i. Initialize the number of potential nodes, the activity probability distribution $F(x)$, and the security-level probability distribution $L(y)$.
Step ii. According to $F(x)$, assign the activity potential $x_i$ to each potential node $i$.
Step iii. According to $L(y)$, assign the security level $y_i$ to each potential node $i$.
Step iv. Take the initial network $g_0$, introduced from the actual network, as the initial state of the eventual network $G_T$.
Step vi. Generate the eventual network

$$G_T = \bigcup_{t = 0, \Delta t, \ldots, T} g_t$$

where $T$ is the time span of generating the eventual network, namely the network aggregation time.
Next we describe the creation of an instantaneous network $g_{t+\Delta t}$ ($t = 0, \Delta t, 2\Delta t, 3\Delta t, \ldots, T - \Delta t$).
Step i. At each discrete time step $\Delta t$, the network $g_{t+\Delta t}$ starts with $n$ disconnected vertices.
Step ii. Calculate the degree $k_i(t)$ of each potential node and the weight $\omega_{ij}(t)$ of each edge in the eventual network $G_t$.
Step iii. From $k_i(t)$, (5) and (6), calculate the threat $z_i(t)$ and the robustness $s_i(t)$ of each potential node $i$.
Step iv. From $\omega_{ij}(t)$ and (7), calculate the trust extent $b_{ij}(t)$ for each ordered pair of potential nodes.
Step v. From $k_i(t)$ and (3), calculate the activity rate $a_i(t)$ of each potential node $i$.
Step vi. Each node $i$ becomes active with probability $a_i(t)\Delta t$; otherwise, with probability $1 - a_i(t)\Delta t$, it becomes a black-hole node, i.e. it only passively waits to receive connections from active nodes.
Step vii. Create $m$ connections for each active node $i$ according to the independent probability $Q_{ij}(t)$, and attach the corresponding edges to the instantaneous network $g_{t+\Delta t}$. Because the selections are independent, the same target node may be chosen more than once among the $m$ connections.
Step viii. At the next time step $\Delta t$, all the edges in the network $g_{t+\Delta t}$ are erased, so that all interactions have a constant duration $\Delta t$.
Here the connection probability $Q_{ij}(t)$ is defined in terms of the trust-extent probability function $R_{ij}(t)$. The function $R_{ij}(t)$ is the proportion of node $i$'s trust extent in node $j$ relative to the total trust, i.e. the more node $i$ trusts node $j$, the larger the value of $R_{ij}(t)$. Note that the activity rate $a_i(t)$ may exceed 1 after a certain time $t$; node $i$ then becomes dominantly active and is selected as active at every step, like the hotspot servers in the Internet and the convergence routers in AS, which constantly connect to others.
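The activation and target-selection steps above can be sketched as follows. Because the closed forms of $a_i(t)$ and $Q_{ij}(t)$ are not reproduced here, the sketch takes them as inputs: `activity_rate` is a hypothetical dictionary of precomputed $a_i(t)$ values and `target_weight` is a hypothetical callable returning a non-negative weight proportional to $Q_{ij}(t)$. Excluding the active node itself from the candidate targets is also an assumption of this sketch.

```python
import random

def instantaneous_network(nodes, activity_rate, target_weight, m, dt, rng):
    """Generate the edge list of one instantaneous network.

    nodes: list of node identifiers.
    activity_rate: dict mapping node -> a_i(t) (assumed precomputed).
    target_weight: callable (i, j) -> weight proportional to Q_ij(t).
    m: number of connections created by each active node.
    """
    edges = []
    for i in nodes:
        # A node is active with probability a_i(t) * dt; a rate of 1/dt or
        # more makes it active at every step.
        if rng.random() >= activity_rate[i] * dt:
            continue  # black-hole node: it can only receive connections
        candidates = [j for j in nodes if j != i]
        weights = [target_weight(i, j) for j in candidates]
        # Independent draws with replacement: targets may repeat.
        targets = rng.choices(candidates, weights=weights, k=m)
        edges.extend((i, j) for j in targets)
    return edges

if __name__ == "__main__":
    rng = random.Random(3)
    nodes = list(range(6))
    rates = {i: 0.5 for i in nodes}          # placeholder activity rates
    uniform = lambda i, j: 1.0               # placeholder for Q_ij(t)
    print(instantaneous_network(nodes, rates, uniform, m=1, dt=1.0, rng=rng))
```

Sampling targets with replacement mirrors the remark that duplicate targets may occur among the $m$ connections; such duplicates collapse when the instantaneous networks are aggregated.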
For the same control parameters, the AST model outputs different random networks, but the differences among the resulting eventual networks are small enough to be statistically negligible in a large-scale complex network. The essence of network evolution is the process of generating new edges, which can be simplified into two sub-processes. One is to select an active node from the potential nodes as the starting node of an edge, and the other is to select a terminal node from the rest. Accordingly, the activity rate affects the selection of the active node, while the security level and trust extent govern the choice of the target node. The AST model imitates the real generation process, and its parameters originate from actual networks, so the AST model is capable of objectively and accurately characterizing practical time varying networks. Figure 2 provides the results of numerical simulations of the network for various aggregation times $T$, and Table 1 shows the corresponding statistical information. The cumulative degree probability distribution decays progressively more slowly as the aggregation time $T$ increases, which implies that the growth of the network accelerates as its size increases. A longer aggregation time increases network centralization and heterogeneity but reduces network density. At each time step, the network appears as a simple random graph with low average connectivity. The accumulation of connections over a long aggregation time $T$ raises the activity rate of nodes and, conversely, worsens their security. Due to the heterogeneous activity and asymmetrical security of nodes, hubs that possess a large activity rate and trust extent emerge in the network.
The AST model supports simple analytical evaluation. We define the eventual network as the union of all the instantaneous networks generated during the previous time steps $\Delta t$, and then erase duplicated edges and self-links. The instantaneous network is composed of the set of newly interconnected nodes, namely those that are active at that time plus those that received connections from active agents. Assuming . Here $\langle a(t) \rangle$ is the average activity rate per unit time.
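The aggregation described above (union of the instantaneous networks, with duplicated edges and self-links removed) can be sketched as follows. Representing each instantaneous network as a list of node pairs and the function name `aggregate` are assumptions made for illustration.

```python
def aggregate(snapshots):
    """Union of instantaneous networks as a set of undirected edges.

    Duplicated edges collapse because a set is used; (i, j) and (j, i)
    are stored once as (min, max); self-links are dropped.
    """
    eventual = set()
    for edges in snapshots:
        for i, j in edges:
            if i != j:
                eventual.add((min(i, j), max(i, j)))
    return eventual

if __name__ == "__main__":
    g1 = [(1, 2), (2, 1)]
    g2 = [(2, 3), (3, 3)]
    print(sorted(aggregate([g1, g2])))
```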

Discussion
The AST model is concise and understandable, but it is not easy to determine parameters that accurately reflect the characteristics of a real network. Fortunately, the AST parameters can be empirically measured in real-world networks. One feasible way is to learn the driving forces governing the network evolution and then derive the corresponding quantitative representation from prior knowledge. Another possible avenue to parameterization is to first split the evolution of an existing network into several short time intervals, and then determine the network characteristics and parameters by repeatedly fitting the parameters against the actual network. Moreover, the AST model can be widely used to study molecular networks, time-varying networks, and spatiotemporal networks, and also helps to predict the spread of epidemics and to investigate the human dynamics of face-to-face interaction networks. In summary, accurately understanding the essence of network evolution requires considering not only the explicit factors (e.g. activity potential) but also the implicit factors (e.g. security level and trust extent). The explicit factors reflect nodes' subjective initiative to create connections with others, while the implicit factors emphasize nodes' objective prudence in choosing among candidate targets. More factors (e.g. concurrency and persistence) may be associated with each connection decision in order to overcome the limitations of simple random networks. Note, however, that network modeling should properly balance generalization and applicability, which poses interesting challenges for future work in this area.

Datasets.
We compare the AST model with three datasets (Supplementary Information): traffic flow exchanged among ASs collected from the University of Oregon Route Views Project - Online data and reports, votes for and against each other in admin elections of English Wikipedia, and e-mails sent between employees of a mid-sized manufacturing company. We mainly focus on the number of nodes and the corresponding degree distribution in the undirected and unweighted graph, so we employ the cumulative degree distribution as a measure of topological similarity, in which the number of nodes with degree of at least one is exactly equal to the total number of nodes. For a given dataset, only the number of potential nodes $n$ and the factor $\eta$ may change when the aggregation time is adjusted; the other parameters do not, because they are inherent properties of the network. According to the empirically measured network-specific properties, we give the parameters of the three datasets. The parameters of the ASs dataset are $m = 1$, $\gamma = 1.7$, $\phi = 100$, $\mu = 0.5$, σ = .
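The cumulative degree distribution used as the similarity measure can be computed from an undirected, unweighted edge list as in the sketch below. It reports, for each observed degree $k$, the fraction of nodes with degree at least $k$; the function name and the edge-list representation are illustrative assumptions, and nodes without any edge are not represented.

```python
from collections import Counter

def cumulative_degree_distribution(edges):
    """Return {k: fraction of nodes with degree >= k} for observed degrees."""
    degree = Counter()
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    n = len(degree)
    counts = Counter(degree.values())
    distribution = {}
    remaining = n  # nodes whose degree is >= the current k
    for k in sorted(counts):
        distribution[k] = remaining / n
        remaining -= counts[k]
    return distribution

if __name__ == "__main__":
    star = [(1, 2), (1, 3), (1, 4)]
    print(cumulative_degree_distribution(star))  # {1: 1.0, 3: 0.25}
```

Comparing the distributions of a dataset and of a simulated network at the same aggregation time then reduces to comparing two such dictionaries.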

Autonomous systems dataset (AS). This dataset 31 is composed of border routers, and an undirected connection indicates that at least one packet has been exchanged between the corresponding endpoint routers. The dataset contains 733 daily instances spanning 785 days from November 8, 1997 to January 2, 2000. We focus on three periods between 1997 and 2000. Table 2 shows the metadata of the three periods. Figure 3 shows the network visualization and the cumulative degree distribution of the AS dataset as well as of the AST model for three different aggregated views.
Wikipedia elections dataset (Wiki). The Wiki dataset 32 contains undirected links connecting two users of Wikipedia if one votes for or against the other in admin elections. Edges can be positive ("for" vote) or negative ("against" vote), but we treat both as the same. We consider two periods between March 1, 2005 and September 30, 2005. Table 3 shows the metadata of the two periods. Figure 4 shows the network visualization of the Wiki dataset and of the AST model for two different aggregated views.
Manufacturing company E-mail dataset (E-mail). This dataset 33 considers each employee of a mid-sized manufacturing company as a node. An undirected link exists if two employees sent e-mails to each other. We focus on three periods spanning nine full months from January 1, 2010 to September 30, 2010. Table 4 shows the metadata of the three periods. Figure 5 shows the network visualization of the E-mail dataset and of the AST model for three different aggregated views.