Impact of directionality and correlation on contagion

The threshold model has been widely adopted for modelling contagion processes on social networks, where individuals are assumed to be in one of two states: inactive or active. This paper studies the model on directed networks where nodal in- and out-degrees may be correlated. To understand how directionality and correlation affect the breakdown of the system, a theoretical framework based on generating functions is developed. First, the effects of degree and threshold heterogeneities are identified: both heterogeneities always decrease systemic robustness. Then, the impact of the correlation between nodal in- and out-degrees is investigated. A positive correlation increases systemic robustness over a wide range of the average in-degree, while a negative correlation has the opposite effect. Finally, a comparison between undirected and directed networks shows that the presence of directionality and correlation always makes the system more vulnerable.

Contagion processes arise broadly in biological, social, and information systems. Examples include the spread of infectious diseases [1], the diffusion of cultural fads [2], the outbreak of political unrest [3] and the dissemination of rumours [4]. All these processes can be studied by contagion models, in which inactive (or susceptible) individuals are activated (or infected) by contacts with active neighbours. In general, the propagation of individual states is characterized as either a simple contagion or a complex contagion [5]. A simple contagion is any process where the infection probability is assumed to be independent and identical across successive contacts, an assumption widely adopted in mathematical models of infectious diseases [6,7]. A complex contagion, on the other hand, is a process where the infection probability depends on a critical number of exposures to infection, and it usually exhibits the cascade phenomena observed in social and economic systems [5,8]. Here, we are interested in complex contagion. One of the prototypes for studying such dynamics is the threshold model, which originated from the seminal work of Schelling [9] on residential segregation and was subsequently developed by Granovetter [10] in the study of social influence. According to the general definition of the threshold model, an individual adopts a new product or idea only if a critical fraction [11] or number [12] of her friends have already been activated. This required fraction or number of adopters in the neighbourhood is defined as the threshold.
The threshold model has been studied extensively on undirected networks [11-21]. Although the contagion rule is simple, the model can exhibit complex behaviour when individual differences and interaction structure are considered. Watts [11] first studied the model with one random initiator on complex networks to examine the effects of these factors on the cascade dynamics: heterogeneous nodal degrees were shown to enhance systemic stability compared with homogeneous nodal degrees, whereas threshold heterogeneity has the contrary effect. Gleeson and Cahalane [14] extended Watts' model to a finite number of initiators. They found that varying the seed size has a broad impact on the cascade transition as a function of the average degree $z$ of nodes, even making the transition discontinuous for relatively small values of $z$. Singh et al. [18] also demonstrated the effect of seed selection on the cascade condition and final prevalence; for instance, selecting seeds by their degrees (highest first) results in the largest (as well as fastest) spread in Erdős-Rényi (ER) [22] networks.
However, most contagion processes are directed, such as communication in email networks [23], diffusion in financial networks [24], information sharing on Twitter [25] and opinion following in microblogs [26]. In directed networks, a node is connected to others via incoming and outgoing links. Each node receives information via incoming links and sends it via outgoing ones. The presence of directionality opens the door to features that are essentially different from those in undirected networks. Dodds and collaborators [27,28] studied global spreading based on the propagation counts of edge-node pairs rather than just nodes. They constructed the gain ratio matrix for contagion in generalized random networks with both directed and undirected edges and degree-degree correlations, and obtained analytic expressions for the probability and expected size of global spreading events starting from a single seed or a finite number of seeds. However, calculating the largest eigenvalue of the gain ratio matrix requires exact information on the combinations of in- and out-degrees of all the nodes. For complex directed networks, obtaining the largest eigenvalue is very difficult because of the high dimension of the matrix.
In this paper, we develop a theoretical framework based on generating functions to calculate the condition and prevalence of global cascades. We study the threshold model analytically and numerically on directed Poisson and power-law networks. As in undirected networks [11,14], a global cascade is not triggered in directed networks when the average in-degree $z_{in}$ of nodes is either too small or too large. Large cascades are instead realized within an intermediate range of $z_{in}$, which is referred to as the cascade window. In contrast to undirected networks, both degree and threshold heterogeneities make directed networks more vulnerable. Moreover, if the correlations between nodal in- and out-degrees are considered, the system shows distinct behaviours in most regimes of $z_{in}$: a positive correlation makes the system robust to contagion, while a negative correlation makes the system prone to failure.

Results
In the threshold model, each node $i$ can only exist in one of two discrete states: inactive or active. The rationality of $i$ is represented by a threshold $r_i \in (0, 1)$, a random variable drawn from a given distribution. Initially, one node is chosen randomly from the network to be active, and the others are inactive. In a directed network, a node can be influenced by its neighbours via incoming links (influenced neighbours) and influences others via outgoing links (influencing neighbours). At each time step, an inactive node $i$ of in-degree $k_i^{in}$ is activated if the number $m_i$ of its active influenced neighbours satisfies

$$\frac{m_i}{k_i^{in}} \ge r_i. \tag{1}$$

Once a node is activated, it remains active. If node $i$ is an initial seed, it will first activate its influencing neighbours $j$ whose thresholds satisfy

$$r_j \le \frac{1}{k_j^{in}}. \tag{2}$$

Because they are unstable in the one-step sense, we call these influencing neighbours vulnerable nodes [11]. In any sufficiently large network with a small number of seeds, the only way in which the seed can grow is that at least one of its influencing neighbours is vulnerable. If the network is undirected, the necessary condition for a global cascade is the existence of a connected cluster of vulnerable nodes occupying a finite fraction of the network; that is, there must exist a giant component of vulnerable nodes (GCVN). For a directed network, by contrast, the giant in-component (GINC), the giant strongly connected component (GSCC), and the giant out-component (GOUC) of vulnerable nodes appear or disappear simultaneously, so any of them can be used to determine whether global cascades commence. Based on generating functions for directed networks with and without correlations between in- and out-degrees, we obtain analytic expressions for the occurrence and expected size of large cascades, as detailed in the Methods section.
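The activation rule described above can be illustrated with a minimal simulation sketch. It assumes the fractional rule (a node activates once the active fraction of its incoming neighbours reaches its threshold) and uses hypothetical names (`run_cascade`, `in_neighbours`) that do not come from the paper. Because activation is irreversible, the final active set does not depend on the update order, so a simple queue is used here instead of synchronous time steps.

```python
from collections import deque

def run_cascade(in_neighbours, thresholds, seeds):
    """Return the final set of active nodes of a directed threshold cascade.

    in_neighbours[i] lists the nodes that have a link pointing to node i.
    thresholds[i] is the threshold r_i of node i (assumed to lie in (0, 1)).
    """
    n = len(in_neighbours)
    # Derive the outgoing adjacency from the incoming one.
    out_neighbours = [[] for _ in range(n)]
    for i, sources in enumerate(in_neighbours):
        for j in sources:
            out_neighbours[j].append(i)

    active = set(seeds)
    active_in = [0] * n          # number of active incoming neighbours
    queue = deque(active)
    while queue:
        j = queue.popleft()
        for i in out_neighbours[j]:
            if i in active:
                continue
            active_in[i] += 1
            # Fractional rule: active incoming fraction must reach r_i.
            if active_in[i] / len(in_neighbours[i]) >= thresholds[i]:
                active.add(i)
                queue.append(i)
    return active

# Toy network: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 3, 4 -> 3, 3 -> 4
toy = [[], [0], [0, 1], [2, 4], [3]]
final_active = run_cascade(toy, [0.6] * 5, seeds=[0])
```

In the toy network, node 3 has two incoming neighbours, so with a threshold of 0.6 a single active incoming neighbour is not enough to activate it.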
Let us start from the simplest case, in which all nodes have an identical threshold and nodal in- and out-degrees follow Poisson distributions without correlation. According to the model definition, whether a node becomes active depends heavily on its in-degree. For the whole network, we therefore focus on the dependence of the GSCC of vulnerable nodes on the average in-degree $z_{in}$. Figure 1(a) shows the size $S_v$ of the GSCC of vulnerable nodes and the fraction $\rho$ of active nodes as functions of $z_{in}$ in directed ER networks. Although $\rho$ is larger than $S_v$ over a wide range of $z_{in}$, the two appear and vanish simultaneously; that is, the cascade transition can happen in either the lower- or the higher-connectivity regime. Nevertheless, the two transitions have different origins. In the lower-connectivity regime, cascade propagation is limited by network sparsity: any increase of $z_{in}$ enhances the possibility of propagation and eventually causes the lower transition, at which the system shifts from a stable state to a vulnerable one. In the higher-connectivity regime, on the contrary, a node is surrounded by many inactive neighbours because of the high network density: any increase of $z_{in}$ strengthens its local stability and eventually leads to the higher transition, at which the system shifts from a vulnerable state to a stable one. Thus, a global cascade can be triggered only within an intermediate range of $z_{in}$, given a proper value of the threshold. As demonstrated in Fig. 1(b), the cascade condition (Eq. (13)) is expressed as a boundary in the $(r, z_{in})$ plane (solid line). For comparison, simulation results of $\rho$ (open squares), averaged over 100 realizations of systems with the same parameter settings, outline the window inside which large cascades occur. Although the simulated networks are finite ($N = 10000$), the analytical and simulated boundaries agree well.
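The existence of a cascade window can be illustrated numerically. The sketch below does not reproduce Eq. (13); it assumes the standard branching-process criterion for uncorrelated directed random graphs, under which a giant cluster of vulnerable nodes exists when the mean number of vulnerable nodes reached from a vulnerable node exceeds one. With Poisson in-degrees and an identical threshold $r$, this reduces to $\sum_{k \le 1/r} k\, p(k) > 1$. The function names are hypothetical.

```python
import math

def mean_vulnerable_offspring(z_in, r, k_max=200):
    """Sum of k * p(k) over in-degrees k that satisfy r <= 1/k.

    p(k) is the Poisson in-degree distribution with mean z_in.
    A value above 1 is taken (assumption) as the cascade condition.
    """
    k_vuln = math.floor(1.0 / r + 1e-12)  # largest vulnerable in-degree
    total = 0.0
    p = math.exp(-z_in)                   # p(0)
    for k in range(1, min(k_vuln, k_max) + 1):
        p *= z_in / k                     # Poisson recursion p(k) = p(k-1) * z / k
        total += k * p
    return total

def cascade_window(r, z_grid):
    """Values of z_in on the grid for which the assumed condition holds."""
    return [z for z in z_grid if mean_vulnerable_offspring(z, r) > 1.0]

# Scan the average in-degree for a fixed threshold (illustrative value).
grid = [0.5 * i for i in range(1, 31)]
window = cascade_window(0.18, grid)
```

Under these assumptions the condition fails both for small and for large $z_{in}$ and holds in between, which is the qualitative shape of the window described in the text.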
Scientific Reports | (2018) 8:4814 | DOI:10.1038/s41598-018-22508-1

The impact of heterogeneity. Previous studies have identified the effects of degree and threshold heterogeneities [11,29] on systemic stability by varying the distributions of nodal degrees and thresholds; for instance, an undirected network with a heterogeneous degree distribution tends to be more robust to random attacks than an undirected homogeneous network. In the present paper, degree heterogeneity is realized by power-law distributions of the in-degree $k_{in}$ and out-degree $k_{out}$, hence scale-free (SF) [30] networks. For threshold heterogeneity, we adopt the normal distribution with mean $r$ and standard deviation $\sigma$. Figure 2(a) presents the cascade window in directed SF networks and compares it with that in directed ER networks. In both networks, nodal thresholds are identical. In contrast to the undirected situation, the directed SF network is more vulnerable than the directed ER network to random attacks. This results from the heavy dependence of the cascade condition on the average in-degree $z_{in}$. Unlike the in-degree distribution of the directed ER network, which is sharply peaked around a well-defined $z_{in}$, that of the directed SF network is highly right-skewed; that is, the number of small in-degree nodes in the directed SF network is larger than that in the directed ER network, which yields more vulnerable nodes in the directed SF network according to Eq. (2) and therefore gives rise to cascading. Figure 2(b) shows the comparison of the cascade windows for identical (solid line) and normally distributed thresholds (dashed and dotted lines). Here the distributions of $k_{in}$ and $k_{out}$ are Poisson. As $\sigma$ increases, the normal distribution becomes wider, and the fraction of nodes whose thresholds are far from the mean grows. Nodes with thresholds below average are easily activated, while those with thresholds above average are difficult to activate. When the seed fraction is very small, the nodes with thresholds below average play an overwhelming role in contagion compared with those with thresholds above average [20]. Thus, the heterogeneity of nodal thresholds increases the likelihood of large cascades.
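The effect of threshold heterogeneity on nodal vulnerability can be sketched as follows. The example assumes an untruncated normal threshold distribution with mean $r$ and standard deviation $\sigma$, Poisson in-degrees, and the same branching-process criterion for uncorrelated directed random graphs as a stand-in for the cascade condition; these choices and the function names are assumptions made for illustration, not the paper's implementation.

```python
import math

def vulnerable_prob(k_in, r_mean, sigma):
    """P(threshold <= 1/k_in) for a normal threshold distribution (k_in >= 1)."""
    if sigma == 0.0:
        return 1.0 if r_mean <= 1.0 / k_in else 0.0  # identical thresholds
    x = (1.0 / k_in - r_mean) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(x))                 # normal CDF

def mean_offspring_normal(z_in, r_mean, sigma, k_max=200):
    """Sum of k * p(k) * P(vulnerable | k) for Poisson in-degrees with mean z_in."""
    total = 0.0
    p = math.exp(-z_in)          # p(0)
    for k in range(1, k_max + 1):
        p *= z_in / k            # Poisson recursion
        total += k * p * vulnerable_prob(k, r_mean, sigma)
    return total

# Compare identical and normally distributed thresholds at high connectivity.
identical = mean_offspring_normal(8.0, 0.18, 0.0)
heterogeneous = mean_offspring_normal(8.0, 0.18, 0.2)
```

With $\sigma > 0$, nodes of large in-degree retain a nonzero probability of being vulnerable, which is one way to see why a wider threshold distribution can enlarge the cascade window at high connectivity.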
The impact of correlation. In directed networks, the correlation between in- and out-degrees is an important characteristic and has been the focus of many studies, including those on robustness [31], controllability [32] and synchronization [33]. In the present paper, the correlation between the in-degree $k_i^{in}$ and out-degree $k_i^{out}$ of node $i$ is assumed to take the form $k_i^{out} = c\,(k_i^{in})^{\alpha}$, where $\alpha$ is a tunable constant [34] and $c$ is a normalization constant. $\alpha > 0$ corresponds to a positive correlation between $k_i^{out}$ and $k_i^{in}$, i.e., a node of high in-degree has a high out-degree as well; $\alpha < 0$ corresponds to a negative correlation, i.e., a node of high in-degree has a small out-degree instead. Intuitively, a negative correlation between $k_{out}$ and $k_{in}$ should weaken the robustness of the system: a node of small $k_{in}$ is likely to be vulnerable, and its large $k_{out}$ gives it many influencing neighbours, which facilitates cascade propagation. For a positive correlation, even though a node of small $k_{in}$ may be vulnerable, its correspondingly small $k_{out}$ limits the number of influencing neighbours. Such a node therefore has difficulty propagating any influence, and systemic robustness is enhanced. Figure 3 demonstrates the effect of $\alpha$ on the cascade windows in directed ER and SF networks over a wide range of both $r$ and $z_{in}$. Compared with the directed ER network, the directed SF network is more strongly affected by the correlation between in- and out-degrees. In particular, the larger the value of $\alpha$, the more robust the system becomes, for both $\alpha > 0$ and $\alpha < 0$. The only exception is the interval $z_{in} \in (1.1, 1.5)$, where a positive correlation can decrease the robustness of the directed ER network. When $z_{in}$ is very small, the network is poorly connected and cascade propagation is limited. Therefore, nodes of large degree are responsible for triggering large cascades. Compared with the uncorrelated ER network, the positive correlation between the in- and out-degrees of these nodes increases the likelihood of propagation and hence decreases the robustness of the system.
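As an illustration of the correlated form $k_{out} = c\,(k_{in})^{\alpha}$, the sketch below assigns out-degrees to a given in-degree sequence. It assumes that the constant $c$ is fixed by requiring the total out-degree to equal the total in-degree, and that nodes of zero in-degree receive zero out-degree; both are assumptions made here, as is the function name. The resulting out-degrees are real-valued, so building an actual network would still require a rounding step that is not shown.

```python
def correlated_out_degrees(in_degrees, alpha):
    """Return (c, out_degrees) with out-degree c * k_in**alpha for each node.

    c is chosen so that sum(out_degrees) == sum(in_degrees) (assumed
    normalization). Nodes with k_in == 0 get out-degree 0 (assumption).
    """
    powered = [float(k) ** alpha if k > 0 else 0.0 for k in in_degrees]
    c = sum(in_degrees) / sum(powered)
    return c, [c * x for x in powered]

# Negative correlation: small in-degree nodes receive large out-degrees.
c_neg, out_neg = correlated_out_degrees([1, 2, 4], alpha=-1.0)
# Positive correlation: out-degree grows with in-degree.
c_pos, out_pos = correlated_out_degrees([1, 2, 4], alpha=1.0)
```

For the in-degree sequence `[1, 2, 4]`, $\alpha = -1$ reverses the ordering of the out-degrees while keeping the total number of links unchanged.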
Comparison with undirected networks. When comparing the robustness of directed networks with that of undirected networks, we consider two situations. In the first, the average degree $z_d$ ($= z_{in} + z_{out}$) of the directed network equals the average degree $z_u$ of the undirected network, i.e., the total number of links of the directed network is the same as that of the undirected network. In the second, $z_{in}$ equals $z_u$, i.e., the total number of links of the directed network is twice that of the undirected network. Figure 4 shows the comparison of cascade windows in directed and undirected networks for $z_d = z_u$. The lowest boundary of large cascades for both directed ER and SF networks is $z_d = 2$ (consistent with $z_{in} = 1$). So long as $z_d > 2$, the window in directed networks is larger than that in undirected networks; that is, a directed network is more vulnerable than an undirected one with respect to network connectivity. Given a proper value of the threshold $r$, whether a node in the undirected network is vulnerable depends on its degree (of average $z_u$), whereas in the directed network nodal vulnerability depends on its in-degree (of average $z_{in}$). In the case of $z_d = z_u$, one has $z_{in} = z_u/2$. According to Eq. (2), the directed network has a larger number of vulnerable nodes than the undirected one and is therefore less stable. Figure 5 shows the comparison of the cascade windows in directed and undirected networks for $z_{in} = z_u$. Again, one notices similar behaviour regardless of the nodal in- and out-degree distributions and correlations. In the case of $z_{in} = z_u$, the probability that a node is vulnerable in the directed network is the same as that in the undirected network. Meanwhile, the extra outgoing links ($z_{out} = z_u$) of the directed network enable a node to influence more neighbours than in the undirected network, which promotes propagation.
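The argument for the case $z_d = z_u$ can be checked with a short numerical sketch. It assumes Poisson degree distributions and an identical threshold $r$, counts a node as vulnerable when its (in-)degree $k$ satisfies $k \le 1/r$ (nodes of degree zero are counted as vulnerable, an assumption made here), and uses hypothetical function names.

```python
import math

def poisson_cdf(k_max, mean):
    """P(K <= k_max) for a Poisson variable with the given mean."""
    p = math.exp(-mean)   # P(K = 0)
    total = p
    for k in range(1, k_max + 1):
        p *= mean / k     # Poisson recursion
        total += p
    return total

def vulnerable_fraction(mean_degree, r):
    """Fraction of nodes whose (in-)degree k satisfies r <= 1/k."""
    return poisson_cdf(math.floor(1.0 / r + 1e-12), mean_degree)

# Same total number of links: z_d = z_u, hence z_in = z_u / 2.
z_u = 6.0
undirected = vulnerable_fraction(z_u, 0.18)
directed = vulnerable_fraction(z_u / 2.0, 0.18)
```

Halving the mean of the relevant degree shifts the distribution towards small degrees, so the directed network contains the larger fraction of vulnerable nodes in this setting.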

Discussion
The investigation of the structure and dynamics of social networks has attracted increasing attention from applied mathematicians, statistical physicists, and computer scientists over the past decades [35]. Of high interest is a broad range of contagion processes taking place over the underlying networks. In spite of its simplicity, the threshold model has attracted much attention, with practical applications in viral marketing [36], emotion transitivity [37] and risk perception [38]. However, very few studies have considered the asymmetry of social interactions. In this paper, we extended the threshold model to directed ER and SF networks in which each node is connected to others via incoming and outgoing links, with and without correlations.
Based on generating functions, we have developed a theoretical framework for analyzing the threshold model on large directed networks. Through the calculation of the size of the GSCC of vulnerable nodes, we obtained the condition and prevalence of large cascades in the directed network, which differ from those in the undirected network. For instance, heterogeneity in both nodal degrees and thresholds decreases systemic robustness. Moreover, the correlation between nodal in- and out-degrees has mixed effects on systemic stability. When directed networks are heterogeneous, a positive correlation increases the robustness, while a negative correlation decreases it. When directed networks are homogeneous, the same results hold if network connectivity is relatively high; if network connectivity is very low, however, a positive correlation decreases systemic robustness. Finally, a comparison of the robustness of the threshold model on directed and undirected networks shows that the presence of directionality always makes the system more vulnerable, regardless of the distributions of in- and out-degrees and of the correlations between them. These results complement previous studies [27,28].
We note, however, that social dynamics are more complex [39]. To study contagion in realistic networks, one needs to generalize the present framework by incorporating more physical and structural properties. A comprehensive investigation of the frequency and size of large cascades through theoretical and empirical approaches is of significant interest.

Methods
Given a directed network, the joint probability distribution of a node of in-degree $k_{in}$ and out-degree $k_{out}$ is defined by $p(k_{in}, k_{out})$. According to Eq. (2), a node of in-degree $k_{in}$ is vulnerable with probability [...] respectively. To describe propagation from one node to another, one also requires generating functions for the joint excess degree of vulnerable nodes either approaching a random node or originating from the node, [...], where $z_{out} = \sum_{k_{in}, k_{out}} k_{out}\, p(k_{in}, k_{out})$ is the average out-degree, hence $z_{in} = z_{out} = z_d/2$. Based on $g_{01}(x, y)$ and $g_{10}(x, y)$, one has generating functions for the excess in- and out-degree distributions of vulnerable nodes, [...] respectively.

To analyze the properties of vulnerable clusters, we introduce analogous generating functions for the size distributions of in- and out-components of vulnerable nodes, [...], which is the average size of the GOUC of vulnerable nodes. Noting that $\phi_1(1) = 1$, one obtains [...]. Similarly, one has [...]. In analogy to undirected networks [11], the above equation determines whether global cascades commence.

To calculate the size of the GSCC of vulnerable nodes, we randomly choose a node of in-degree $k_{in}$ and out-degree $k_{out}$. The probability that there is at least one path from the GSCC of vulnerable nodes to the node via any incoming link is $1 - [\varphi_1(1)]^{k_{in}}$. Meanwhile, the probability that there is at least one path from the node to the GSCC of vulnerable nodes via any outgoing link is $1 - [\phi_1(1)]^{k_{out}}$. Therefore, the size of the GSCC of vulnerable nodes is [...]

Condition for global cascades with correlation. In the case that the in-degree $k_{in}$ and out-degree $k_{out}$ of a node are correlated, we adopt the form $k_{out} = c\,(k_{in})^{\alpha}$ [34]. According to the normalization one obtains