Efficient collective influence maximization in cascading processes with first-order transitions

In many social and biological networks, the collective dynamics of the entire system can be shaped by a small set of influential units through a global cascading process, manifested by an abrupt first-order transition in dynamical behaviors. Despite its importance in applications, efficient identification of multiple influential spreaders in cascading processes still remains a challenging task for large-scale networks. Here we address this issue by exploring the collective influence in general threshold models of cascading process. Our analysis reveals that the importance of spreaders is fixed by the subcritical paths along which cascades propagate: the number of subcritical paths attached to each spreader determines its contribution to global cascades. The concept of subcritical path allows us to introduce a scalable algorithm for massively large-scale networks. Results in both synthetic random graphs and real networks show that the proposed method can achieve larger collective influence given the same number of seeds compared with other scalable heuristic approaches.


I. INTRODUCTION
In natural and social systems, spreading dynamics lie at the heart of a variety of complex phenomena including failure propagation in infrastructure [1], adoption of new behaviors [2], and diffusion of norms and innovations in social networks [3].In these spreading processes, a small number of influencers arise as a consequence of the structural diversity of the underlying contact networks [4].In different fields, it has been accepted that the initial activation of such "superspreaders", who usually hold prominent locations in networks, is capable of shaping the spreading dynamics of large populations [5][6][7][8][9].From a practical point of view, identification of superspreaders could not only boost product promotion in a viral marketing campaign constrained by a limited budget, but also instruct the strategic protection of structurally pivotal units to maintain robust infrastructures at a lower cost.Given its great practical values in a range of important applications, the problem of locating superspreaders has attracted immediate attention in various disciplines [10][11][12][13][14][15][16][17][18][19][20].
In the simple case of finding single influential spreader, centrality-based heuristic measures such as degree [21], Betweenness [22], PageRank [23] and K-core [9,10,24,25] are routinely adopted.Beyond this non-interacting problem of finding single spreaders, it becomes more complicated when trying to select a group of spreaders, due to the collective effects of multiple agents.In fact, searching for the optimal set of influencers in spreading dynamics is an NP-hard problem and remains to be a challenging conundrum in network science [4].To address the many-body problem, an analytical framework of collective influence (CI) has been recently established for optimal percolation in random graphs [12].Based on stability analysis of zero solution, CI theory applies to the dynamical processes that can be transformed to optimal percolation with a continuous phase transition.One of such spreading processes is Linear Threshold Model (LTM) with a fixed threshold m i = k i − 1, where k i denotes node i's degree.
In a large variety of contexts, LTM has been frequently employed to describe a spreading process during which an individual make decisions under "peer pressure" [26][27][28][29][30][31][32][33][34][35].That is, a node will adopt a piece of information only after a certain number of its neighbors have accepted it.The choice of threshold m i = k i − 1 in LTM guarantees a continuous phase transition thus CI method can be applied accordingly [12].Nevertheless, for other choices of threshold, there also exist a wide class of LTMs that exhibit a first-order, or discontinuous phase transition.Under this condition, the stability analysis in CI method is not applicable any more.Therefore, how to determine the collective influence of the optimal seeds for first-order transitions in LTM should be studied separately.Although several approaches from different views have been developed, e.g., greedy search [4,11] and message passing algorithm [13], the problem of developing an efficient linear algorithm for large-scale networks still needs to be explored.
Here, we examine the collective influence in general LTM, and develop a scalable algorithm to locate optimal spreaders.By analyzing the message passing equations of LTM, we find the form of interactions between spreaders and derive the analytical form of their contributions to information cascading.Each seed's contribution, defined as the collective influence in threshold model (CI-TM) with first-order transitions, is determined by the number of subcritical paths emanating from it.Since the subcritical paths are such routes along which cascades can propagate, CI-TM can be considered as a reliable measure of seeds' structural importance in LTM.In an attempt to find the optimal superspreaders for first-order transitions in big-data analysis, we endeavor to maximize the number of active population by adaptively selecting nodes with the largest CI-TM values.Extensive comparisons with other plausible competing heuristics on both synthetic and realistic large-scale networks reveal that the proposed mechanism-based algorithm can indeed identify a smaller set of spreaders that could initiate global scale cascade.

II. COLLECTIVE INFLUENCE IN THRESHOLD MODELS: CI-TM
We present a theoretical framework to analyze the collective influence of individuals in general LTM.For a network with N nodes and M links, the topology is represented by the adjacency matrix {A ij } N ×N , where A ij = 1 if i and j are connected, and A ij = 0 otherwise.The vector n = (n 1 , n 2 , • • • , n N ) records whether a node i is chosen as a seed (n i = 1) or not (n i = 0).The total fraction of seeds is therefore q = i n i /N .During the spreading, the state of each node falls into the category of either active or inactive.The spreading starts from a q fraction of active seeds and evolves following a threshold rule: a node i becomes active when m i neighbors get activated will a node i become active.This process terminates when there are no more newly activated nodes.We introduce ν i as node i's probability in active (ν i = 1) or inactive (ν i = 0) state at the final stage, and denote Q(q) as the size of the giant connected component of active population.
For a directed link i → j, we introduce ν i→j as the probability of i being in an active state assuming node j is disconnected from the network [36].If n i = 1, then ν i→j = 1.Otherwise, ν i→j = 1 only when there are at least m i active neighbors excluding j.Since there exist many possible choices of these m i neighbors, we define P mi ∂i\j as the set of all combinations of m i nodes selected from ∂i \ j, where ∂i \ j is the set of nearest neighbors of i excluding j.Clearly, if i has k i connections emanating from it, there are ki−1 mi combinations, so the set P mi ∂i\j contains ki−1 mi elements, denoted by are the m i nodes in the hth combination.Figure 1(a) illustrates all three combinations P 1 , P 2 and P 3 corresponding to ν i→j for node i with a threshold m i = 2. Should at least one combination is fully activated, we have ν i→j = 1.
Generally, for locally tree-like networks, we have the following message passing equation: The final state of i is given by The above equations Eq. (1-2) describe the general cases of LTM.For the special choice of threshold m i = k i − 1, there is only one combination in P mi ∂i\j , and the transition becomes continuous.In this case, Eq. (1) recovers the message passing equation of optimal percolation as treated in Ref. [12] (See Appendix A).For any other choice of m i (except for m i = 1, which is equivalent to the problem of optimal immunization [12]), the multiplicity of activated neighbors leads to a first-order transition and cascading as discussed here.
For all the 2M directed links i → j, Eq. ( 1) is a nonlinear function of In Eq. ( 3), is the nonlinear function of vector ν → for link i → j.
Given the initial configuration of seeds n, the final state of ν → is fully determined by the self-consistent Eq. ( 3).Unfortunately, it cannot be solved directly due to the exponentially growing number of combinations in P mi ∂i\j .Therefore, for a small number of seeds, we adopt the iterative method to estimate the solution.In this point of view, Eq. ( 3) can be treated as a discrete dynamical system with the initial condition ν 0 → = n → .To simplify the calculation, we approximate the nonlinear function G i→j by linearization.Define 1), we know that ∂Gi→j ∂ν k→ℓ = 0 for ℓ = i.While in the case of ℓ = i and k = j, we have Although Eq. ( 5) has a complex form, in fact it is only determined by a simple quantity a k→i,i→j = p∈∂i\(k,j) ν p→i , which is interpreted as the number of i's active neighbors excluding k and j when i is absent from the network.On one hand, if a k→i,i→j ≥ m i , at least one term of p∈P h ν p→i equals one, since we are selecting m i elements from a set containing at least m i elements of value 1.Under such condition, ∂Gi→j ∂ν k→i = 0. On the other hand, if a k→i,i→j ≤ m i − 2, all the terms p∈P h \k ν p→i are zeros because we are selecting m i − 1 elements from a set containing at most m i − 2 nonzero elements, which also leads to ∂Gi→j ∂ν k→i = 0.When a k→i,i→j = m i − 1, all the terms p∈P ℓ ν p→i are zeros, and only the exact combination of these m i − 1 nonzero elements would lead to p∈P h \k ν p→i = 1.Therefore, we have ∂Gi→j ∂ν k→i = 1 − n i .Based on the above reasoning, we define a quantity I k→ℓ,i→j for links k → ℓ and i → j as follows: The definition of I k→ℓ,i→j is reminiscent of the Hashimoto non-backtracking (NB) matrix B [12,[37][38][39][40][41][42].In the case of m i = k i − 1, our quantity I k→ℓ,i→j can be transformed to the corresponding element of NB matrix B k→ℓ,i→j in optimal percolation [12] (See Appendix A).In fact, I k→ℓ,i→j is closely related to the concept of subcritical nodes.Recall that a node i is subcritical if it has m i − 1 active neighbors [33][34][35].This implies that one more activation of its neighbors will cause i activated.From Eq. ( 6) we know that I k→ℓ,i→j = 1 only if the links k → ℓ and i → j are connected, non-backtracking, and additionally, node i is subcritical in the absence of node k and j.In Fig. 1(b), node i has an active neighbor k 1 and two inactive ones k 2 and k 3 .By definition, for a threshold m i = 2, we conclude I k1→i,i→j = 0 since i has no active neighbor excluding k 1 and j, while I k2→i,i→j = 1 because i has 1 (= m i − 1) active neighbor excluding k 2 and j.
For a small ν → , a standard linearization around origin 0 gives G i→j (ν → ) ≈ G i→j (0) + G ′ i→j (0)ν → .However this will cause degeneracy since Eq. ( 5) constantly gives In Appendix B, we prove that this linearization has an approximation accuracy of O(|ν → | 2 ) (| • | is the vector norm), same as the linear Taylor expansion.
Combining all direct links, Eq. ( 4) can be approximated by a linear equation where With the notion of I k→ℓ,i→j , we can write F t as: Now we update the state of ν t → following Eq.(7).For the convenience of calculation, we put the matrix F t in a higher-dimensional space [12]: where function δ iℓ is 1 if i = ℓ, and 0 otherwise.The indices k, ℓ, i, j runs from 1 to N .Starting from The physical meaning of Eq. ( 11) can be interpreted as follows.If node i is a seed, ) and at least one of its corresponding neighbors k is a seed (n k = 1).Supposing i is not a seed, the contribution of a neighboring seed k is conveyed by the directed link k → i → j that satisfies n k = 1, n i = 0 and I 0 kiij = 1, which is shown in the second panel of Fig. 1(c).
For t = 2, we have The last term in Eq. ( 12) is actually the contribution of node i's 2-step neighbors s to ν 2 i→j .The contribution of a seed s is conducted through a directed path s → k → i → j that satisfies n s = 1, n k = 0, n i = 0 and I 0 skki = 1, I 1 kiij = 1 (See Fig. 1(c)).Inspired by Eq. ( 12), we define the concept of subcritical paths.For a directed link i → j, the path , and any two consecutive links are non-backtracking.If i 1 = i, we set the path's length L = 0.The subcritical paths of length L = 0, 1 and 2 are displayed in Fig. 1(c).Notice that, the calculation of L-length subcritical paths is in fact implemented by the multiplication of Following this idea, we can generalize Eq. ( 12) to ν T i→j at a given time T .The exact formula for ν T i→j is quite lengthy, so we do not display it here.But we should keep in mind that the form of ν T i→j is nothing but n i plus the contribution of seeds connected to i through subcritical paths with length L ≤ T when n i = 0. Finally, we note that when m i = k i −1, the subcritical path becomes the minimal, or shortest, path in the problem of optimal percolation [12].

III. CI-TM ALGORITHM
To quantify the active population in LTM, we define ν → = ij ν i→j .Starting from ν → = 0 when no seed is selected, ν → increases as more seeds are activated.Therefore, we expect that the first-order transition would appear early if ν → is maximized for a given number of seeds.
Based on the form of each element in ν → , we learn that the contribution of a seed i to ν → is composed of all its collective contributions to every potential element, exerted through the subcritical paths attached to i. Therefore, we employ a seed's contribution to ν → to define its Collective Influence in Threshold Model (CI-TM) to find the best influencers.For the trivial case of subcritical paths with length L = 0, we define CI-TM where k i is the degree of node i.Thus, at the zero-order approximation we recover the high-degree heuristic.The first panel of Fig. 1 As shown in Fig. 1(d), the contribution of node i to ν → through subcritical paths of length L = 1 is 2. Therefore, we have CI-TM In Fig. 1(d), the additional 2-length subcritical paths also contribute to CI-TM 2 (i), leading to CI-TM 2 (i) = 7.Moreover, in Fig. 1(d), we can observe that for the tree structure, the activation of node j in the first-step update will not affect I 1 jkkℓ in the second-step update, which means I 1 jkkℓ = I 0 jkkℓ .More generally, I jkkℓ is not affected by the activation of k's any precedent nodes on the subcritical path.Therefore, we will leave out the superscript t in the definition of CI-TM for locally tree-like networks.We can generalize the above CI-TM calculation to any given L. In summary, the definition of node i's influence CI-TM in an area of length L is: CI-TM L (i) =number of subcritical paths starting from i with length 0 ≤ ℓ ≤ L.
Figure 1(e) illustrates the calculation of node i's CI-TM for L = 2, in which subcritical paths with length ℓ ≤ L are distinguished by colors.
It is important to note that in the case of m i = k i − 1, the subcritical path becomes shortest path.Therefore the calculation area in Fig. 1(e) recovers the ball B(i, L) of radius L centered at i.This is exactly the same ball in the definition of influence CI L (i) = (k i − 1) j∈∂B(i,L) (k j − 1) in Ref. [12].Thus the algorithm to find influencers for the LTM with first-order transitions is a simple generalization of CI algorithm [12] for second order transition where the ball of influence B(i, L) in CI [12] defined by the shortest paths is replaced in CI-TM by a ball of influence, in this case, defined by the subcritical paths.This elegant mapping from a second order to a first-order transition model implies that all phenomenology found in Ref. [12] can be translated to the present case.In particular, weak nodes, such as those with modest degree but surrounded by hierarchy of hubs located across the subcritical paths, can be strong influencers in the LTM as well as in optimal percolation [12].
For a given fraction q of seeds, our goal is to maximize ν → .As we have explained, the CI-TM value of a seed depends on the choice of other seeds.Therefore, it is hard to obtain the optimal configuration {n| i n i /N = q} without turning to extremely time-consuming algorithms.To compromise and obtain a linear algorithm, we propose an adaptive CI-TM algorithm following a greedy approach.Define C(i, L) as the set of node i plus subcritical vertices belonging to all subcritical paths originating from i with length ℓ ≤ L. Beginning with an empty seed set S, we remove the top CI-TM nodes as follow.The calculation proceeds following the CI-TM algorithm.Remove C(i, L), and decrease the degree and threshold of C(i, L)'s existing neighbors by 1

7:
Update CI-TML for nodes within L + 1 steps from C(i, L) 8: end for 9: Output S In the above algorithm, we remove C(i, L) once i is added to the seed set.The reason lies in that it is unnecessary to select nodes in C(i, L) as seeds in later calculation, because the activation of i will definitely active C(i, L) (See Fig. 1(f)).Besides, C(i, L) can be identified during the computation of CI-TM L without additional cost.In traditional centrality-based methods, seeds may have significant overlap in their influenced population.It has been reported that the performance of these methods, such as K-core, suffers a lot from this phenomenon [10].On the contrary, in our algorithm, this problem is alleviated by the removal of subcritical nodes in C(i, L), which successfully reduces the overlap and improves the efficacy of each seed.Although such greedy strategy is not guaranteed to give the exact optimal spreaders, we expect a good performance in comparison with other heuristic methods in large-scale networks, as already found in Ref. [12].
More importantly, the CI-TM algorithm is linearly scalable for large networks with a computational complexity O(N log N ) as N → ∞.On one hand, computing CI-TM L is equivalent to iteratively visiting subcritical neighbors of each node layer by layer within L radius.Because of the finite search radius, computing CI-TM L for each node takes O(1) time.Initially, we have to calculate CI-TM L for all nodes.However, during later adaptive calculation, there is no need to update CI-TM L for all nodes.We only have to recalculate for nodes within L + 1 steps from the removed vertices, which scales as O(1) compared to the network size as N → ∞ as shown in Ref. [20].On the other hand, selecting the node with maximal CI-TM can be realized by making use of the data structure of heap that takes O(log N ) time [20].Therefore, the overall complexity of ranking N nodes is O(N log N ) even when we remove the top CI-TM nodes one by one.In addition, considering the relative small number of subcritical neighbors, the cost of searching for subcritical paths is far less than that when scanning all neighbors.This permits the efficient computation of CI-TM for considerably large L. In our later experiments on finite-size networks, we do not put a limit on L so as to calculate CI-TM thoroughly.But remember that we can always truncate L to speed up CI-TM algorithm for extremely large-scale networks.

IV. TEST OF CI-TM ALGORITHM
We first simulate LTM dynamics on synthetic random networks, including Erdös-Rényi (ER) and scalefree (SF) networks.In the models, we adopt a fractional threshold rule m i = ⌈tk i ⌉, which means that a node will be activated once t fraction of its neighbors are active (⌈•⌉ is the ceiling function).In order to verify the efficacy of CI-TM algorithm, we compare its performance against several widely-used ranking methods, including high degree (HD) [21], high degree adaptive (HDA), PageRank (PR) [23] and K-core adaptive (KsA) [24], and message passing algorithm (MP) based on statistical physics [13].Details about these strategies are explained in Appendix C. As a reference, we also display the result of random selection of seeds in the same figures.
Figure 2(a) presents Q(q) versus q on ER networks (t = 0.5, N = 2 × 10 5 , k = 6).Similar to bootstrap percolation on homogeneous networks, Q(q) first under-goes a continuous transition from Q(q) = 0 to nonzero, and then a first-order transition at a higher value of q c [32].Remarkably, CI-TM algorithm achieves the best performance for first-order transition by identifying the smallest number of spreaders (q CI-TM c = 0.105) to trigger global cascade.Besides, it also brings about the earliest continuous transition.Among all the strategies, random selection works worst, with a critical value q Random c = 0.220.Although the original K-core ranking has an unsatisfactory performance for multi-source spreading [10], the adaptive version KsA gives a better result similar to HDA.MP predicts a rather small set of spreaders as well, similar to CI-TM q MP c = 0.107, but it has the issue with convergence.Near the threshold, MP and other algorithms based on belief propagation (BP) do not converge to a stable solution.This is a general property of BP algorithm as applied to NP-hard problems.In order to drive the system to converge by brute force, a reinforcement process with a parameter r is applied, which produces a convergence iteration time of order O(r −1 ) [13].In implementation, the parameter r should be set small so that the true optimal result is not altered too much.As a result, the convergence time of MP increases accordingly.We find that, for a large-scale ER network with N = 10 7 , CI-TM algorithm (L = 3) can be completed in less than 2.4 hours, while MP (T = 40, r = 10 −4 ) is estimated to require 132 days by extrapolation (See Fig. A1(d) in Appendix C).However, even when MP has prohibitive running time for large datasets, the performance is also expected to degrade as the system size increases.This is because the convergence time is expected to increase for N → ∞ if performance needs to be the same.Alternatively, by increasing N and keeping the parameters T and r the same, a clear degrading in the performance is obtained where q MP c increases with N .This is clearly seen in the inset of Fig. A1(d) with an increase of q MP c as N → ∞, away from the more optimal value q CI-TM c as N → ∞.We also provide the first-order critical value q c for CI-TM and HDA on ER networks with different average degree k in the inset of Fig. 2(a).With the growth of k , q c increases and so does the difference between CI-TM and HDA.In some cases, q c can be further improved by a simple modification on CI-TM (See Appendix D).
We then examine CI-TM's performance on SF networks with power-law degree distributions P (k) ∼ k −γ in Fig. 2(b).We generate SF networks of size N = 2 × 10 5 and power-law exponent γ = 3 following the configuration model [43].It can be seen that the critical value of first-order transition becomes rather small for SF networks, due to the existence of highly connected hubs.Still, CI-TM algorithm outperforms other heuristic approaches and shows a comparable performance as MP.In addition, CI-TM algorithm produces the largest active component Q(q) for any give q before the first-order transition.For different values of power-law exponent γ, CI-TM consistently has a lower critical value q c than HDA, as shown in the inset of Fig. 2(b).In applications, we are frequently faced with largescale networks which exhibit more complicated topological characteristics than random graphs.Thus, it is more necessary and challenging to find a feasible strategy to efficiently approximate the optimal spreaders for those networks.Next, we explore CI-TM algorithm's performance for real networks.We examine two representative datasets: Wikipedia communication network (N = 622, 999, M = 2, 890, 729) [44] and Internet autonomous system network (N = 1, 464, 020, M = 10, 863, 640) [45].
Wikipedia is an open access internet encyclopedia, whose contents are provided and edited by numerous contributors.Wikipedia network, as a typical social contact network used in information spreading, is constructed by the communication instances among users in English Wikipedia.An edge from user i to j indicates that i has written at least a message on the talk page of j.The Internet network records the autonomous system level topological structure of Internet, which provides an example of infrastructure network on which malicious attack and failure propagation may occur.Both networks are treated as undirected in the analysis.
Figure 3(a) displays Q(q) for different number of seeds |S| for Wikipedia network.CI-TM is able to trigger the global cascade with the smallest group of seeds, whose size is quite small compared to the entire network due to the heavy-tailed degree distribution.We also discover that, in the setting of first-order transitions, some "weak nodes" as defined in Ref. [12] are proven to be more collectively influential than those highly-connected hubs.As shown in the inset figure, we present the percentage of "real influencers" (predicted by CI-TM) that HDA and HD have identified, with the vertical dash line indicating CI-TM's q c .At q c , HDA and HD locate up to 90% overlapping seeds with CI-TM algorithm, most of which are tagged as hubs.However, due to the collective nature of LTM, seeding the set of privileged nodes in the noninteracting view does not guarantee the maximization of collective influence.The other proportion of spreaders with lower degree, although may be inefficient as single spreaders, are responsible for bridging the collective influence of hubs.With the help of both hubs and bridging weak-nodes, CI-TM achieves the best performance against other heuristic measures.The Internet network also exhibits similar phenomenon in Fig. 3(b).In this case, HDA and HD can only find 80% optimal influencers at the first-order transition of CI-TM algorithm, missing a substantial amount of nodes with lower degree but indispensable in integrating the collective influence of highdegree seeds.Such kind of weak-node effect has also been discovered for second-order transition in optimal percolation [12].
Although the performance of K-core can be improved by adaptive calculation in Fig. 2, for large-scale real networks, we do not display the result of KsA due to its O(N 2 ) computational complexity and only show the curve of K-core.One cause for the unsatisfactory result of K-core is that it is not designed as a multiple spreaders finder since high K-core nodes tend to form densely connected clusters in the same shell, which prevents the expanding of information cascade.However, when finding individual spreaders, K-core is more optimal than other heuristics [9,10].
In Appendix C, we further compare CI-TM algorithm with other methods, including Betweenness Centrality (BC), Closeness Centrality (CC) and Greedy Algorithm (GA).Results from regular, ER and SF networks prove that CI-TM algorithm outperforms computationally expensive BC, CC and GA.Furthermore, with a much higher efficiency, CI-TM achieves a comparable performance as the skillfully designed MP algorithm which is designed based on constraint satisfaction optimization.We should note that, MP can only find the specific configuration at the transition point, whereas CI-TM algorithm is capable of optimizing the influence for any given fraction of seeds q.
V. DISCUSSION Identification of superspreaders in LTM has great practical implications in a wide range of dynamical processes.However, the complicated interactions among multiple spreaders prevent us from accurately locating the optimal influencers in LTM.Despite the existence of many heuristic strategies seeking to find optimal spreaders, they generally fail to take the collective effect of seeds into account, thus can only obtain suboptimal configurations.In this work, we propose a theoretical framework to analyze the collective influence of individuals in general LTM.By iteratively solving the linearized message passing equations, we decompose ν → into separate components, each of which corresponds to the contribution made by a single seed.Particularly, we find that the contribution of a seed is largely determined by its interplay with other nodes through subcritical paths.The subcritical path maximizing influence spreading in firstorder transition is a generalization of the minimal path that defines the ball of influence in CI optimal percolation in second-order phase transitions.Thus we see that the concept of collective influence is quite universal, encompassing continuous and discontinuous transitions, including generalization to network of networks in the brain [46] and socio-economics [47].In order to maximize the active population, we develop a scalable CI-TM algorithm.Our solution, rooted in the collective influence theory of general LTM, optimizes the active population by a fast adaptive scheme.The CI-TM algorithm, which is feasible for large-scale networks, outperforms other frequently used ranking strategies in synthetic random graphs and real-world networks.This provides a practical algorithm that can be employed in relevant applications such as viral marketing and information spreading in big-data analysis.

Appendix A: Relation to optimal percolation
In terms of the special choice of threshold m i = k i − 1 for each node with degree k i , the set P mi ∂i\j in Eq. ( 1) only has one combination.Therefore, Eq. ( 1) simplifies to It has been proved that LTM with threshold m i = k i − 1 can be mapped to optimal percolation [12].In the notation of optimal percolation, νi→j is defined as node i's probability NOT in active state when j is disconnected from the network, while ni = 0 if i is a seed and ni = 1 otherwise.The above definition of notation is exactly dual to what we define in Eq. (A1).Substituting ν i→j = 1 − νi→j and n i = 1 − ni in Eq. (A1), through a simple rearrangement we have which is exactly the message passing equation for optimal percolation in Ref. [12].Therefore, our formula designed for LTM also contains the situation of second-order transition as a limiting case.
In the stability analysis of optimal percolation, a linearization around zero solution ν→ = 0 is performed [12].Mapping to our notation, the zero solution in optimal percolation corresponds to ν → = 1.At this point, the condition a k→i,i→j = m i −1, or equivalently k i −2, is naturally fulfilled for any two consecutive non-backtracking links k → i and i → j, since all the k i − 2 neighbors p excluding k and j have ν p→i = 1.Therefore, I k→ℓ,i→j recovers to the corresponding element of NB matrix B k→ℓ,i→j as defined in Ref. [12].It should be noticed that the stability analysis of the NB matrix via its largest eigenvalue developed in Ref. [12] cannot be applied to the general LTM since LTM presents a firstorder phase transition where linear stability analysis is not valid.

Appendix B: Linearization of Gi→j
The conventional method to linearize the nonlinear function G i→j (ν → ) is Taylor expansion around the fixed point 0: However, for our specific function G i→j , the gradient G ′ i→j (0) is constantly 0 according to Eq. ( 5).To avoid the degeneracy, other linear approximation method should be applied.
For a differentiable function f : R n → R and x, y ∈ R n , the mean value theorem guarantees that there exists a real number c ∈ (0, 1) such that f (y Here ∇ is the gradient and • denotes the dot product.Set f = G i→j , x = 0, and y = ν → , we have Notice that, if we set c = 0, the above equation becomes the classical linear Taylor expansion: To deal with the degeneracy of G ′ i→j (0), we approximate G i→j (ν → ) by setting c = 1 for small ν → : In a finite-size network, for a small ν → with elements |ν i→j | ≤ 1, the gradient of each element ∂Gi→j ∂ν k→ℓ is bounded according Eq. ( 5).For all 2M elements, there exists a uniform upper bound for all the gradients ∇ ∂Gi→j ∂ν k→ℓ .Applying the mean value theorem to the differentiable function ∂Gi→j ∂ν k→ℓ , there should be a constant L such that This proves that the accuracy of the linear approximation is O(|ν → | 2 ), which is same as the linear Taylor expansion.

Appendix C: More comparisons with competing methods
A growing number of methods aimed to rank nodes' influence in networks have been proposed in previous studies.Here we introduce some of the most widely used competing methods and perform a thorough comparison with CI-TM algorithm.
High degree (HD) Degree, defined as the number of connections attached to a node, is the most widely-used measure of influence [21].In HD method, we rank nodes according to their degrees in a descending order, and sequentially select them as information sources.For HD method, the selected hubs intend to link with each other due to the assortative mixing property, making their influence areas overlap significantly.In this case, the selected seeds could rarely be optimal.High degree adaptive (HDA) is the adaptive version of HD method.After each removal, the degree of each node is recalculated.Such adaptive procedure can usually mitigate the overlapping and improve the performance of HD.
K-core (Ks) Through a k-shell decomposition process, K-core method assigns each node a k S value to distinguish whether it locates in the core region or peripheral area [24].In k-shell decomposition, nodes are iteratively removed from the network according to their current degrees.During the removal, all the nodes are classified into different k-shells.The K-core method selects nodes within high k-shells as the spreaders.In practice, single influential spreaders can be identified effectively by K-core ranking, which has been confirmed by both simulations and real-world data [9,10,25].However, K-core ranking has the disadvantage of severe overlap of seeds' influence areas, and therefore performs poorly for multiple node selection.This limitation can be alleviated with an adaptive scheme where we recompute the K-core after each removal.Since there exists many nodes in the same k-shell, we select the node with the largest degree to further distinguish nodes within the highest k-shell.Such K-core adaptive (KsA) method can effectively enhance the performance of K-core.
PageRank (PR) PageRank is a popular ranking algorithm of webpages which was developed and used by search engine Google [23].Over the years, PageRank has been adopted in many practical ranking problems.Generally speaking, PageRank measures a webpage's stationary visiting probability by a random walker following the hyperlinks in the network.As a special case of eigenvector centralities, PageRank evaluates the score of a node by taking into account its neighbors' scores.Even though such score-propagating mechanism works well for some purposes such as webpage ranking, an unfavorable consequence may be a heavy accumulation of scores near the high-degree nodes, specially for scale-free networks [38].
Greedy algorithm (GA) In GA, starting from an empty set of seeds, nodes with the maximal marginal gain are sequentially added to the seed set.Kempe et al. have proven that for a class of LTM with the attribute of submodularity, GA has a performance guarantee of 1−1/e ≈ 63%, which means it could achieve at least 63% of real maximal influence [4].Although guaranteed, this performance is still well sub-optimal.This result relies on the submodular property defined by a diminishing returns effect: the marginal gain from adding a node to the seed set S decreases with the size of S. It has been proven that several classes of LTM have submodular property, such as a random choice of thresholds.However, for LTM with a fixed threshold, it is not generally submodular.As a consequence, GA is not guaranteed to provide a such approximation of the optimal spreaders for general LTM which is anyway quite far (63%) from the optimal q c .Furthermore, the greedy search of GA requires massive simulations, which makes GA unscalable and thus limits its application in large-scale social networks.
Betweenness centrality (BC) BC quantifies the importance of node i in terms of the number of shortest paths cross through it [22].Therefore, nodes with large BC usually occupy the pivotal positions in the shortest pathways connecting large numbers of nodes.In BC method, we select nodes with high BC scores as seeds.Although BC has been widely applied in social network analysis, its relatively high computational complexity makes BC prohibitive for large-scale networks.So far, the most efficient algorithm takes O(M N ) to calculate BC for a network with N nodes and M links [48], which is still not applicable to modern social networks with millions of users.
Closeness centrality (CC) Closeness centrality quantifies how close a node to other nodes in the network [49].Formally, CC is defined as the reciprocal of the average shortest distance of a node to others in a network.Thus, nodes with high CC values tend to locate at the center of network clusters or communities.In CC method, we pick the seeds according to nodes' CC scores.Same as BC, CC also requires the heavy task of calculating all possible shortest paths.Thus the high computational cost of CC makes it infeasible for largescale networks.
Message passing (MP) Recently, message passing (MP) algorithm has attracted much attention due to its successful application to network-related optimization problems, such as containing epidemic outbreaks [14] and optimizing spread dynamics [13].In particular, F. Altarelli et al. proposed a message passing algorithm aimed to identify initial conditions to maximize the final number of active nodes in threshold models.Precisely, the trajectory of nodes' states is parametrized by a . By mapping the optimization onto a constraint satisfaction problem, an energy-minimizing algorithm based on the cavity method of statistical physics is proposed.In the algorithm, a convolution process is employed to compute the Max-Sum updates.The technical details of the derivation and implementation of the MP algorithm can be found in Ref. [13].In most cases MP algorithm does not converge, then a reinforcement strategy is implemented [13].By imposing an external field slowly increasing over time with a growth rate r, the system is forced to converge to a higher energy, which increases with r.In addition, it requires O(r −1 ) iterations to reach convergence.In practice, in order to acquire a good approximation, r should be set small, usually of order O(10 −4 ) or less.This causes an increase of computational burden of MP algorithm.For a node of degree k and threshold m i , each update takes O(T k(k − 1)m 2 i ) operations [13].Pre-computing the convolution can further save a factor of k − 1. Considering the updates of all N nodes for O(r −1 ) iterations, the overall complexity of MP is O(T i k i m 2 i /r).Therefore, the time complexity of MP depends on both the degree distribution of networks and the choice of threshold.
In Fig. A1(a)-(c), we provide the thorough comparisons of different methods on regular, ER and SF networks (N = 10 4 ), including MP algorithm and computationally expensive methods GA, BC and CC.We set T = 40 and a reinforcement parameter r = 1 × 10 −4 in MP algorithm.For all the cases, our method outperforms other heuristic ranking strategies and has a comparable performance of MP algorithm.We display the comparison of run time for MP and CI-TM for ER networks with k = 6 and threshold t = 0.5 as a function of N in Fig. A1(d).CI-TM algorithm with L = 3 can complete the calculation in about 2.4 hours for N = 10 7 , while MP with T = 40 and r = 10 −4 will approximately require 132.7 days.Moreover, as N increases, the critical value q c of MP shows a growing trend for the same parameters T = 40 and r = 10 −4 , as shown in the inset of Fig. A1(d).This implies that in order to converge to a smaller seed set for large N , MP requires a larger T and a smaller reinforcement parameter r, which in turn increases its computational complexity to prohibited limits.

Appendix D: Modified CI-TM algorithm
The CI-TM algorithm is essentially a greedy approach based on CI-TM values.The success of CI-TM algorithm depends on whether the currently selected seed has potentials to create more subcritical nodes that are helpful for the early formation of giant subcritical clus-ter.In LTM, there exists a special case of subcritical nodes with threshold m = 1, which is defined as vulnerable vertices in previous literature [30,31].Different from general subcritical nodes, vulnerable vertices are naturally subcritical since they have threshold m = 1 and do not rely on the states of others.During the calculation, a node of degree k becomes vulnerable when its m − 1 neighbors are removed, leaving k − m + 1 links in the network.For ER networks with a low average degree, the limited number of remaining links of vulnerable vertices could only form fragmented clusters.In this case, CI-TM would bias to nodes connected to large numbers of small vulnerable clusters, such as a peripheral hub linked to numerous leaf nodes.Unfortunately, the activation of such small clusters provides little help to the formation of giant subcritical cluster.Because the scattered vulnerable clusters have very few links connected to existing non-subcritical nodes, their activations are not effective in producing subsequent subcritical nodes.Moreover, once global cascade appears, these fragmented vulnerable clusters will be activated subsequently, without additional seeds.In this case, we heuristically propose a modified CI-TM (CI-TMm) algorithm by excluding vulnerable nodes in the calculation of CI-TM value.The performance of CI-TMm algorithm is displayed in Fig. A2 for ER networks with an average degree k = 4.The critical value q c for CI-TMm is advanced compared to CI-TM algorithm.However, before the first-order transition, CI-TMm cannot optimize the spreading and has substantially lower Q(q) than CI-TM.We should note that, CI-TMm presents a lower q c only in the situation of fragmented vulnerable clusters.For ER networks with higher average degrees (e.g., k = 6) where relatively large vulnerable clusters emerge, CI-TM still predicts earlier first-order transition.

FIG. 1 .
FIG. 1. Subcritical paths and collective influence of spreaders.(a), Three combinations of neighbors P1, P2 and P3 corresponding to νi→j in message passing equation.Node i has a threshold mi = 2.The full activation of at least one combination will lead to νi→j = 1.(b),For link i → j with an active neighbor k1 and inactive ones k2 and k3, I k 1 →i,i→j = 0 since i has 0 (< mi − 1) active neighbor excluding k1 and j, while I k 2 →i,i→j = 1 because i has 1 (= mi − 1) active neighbor k1 excluding k2 and j. (c), Illustrations of subcritical paths ending with link i → j for L = 0, 1, 2. Red dots stand for seeds, while squares represent m − 1 active neighbors attached to subcritical nodes.Subcritical paths are highlighted by thick links.(d), The contribution of seed i to ν→ exerted through subcritical paths of length L = 0, 1, 2. (e), Calculation method of CI-TML(i).Subcritical paths starting from i with length ℓ ≤ L are displayed by different colors.(f), An example of subcritical cluster.Assuming a uniform threshold m = 3, nodes inside the circle are subcritical since they all have 2 active neighbors, represented by blue nodes.Activation of the red node will trigger a cascade covering all subcritical nodes.

FIG. 2 .
FIG. 2. Performance of CI-TM algorithm on random networks.(a), Size of active giant component Q(q) versus the fraction of seeds q for ER networks with size N = 2 × 10 5 and mean degree k = 6.Different methods are distinguished by distinct markers and colors.Threshold is set as fractional t = 0.5.The CI-TM algorithm is run without limitation on L. MP is implemented by using T = 40 and a reinforcement parameter r = 1 × 10 −4 .For CI-TM, the identified critical value is q CI-TM c = 0.105(1) while for Random selection q Random c = 0.220(1).Inset presents the critical values qc identified by HDA and CI-TM for different mean degree k .(b), Comparison for scale-free networks with size N = 2 × 10 5 , power-law exponent γ = 3, minimal degree kmin = 2 and maximal degree kmax = 1000.The fractional threshold of LTM is also set as t = 0.5.Inset shows the critical values qc for different exponents γ.All the results are averaged over 50 realizations.

FIG. 3 .
FIG. 3. Performance of CI-TM algorithm on large-scale realworld networks.(a), The relationship between the size of active giant component Q(q) and the number of seeds |S| for Wikipedia communication network, calculated by different methods.The LTM fractional threshold is set as t = 0.65.Inset displays the percentage of "real influencers" predicted by CI-TM that HDA and HD have successfully identified.The vertical dash line indicates the critical point of CI-TM.(b), Same analysis for Internet network with fractional threshold t = 0.45.
FIG. A1.Comparison of competing methods.Performance of different methods for (a), regular network (N = 10 4 , k = 5, m = 3) (b), ER network (N = 10 4 , k = 6, t = 0.5) (c), and scale-free network (N = 10 4 , γ = 3, t = 0.5).In addition to the methods we have examined in the main text, we also display the results of Greedy Algorithm (GA), Betweenness Centrality (BC), Closeness Centrality (CC) and Message Passing (MP).We set the parameter T = 40 in MP and implement reinforcement with parameter r = 10 −4 .(d), Comparison of run time for MP (T = 40, r = 10 −4 ) and CI-TM (L = 3) on ER networks ( k = 6, t = 0.5).The dashed lines are power-law fitting, and the vertical line indicates network size N = 10 7 .The inset presents the critical value qc for MP and CI-TM with increasing network size N .

FIG. A2 .
FIG. A2.Performance of modified CI-TM on ER networks.For ER networks (N = 2 × 10 5 , k = 4, t = 0.5), the performance of CI-TMm surpasses CI-TM by excluding the fragmented vulnerable clusters in the adaptive calculation.Although Q(q) for CI-TMm is lower at first, it exceeds other methods near the critical point and achieves the earliest firstorder transition.