Ranking of Nodal Infection Probability in Susceptible-Infected-Susceptible Epidemic

The prevalence, which is the average fraction of infected nodes, has been studied to evaluate the robustness of a network subject to the spread of epidemics. We explore the vulnerability (infection probability) of each node in the metastable state with a given effective infection rate τ. Specifically, we investigate the ranking of the nodal vulnerability subject to a susceptible-infected-susceptible epidemic, motivated by the fact that the ranking can be crucial for a network operator to assess which nodes are more vulnerable. Via both theoretical and numerical approaches, we unveil that the ranking of nodal vulnerability tends to change more significantly as τ varies when τ is smaller or in Barabási-Albert than Erdős-Rényi random graphs.

The continuous outbreaks of epidemic diseases in a population and viruses in computer networks [1][2][3][4] motivate the study of epidemic spreading on a network. The Susceptible-Infected-Susceptible (SIS) epidemic process [5][6][7][8][9][10][11][12] has been widely studied as a model of virus spread on a network. In the SIS model, a node is either infected or susceptible at any time t. Each infected node infects each of its susceptible neighbors with an infection rate β. Each infected node recovers with a recovery rate δ. Both infection and recovery processes are independent Poisson processes and the ratio τ = β/δ is the effective infection rate. There is an epidemic threshold τ c and above the threshold τ > τ c a nonzero fraction of nodes is infected in the metastable state. The infection probability v k∞ (τ) of a node k in the metastable state at a given effective infection rate τ indicates the vulnerability of node k to the virus, and the prevalence, which equals the average fraction y ∞ (τ) of infected nodes reflects the global vulnerability of the network.
Researchers have mainly concentrated on the average fraction y ∞ of infected nodes in the metastable state to estimate the vulnerability of a network against a certain epidemic or virus. Great effort has been devoted to understand how the network topology influences the vulnerability and the epidemic threshold 6,[13][14][15] , when the effective infection rate is just above the epidemic threshold [ref. 16, p. 469]. In this case, it is found [ref. 16, p. 469] that, the metastable-state infection probability vector ( , obtained by the N-Intertwined Mean-Field Approximation (NIMFA) of SIS model is proportional to the principal eigenvector x 1 of the adjacency matrix A. In this work, we aim to explore the nodal infection probability in a systematic way, in different network topologies and when the effective infection rate τ varies. As a starting point, we investigate the ranking of nodal infection probabilities, which crucially informs a network operator which nodes are more vulnerable or require protection. Interestingly, we find that the ranking of the nodal infection probability changes as the effective infection rate τ varies. The observation points out that we cannot find a topological feature of a node i to represent the vulnerability of node i to an SIS epidemic, because the rankings in vulnerability of nodes in a network may be different when the effective infection rate τ varies, whereas a topological feature of node i remains the same. Our observation explains the finding of Hebert-Dufresne et al. 17 that different nodal features (such as degree, betweenness, etc.) should be used to select the nodes to immunize in different scenarios (based on different infection rates, link densities, etc.), i.e. different nodes should be immunized at different infection rates. In this paper, we explore two questions: (I) which network topology changes the ranking of nodal infection probabilities more significantly and (II) in which effective infection rate range, does the increment of the effective infection rate lead to a more significant change in the ranking for a given network topology?
We first assume that, for an arbitrary pair of nodes, the trajectory v k∞ (τ) and v m∞ (τ) as a function of the effective infection rate τ cross at most once in any interval (τ 0 , τ 1 ). We call this assumption the one-crossing assumption and Section "Discussion about the one-crossing assumption" of the supplementary information shows that the assumption is reasonably good. Then the rankings of the vulnerabilities v k∞ (τ) and v m∞ (τ) change or equivalently the trajectories v k∞ (τ) and v m∞ (τ) cross if (v k∞ (τ 0 ) − v m∞ (τ 0 ) (v k∞ (τ 1 ) − v m∞ (τ 1 ) < 0, when the effective infection rate τ changes from τ 0 to τ 1 . To estimate the maximal change in the ranking of nodal infection probabilities in a network, we consider the total number of crossings between the trajectories of the infection probabilities of all the nodes in a network, when the effective infection rate τ changes from just above the epidemic threshold to a large value, above which the ranking remains the same. The total number of crossings is a simple and straightforward measure of the changes in the ranking of nodal infection probabilities. (We also briefly discuss how the correlation of the ranking of nodal infection probabilities changes as the effective infection rate increases in Section "The Spearman rank correlation ρ as a function of α" of the supplementary information.) A higher total number of crossings may lead to a more complicated protection policy for a network operator. Given a network, we find a lower bound of the total number of crossings, which can be computed from the topology properties, in particular, from the degree vector and the principal eigenvector of the adjacency matrix. The lower bound is roughly proportional to, thus an accurate indicator of, the total number of crossings for an arbitrary network. Hence, these two topological features (i.e. the degree vector and the principal eigenvector of the adjacency matrix) could indeed characterize to what extent the ranking of nodal vulnerabilities would change on a network. Since the lower bound is computationally simple, it can be used to compare the total number of crossings for different network topologies. This result explains why the total number of crossings tends to be larger in networks with a smaller average degree if the degree distribution is given or with a larger degree variance if the average degree is given. Regarding to Question (II), we analytically derive the number of crossings when the effective infection rate τ 0 increases with a small value ε, given the infection probability vector V ∞ (τ 0 ) at the effective infection rate τ 0 . This theoretical result, validated by numerical results, explains the reason why the crossings are more likely to occur when the effective infection rate τ is smaller.

Results
We first introduce in detail how to count or quantify the changes of the nodal ranking of the infection probability. Afterwards, we investigate the changes in the ranking (I) in different topologies when the effective infection rate τ increases from just above the epidemic threshold to a large value, above which the ranking remains the same, and (II) when the effective infection rate varies in different ranges.
The counting of the nodal ranking changes. To explore the changes of the nodal ranking of the infection probability, we investigate in a given network how many crossings, denoted by χ, between the trajectory v k∞ (τ) and v m∞ (τ) for all pairs of nodes can occur in the effective infection rate interval (τ 0 , τ 1 ), where (1) are introduced in Section "Methods"). In Fig. 1, we illustrate the trajectories v k∞ (τ) of 10 nodes, randomly selected from a real-world network called Roget (N = 994 nodes, average degree E [D] = 7.32 and detailed in Section "Real-world graphs" of the supplementary information). For example, the vulnerability of the node corresponding to the red dash line in Fig. 1 changes dramatically from the medium vulnerability when τ = 0.12 to the high vulnerability when τ = 0.24. Network operators should be alert to such a change of nodal vulnerabilities. The trajectories v k∞ (τ) of other groups of nodes in Roget are shown and discussed in the first section of the supplementary information.
For a graph with N nodes, the maximum possible number of crossings is under the one-crossing assumption. To count the number of crossings in the interval (τ 0 , τ 1 ), we define an N × N matrix F with elements f ij : τ Figure 1. The meta-stable infection probabilit v k∞ as a function of the effective infection rate τ for 10 random nodes in a real-world network called Roget (details in Section "Real-world graph" of the supplementary information). The meta-stable infection probability v k∞ is obtained by solving (11).
Scientific RepoRts | 7: 9233 | DOI:10.1038/s41598-017-08611-9 Since f ii = 0, the matrix F has a zero diagonal just as the adjacency matrix A. A negative matrix element f ij < 0 means that there is a crossing between the trajectory v i∞ (τ) and v j∞ (τ) in the interval (τ 0 , τ 1 ). The number of crossings in the interval (τ 0 , τ 1 ) of the effective infection rate then equals where 1 {x} is the indicator function: 1 {x} = 1 if the event or condition {x} is true, else 1 {x} = 0. Specifically, if all nodal degrees are the same in a random graph, the nodal ranking in any interval of τ does not change, since the infection probability of every node 6 equals the average fraction of infected nodes for any effective infection rate τ. In this work, we focus on the NIMFA nodal infection probability in the meta-stable state which is obtained by solving (11), hence the initial conditions (such as how many nodes are initially infected) are not necessary. We can compute the SIS metastable viral infection probability v k∞ of any node k both by the N-Intertwined Mean-Field Approximation (NIMFA) 6,18 and by simulations 8 of the SIS continuous-time Markov process. We then further compare the number of crossings χ as a function of the increment in the effective infection rate τ over different ranges, obtained by NIMFA and the continuous-time simulations of the SIS model. As shown in Section "The comparison between NIFMA and the continuous-time simulation" of the supplementary information, the number of crossings obtained from NIMFA is relatively close to that from the simulations, so we compute the number χ of crossings mainly by NIMFA due to its computational efficiency. However, NIMFA may not be accurate when the effective infection rate is close to the epidemic threshold 8 . Hence, the number of crossings obtained by NIMFA and simulations may be different from each other when the effective infection rate is close to the epidemic threshold as shown in Section "The comparison between NIFMA and the continuous-time simulation" of the supplementary information.
The total number of crossings in different topologies. We explore the total number of crossings in different graph topologies ( , ) when the effective infection rate τ changes from just above the epidemic threshold, i.e. c (1) τ ε + , to a large value τ u , above which the ranking of the nodal infection probability hardly changes. In Section "Methods -The derivation of the lower bound χ l ", we prove that there exists a value of τ, above which the ranking of the nodal infection probabilities does not change. We derive a lower bound of the total number of crossings and show that the lower bound is actually an accurate indicator of the total number of crossings in different types of graphs.
As shown in Section "Methods", we derive a lower bound χ l of the total number of crossings in a given graph: where x 1 is the principal eigenvector of the adjacency matrix A, belonging to the largest eigenvalue λ 1 and d is the degree vector of the given graph.
With the one-crossing assumption, we can compute χ τ ε τ and V ∞ (τ u ). However, we have to select a proper value of τ u which is large enough and practical. We set the value of τ u as the minimum infection rate such that the average fraction of infected nodes y ∞ (τ u ) ≥ 0.9, since we find for most Erdös-Rényi (ER), Barabási-Albert (BA) random graphs and the aforementioned real-world network, that the rankings of the nodal degree and the infection probability are almost the same when the average fraction of infected nodes y ∞ ≥ 0.9. We discuss how we select the value of τ u in Section "The value of τ u " of the supplementary information. The scatter plot of the lower bound χ l vs versus ( , ) is shown in Fig. 2 for different graphs including ER random graphs, BA random graphs and six graphs constructed from real-world datasets (as described in Section "Real-world graphs" of the supplementary information), and the dash line in Fig. 2 is  Figure 2 and Equation (3) show that the lower bound χ l is indeed always smaller than and approximately proportional to χ τ ε τ + ( , ) c u (1) . Hence, the lower bound χ l is a computationally simple indication of the total number of changes in the ranking of the metastable state infection probability in a graph. Moreover, we find that for graphs generated by the same random graph model (ER or BA model), a graph with a small average degree tends to have a large number of crossings; given the average degree, a graph with a large degree variance tends to have more crossings. We can understand this observation as follows. The principal eigenvector component of any node i obeys the eigenvalue equation The principal eigenvector is positively correlated with the degree vector 19 . Such correlation weakens if the principal eigenvector has a large variance, leading to a large χ l . When the degree variance is large, the variance of the principal eigenvector tends to be large as well, contributing to a large χ l . As more links are added to a network, the network becomes more homogeneous and the variance of the principal eigenvector decreases, resulting in a smaller χ l , or equivalently less crossings. (1), we can compute the number χ(τ 0 , τ 1 ) of crossings in the given interval (τ 0 , τ 1 ) based on the knowledge of the infection probability vectors V ∞ (τ 0 ) and V ∞ (τ 1 ) only. Here, we show that we can theoretically derive the number of crossings in a small interval (τ 0 , τ 0 + Δτ) with the only knowledge of V ∞ (τ 0 ). Afterwards, we will validate this theory by numerical results, and illustrate in which ranges of the effective infection rate the number of crossings tends to be larger.

The number of crossings in different intervals of τ. As shown in
The crossings close to a given τ. For sufficiently small ε = Δτ > 0, the Taylor expansion of the steady-state NIMFA infection probability v k∞ for any node k is explicit up to order 2. In Section "Derivatives of v i∞ with respect to τ" of the supplementary information, we show the procedure to determine the m-th order derivative v i∞ (τ) with respect to the effective infection rate τ for any node k.
> 0 for sufficiently small ε > 0 and the ranking at τ + ε and at τ is unchanged. On the other hand, if v k∞ (τ + ε) − v m∞ (τ + ε) = 0, which implies, for sufficiently small ε > 0 (so that we can ignore the higher order terms in ε m for m > 1 in (4)), that In other words, given v k∞ (τ) of all nodes at τ, then there can be a zero or crossing at τ if ε km is small compared to τ. This approach is actually known as the Newton-Raphson method and corresponds with the first term in the Lagrange series for the inverse function (see ref. 20 in Page 304). A second order approximation, by ignoring terms of order O(ε 3 ) in (4), equating v k∞ (τ + ε) − v m∞ (τ + ε) = 0 and solving for ε, yields which is expected to be more accurate, in spite of the higher computational complexity since now also the set of second order derivatives needs to be solved. We rewrite (6) as c u (1) in ER random graphs (with the size N = 1000), BA random graphs (with the size N = 1000) and real-world networks (details in Section "Real-world graphs" of the supplementary information).
Scientific RepoRts | 7: 9233 | DOI:10.1038/s41598-017-08611-9 Using the generalized binomial expansion ( ) , valid for any |z| < 1, up to first order yields After only retaining the root with the minus sign, we arrive again at (5), illustrating that (5) is accurate when (5) is as small as possible (so that higher order evaluations are not needed). The discriminant must be positive in order to obtain feasible ε km . A positive discriminant is a condition for the existence of crossing in the interval (τ, τ + ε). Hence, given an effective infection rate τ 0 and the corresponding infection probability vector V ∞ (τ 0 ), there is a crossing close to τ 0 between the trajectory v k∞ (τ) and the trajectory v m∞ (τ) at τ + ε km if ε km computed by (5) is positive and small enough.
Numerical results. In the following, we propose to normalize the effective infection rate by the NIMFA epidemic threshold: 1 , so that we can compare the number χ of crossings in different intervals of α in the same range (1, α max ) for different network topologies, i.e. different average degrees and different degree distributions. We explore the crossings of the infection probability trajectories when the effective infection rate varies over the range (1, α max ). We divide the range (1, α max ) into intervals (α j−1 , α j ) where j = 1, 2, …, n is the index and α n = α max . We aim to explore in which interval of the normalized effective infection rate α the crossings are more likely to appear. Hence, instead of directly exploring the number of crossings between the trajectory of every node in the whole interval (1, α max ) of the effective infection rate α, we investigate the number χ(α j−1 , α j ) of crossings in (1) in each small interval (α j−1 , α j ). We denote α 0 = 1 (since the effective infection rate below the epidemic threshold corresponds to the all-healthy state), α n = α max and α j = α 0 + jΔα, where Δα = (α max − 1)/n is the length of each interval. We will study how the number of crossings changes at different regions of the effective infection rate τ or scaled α. The infection probability v k∞ (α) at any given value of the normalized effective infection rate α is computed by solving the NIMFA equation (11). On one hand, we can further compute the number χ(α j−1 , α j ) of crossings between all node pairs within any interval (α j−1 , α j ) by employing our theoretical result (5). On the other hand, we can also numerically compute the number χ(α j−1 , α j ) by (1). We first compare the theoretical (5) and numerical (1) when the normalized effective infection rate α is not close to 1, i.e. when the effective infection rate τ is not close to the epidemic threshold τ c ; specifically, we start from α 0 = 2 and α j = α 0 + jΔα, where Δα = 1. The main figures in Fig. 3 demonstrate that, for both ER and BA graphs, our theoretical result (5) agrees well with the numerical result (1) except for BA graphs in the interval (2, 3). The lower accuracy of our theoretical result for small α can be explained as follows. Compared to , a small value of ( ) is required for the accuracy of the theoretical results (5), since ε in (5) is required to be small with respect to the given effective infection rate τ. Hence, when α j is smaller, a smaller value of ( ) is needed for (5) to be accurate. We further plot the number χ(α j−1 , α j ) of crossings in the interval (α j−1 , α j ) as a function of α j , when the normalized effective infection rate α is close to 1 and the length of the interval is reduced to Δα = 0.1. When the length of the interval, i.e. Δα, is smaller, the theoretical (5) results are more consistent with the numerical (1) results for BA random graphs in the range of α ∈ (2, 3) in the inset than in the main figure of Fig. 3(b). For both ER and BA graphs, the two methods agree with each other well when the intervals of α are small, even when the normalized effective infection rate α is close to 1 as shown in the insets of Fig. 3. . The meta-stable infection probability v k∞ is obtained by solving (11) and the number χ(α j−1 , α j ) of crossings is obtained by (1). The results are averaged over 10 realizations.
Scientific RepoRts | 7: 9233 | DOI:10.1038/s41598-017-08611-9 Physical explanation. Figure 3 shows that more crossings appear when the effective infection rate is smaller. In this section, we give a physical explanation of that observation.
At an effective infection rate τ or a normalized effective infection rate α, Equation (14) shows that the comparison of the infection probabilities v k∞ (α) and v m∞ (α) is actually equivalent to the comparison of the sum of the infection probabilities of their neighbors, i.e.
Without loss of generality, we assume that the degree d k of node k is larger than the degree d m of node m, i.e. d k > d m . As discussed in Section "Methods", the infection probability v k∞ (α) > v m∞ (α) if the effective infection rate is large enough. If there exists a value of α 1 at which and thus more crossings could be expected when the effective infection rate τ exceeds α 1 . This hypothesis further motivates us to study the normalized standard deviation of the nodal infection probability: ) and explore whether a larger difference σ α σ α of σ* would imply more crossings in the interval (α j−1 , α j ).
The number χ(α j−1 , α j ) of crossings as a function of the difference σ*(α j−1 ) − σ*(α j ) is shown in Fig. 4(a) for ER random graphs and in Fig. 4(b) for BA random graphs. For both ER and BA random graphs, the number χ(α j−1 , α j ) of crossings are positively correlated with the difference σ*(α j−1 ) − σ*(α j ) in the interval (α j−1 , α j ). We observe the same in ER and BA random graphs with various average degrees though not shown here. The numerical results support that more crossings tend to appear in an interval where the variable σ* changes more.
We then further explore how the value of the variable σ*(α) changes with the normalized effective infection rate α. We plot the variable σ* as a function of the normalized effective infection rate α in Fig. 5(a) for ER random graphs and in Fig. 5(b) for BA random graphs with N = 1000 and various average degrees, and find that for both types of random graphs the curves can be fitted by a power law function, i.e. σ* is proportional to α −γ , especially when the average degree is not small. More figures and the curve fittings are shown in the last section of the supplementary information for both ER and BA random graphs. Figure 5 illustrates that the power law exponent γ of the fitting curves is close to 1 as the average degree E [D] increases for ER random graphs, and that is always approximately 1 for BA random graphs even though the average degree E [D] is small. Furthermore, the relationship between the variable σ* and the normalized effective infection rate α follows a power law when the effective infection rate is much larger, as shown in Section "σ* as a function of τ" of the supplementary information.
When α is large, we can theoretically prove the power-law relationship between the variable σ* and the normalized effective infection rate α. By (12) and assuming a large enough effective infection rate, we obtain  ). For BA graphs, we employ the minimum degree m = 2, thus the average degree E [D] = 4, and the size N = 1000 (the NIMFA epidemic threshold 0 0902 ). The meta-stable infection probability v k∞ is obtained by solving (11), the number χ(α j−1 , α j ) of crossings is obtained by (1) and the value of σ*(α) is obtained by (7). The results are averaged over 10 realizations.
Scientific RepoRts | 7: 9233 | DOI:10.1038/s41598-017-08611-9 In a finite graph, Var , we obtain that σ* is proportional to α −1 . Although the power-law relationship between σ* and α can be clearly observed in Fig. 5, the effective infection rate τ corresponding to the variable α in this figure may be smaller than 1 and the theoretical proof is only valid when the effective infection rate  1 τ . Our result (8) is based on connected graphs, because the terms E       Validation on a real-world network. Finally, we validate our previous findings on the real-world network -Roget, detailed in Section "Real-world graphs" of the supplementary information. As shown in Fig. 6(a), the  The real-world network -Roget, detailed in Section "Real-world graphs" of the supplementary information, is employed. The meta-stable infection probability v k∞ is obtained by solving (11), the number χ(α j−1 , α j ) of crossings is obtained by (1) and the value of σ*(α) is obtained by (7). The NIMFA epidemic threshold τ ≈ .
. number χ(α j−1 , α j ) of crossings at normalized effective infection rate α interval obtained by theoretical and numerical methods are consistent with each other. The number of crossings decreases fast as α increases, similar to ER and BA models. The main figure of Fig. 6(b) shows that the number χ(α j−1 , α j ) of crossings increases with the difference σ*(α j−1 ) − σ*(α j ) in the interval (α j−1 , α j ). In the inset of Fig. 6(b), we observe the power-law relationship between the variable σ* and the normalized effective infection rate α. All these findings are well in line with previous results on ER and BA random graphs.

Discussion
In the SIS model, the infection probability trajectory v k∞ (τ) of node k and the infection probability trajectory v m∞ (τ) of node m may cross if (v k∞ (τ 0 ) − v m∞ (τ 0 ))(v k∞ (τ 1 ) − v m∞ (τ 1 )) < 0, when the effective infection rate τ varies from τ 0 to τ 1 . The number χ(τ 0 , τ 1 ) of crossings of all node pairs within an interval (τ 0 , τ 1 ) of the effective infection rate measures the change in the ranking of the nodal vulnerabilities when the effective infection rate changes from τ 0 to τ 1 . We explore in what types of network topologies and in what ranges of the effective infection rates the crossings are more likely to appear. Theoretically, we find a lower bound χ l in (2) of the total number of crossings in a graph. The lower bound χ l only depends on topological features, i.e. the degree vector and principal eigenvector of the adjacency matrix. That lower bound χ l is also shown to reflect the total number of crossings for a given graph. Moreover, we analytically predict the crossings close to an effective infection rate τ 0 , given the infection probabilities of all nodes at the effective infection rate τ 0 . This theory can be used to estimate the changes of the ranking of the nodal vulnerabilities if the effective infection rate τ slightly increases from its current value τ 0 . We find that more crossings tend to appear when the effective infection rate is smaller. Our findings may help network operators to estimate how significant the ranking of nodal vulnerabilities may change for a given change of the effective infection rate on a given network. This work inspires interesting further questions. For example, how much is the change in the value of the nodal infection probabilities when the trajectories of the nodal infection probability crossing? Can we use the changes in the ranking of nodal infection probabilities to more effectively select the nodes to immunize?

Methods
Network construction. The Erdös-Rényi (ER) random graph 21 is one of the most widely-used and well-studied models. In an ER random graph G p (N) with N nodes, each pair of nodes is connected with probability p independent from every other pair. The distribution of the degree of a random node is binomial:  (N) is almost surely connected. We employ ER graphs with p = 2p c (the average degree is approximately E [D] = 14) and N = 1000 as an example in some discussions, but consider the ER graphs with various average degrees when needed.
Besides the ER random graph, the scale-free model is often used to capture the scale-free degree distribution of the real-world networks such as the Internet 22 and World Wide Web 23 . In this work, we consider the Barabási-Albert (BA) model 24 , which begins with an initial connected network of m 0 nodes. At each step, a new node is connected to m ≤ m 0 existing nodes. The probability that an existing node is chosen to be connected is proportional to the degree of the existing node. The degree distribution of BA random graphs 16 where v i (t) is the infection probability of node i at time t, and the adjacency matrix element a ij = 1 or 0 denotes if there is a link or not between node i and node j.
where A is the N × N adjacency matrix of the network, I is the N × N identity matrix and diag (v i (t)) is the diagonal matrix with elements ....  (11), if exists, points to the existence of a metastable state with a non-zero fraction of infected nodes. Or else, the metastable state can be figured as 0 or not existing. In this paper, we are interested in actually the metastable state. Furthermore, the NIMFA epidemic threshold c (1) 1 1 τ = λ , where λ 1 is the largest eigenvalue of the adjacency matrix A, is a lower bound of the exact epidemic threshold τ c , i.e. c c (1) τ τ < . The epidemic dies out if the effective infection rate τ τ < c (1) . Since the NIMFA is the main approach in this work, we also employ the NIMFA epidemic threshold τ c (1) . The Laurent series of the steady-state infection probability is given by refs 16