Understanding the influence of all nodes in a network

Centrality measures such as the degree, k-shell, or eigenvalue centrality can identify a network's most influential nodes, but are rarely usefully accurate in quantifying the spreading power of the vast majority of nodes which are not highly influential. The spreading power of all network nodes is better explained by considering, from a continuous-time epidemiological perspective, the distribution of the force of infection each node generates. The resulting metric, the expected force, accurately quantifies node spreading power under all primary epidemiological models across a wide range of archetypical human contact networks. When node power is low, influence is a function of neighbor degree. As power increases, a node's own degree becomes more important. The strength of this relationship is modulated by network structure, being more pronounced in narrow, dense networks typical of social networking and weakening in broader, looser association networks such as the Internet. The expected force can be computed independently for individual nodes, making it applicable for networks whose adjacency matrix is dynamic, not well specified, or overwhelmingly large.

Supplementary Notes 1-3 and Tables 1-3 present further information on topics mentioned in the main text. The remaining tables (4-7) and figures (1-6) present the data from the main text in more detail.
Supplementary Note 1: Correlation between ExF_2 and ExF_3. The Expected Force is based on the distribution of the force of infection after an arbitrary number of infection events; a subscript can be used to indicate the number of events considered. Evidence that two events are sufficient is provided by the tight correlation between the metric computed using two events and the metric computed using three events (ExF_2 and ExF_3) for the simulated network classes considered here. The mean and standard deviation of the correlations, taken over 50 networks in each class, are as follows: Pareto 0.96 ± 0.007, Amazon 0.95 ± 0.013, Internet 0.97 ± 0.007, Facebook 0.99 ± 0.014, Astrophysics 0.99 ± 0.005. As expected from these tight correlations, increasing the number of events to three does not provide any meaningful increase in predictive accuracy.
Supplementary Note 2: Invariance of ExF^M to choice of scaling parameter. The modified version of the expected force is defined in the main text as follows: ExF^M(i) = log(α deg(i)) ExF(i), where the degree of the node is scaled by α so as to prevent the logarithm from being zero for nodes with degree one. Expanding the logarithm clarifies the influence of α: ExF^M(i) = log(α) ExF(i) + log(deg(i)) ExF(i), implying that as α → 1, the scaling factor becomes irrelevant, and as α → ∞, it eclipses any contribution from the degree. The manuscript suggests α = 2 is a reasonable choice, providing the needed scaling without unduly skewing the measure.
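The decomposition above can be checked numerically. The following is a minimal sketch rather than code from the manuscript: the helper name `modified_expected_force` and the sample values are hypothetical, the natural logarithm is assumed, and the expected force of the node is taken as already computed.

```python
import math

def modified_expected_force(exf, degree, alpha=2.0):
    """Scale a precomputed expected-force value by log(alpha * degree).

    Hypothetical helper for illustration; `exf` stands for ExF(i)
    and `degree` for deg(i).
    """
    if degree < 1 or alpha <= 1:
        raise ValueError("expects degree >= 1 and alpha > 1")
    return math.log(alpha * degree) * exf

# Illustrative values only, not taken from any network in the study.
exf, degree, alpha = 1.7, 1, 2.0

combined = modified_expected_force(exf, degree, alpha)
split = math.log(alpha) * exf + math.log(degree) * exf

# The product form and the two-term form agree up to rounding.
assert math.isclose(combined, split)

# For a degree-one node log(deg) = 0, so only the log(alpha) term
# survives; this is the case the scaling is meant to handle.
assert math.isclose(combined, math.log(alpha) * exf)
```

With the scaling removed (α = 1), a degree-one node would receive a value of exactly zero regardless of its expected force.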
We here show that the measure is largely invariant to the choice of α by testing the following values: 1.0001, 1.001, 1.01, 1.1, 1.5, 2, 3, 4, 8, and 16. For each α tested, the correlation between ExF^M at α = 2 and the test value is measured. Measurements are made on all non-hub nodes for one hundred networks of each of the five simulated network families. The mean value for each parameter/network type is reported in Supplementary Table 1. We test over the full network as this is likely to bias the testing values towards low degree nodes, where the choice of α is more likely to have an effect.
All correlations are greater than 0.999 for α in the range 1.5-3. Only two cases show correlation less than 0.99, both occurring when α = 16. If instead of reporting the mean correlation observed over the one hundred networks, we report the minimum, the same patterns hold, with the lowest value dropping to 0.976, again for α = 16.
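The sweep over α described in this note can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the text does not say which correlation coefficient was used, so Pearson correlation is assumed; the function names are hypothetical; and the degree and expected-force values are synthetic placeholders, so the printed numbers are not expected to match Supplementary Table 1.

```python
import math
import random

# Values of alpha tested in the note.
ALPHAS = [1.0001, 1.001, 1.01, 1.1, 1.5, 2, 3, 4, 8, 16]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumed metric)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def exf_m(exf, degree, alpha):
    """Modified expected force, log(alpha * deg(i)) * ExF(i), per node."""
    return [math.log(alpha * d) * e for e, d in zip(exf, degree)]

def alpha_sweep(exf, degree, reference_alpha=2.0):
    """Correlate the measure at each test alpha with the reference alpha."""
    reference = exf_m(exf, degree, reference_alpha)
    return {a: pearson(reference, exf_m(exf, degree, a)) for a in ALPHAS}

# Synthetic placeholder data: 500 nodes with random degrees and
# random expected-force values. Real inputs would come from a network.
rng = random.Random(0)
degree = [rng.randint(1, 20) for _ in range(500)]
exf = [rng.uniform(0.5, 4.0) for _ in range(500)]

results = alpha_sweep(exf, degree)
for a, r in results.items():
    print(f"alpha={a:<7} correlation={r:.4f}")
```

Repeating the sweep over many networks and taking the mean or the minimum per α would give the two summaries discussed above.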
Supplementary Note 3: Agreement between the expected force, k-shell, and eigenvalue centrality on the most important nodes. We here assess the agreement between the ExF, k-shell, and eigenvalue centrality as to which nodes are the most important in the network. All three measures are compared on one hundred networks for each of the five families. Supplementary Table 2 shows the mean rank correlation between ExF and the other measures, as well as the agreement between ExF and eigenvalue centrality regarding the top ten nodes.
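Agreement on the top ten nodes can be read as the overlap between two top-k sets. The sketch below is one plausible way to compute it, not necessarily the procedure used for Supplementary Table 2; the function names and the toy scores are hypothetical.

```python
def top_k(scores, k=10):
    """Return the set of the k node ids with the highest score.

    Ties are broken by dictionary insertion order because sorted() is stable.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

def top_k_agreement(scores_a, scores_b, k=10):
    """Fraction of the top-k nodes under scores_a that are also top-k under scores_b."""
    return len(top_k(scores_a, k) & top_k(scores_b, k)) / k

# Toy scores for five nodes, for illustration only.
exf_scores = {"a": 3.1, "b": 2.9, "c": 2.0, "d": 1.2, "e": 0.4}
eig_scores = {"a": 0.50, "b": 0.20, "c": 0.45, "d": 0.30, "e": 0.05}

# Top two by ExF are {a, b}; top two by eigenvalue centrality are {a, c}.
agreement = top_k_agreement(exf_scores, eig_scores, k=2)
print(agreement)  # 0.5
```

A measure with coarse resolution, where many nodes share the top score, makes this overlap easy to satisfy and therefore less informative.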
Overlap with the k-shell is problematic in that the k-shell does not provide deep resolution. In the looser networks, the highest k-shell contains a large percentage of the total nodes in the network (mean 41% in Pareto, 92% in Amazon). Even in the denser networks, the top k-shell contains more than 10% of the network nodes (Internet 15%, Facebook 13%, Astrophysics 14%). Hence the observation that the top 10 nodes (1% of the network) as ranked by the ExF are also found in the highest k-shell is not sufficiently meaningful to report in the table.
Table 6. Correlation between spreading power metrics and tthc in real world networks. Shown is the estimated correlation from 1,000 nodes on the given network, along with the 95% confidence bounds of the estimate. This information is duplicated in Figure 3 in the main text. The ExF^M is not included here as the modification only makes sense for processes with recovery; an empty column is used to allow easier visual comparison with the remaining tables. Accessibility is not measured for networks with more than 25,000 nodes.
Table 7. Correlation between spreading power metrics and epidemic potential in discrete time SIS processes on real world networks. Shown is the estimated correlation from 1,000 nodes on the given network, along with the 95% confidence bounds of the estimate. This information is duplicated in Figure 3 in the main text. Accessibility is not measured for networks with more than 25,000 nodes.
Table 8. Correlation between spreading power metrics and epidemic potential in discrete time SIR processes on real world networks. Shown is the estimated correlation from 1,000 nodes on the given network, along with the 95% confidence bounds of the estimate. This information is duplicated in Figure 3 in the main text. Accessibility is not measured for networks with more than 25,000 nodes.
Supplementary Figure 2. Correlation of spreading power metrics to epidemic outcomes on real networks, detailed view. Larger versions of the point and error bar plots from Figure 3, Main text, showing the observed correlation and 95% confidence interval between each measure and spreading process outcome on the real networks. The expected force and ExF^M (orange shades) show strong performance, consistently outperforming the other metrics (k-shell, eigenvalue centrality, and accessibility when computed, blue-green shades). The epidemic outcome for SI processes is the time until half the network is infected. For SIS and SIR processes it is the probability that an epidemic is observed. The suffix "-D" indicates spreading processes simulated in discrete time.
Supplementary Figure 3. Correlation of spreading power metrics to epidemic outcomes on real networks, detailed view. Larger versions of the point and error bar plots from Figure 3, Main text, showing the observed correlation and 95% confidence interval between each measure and spreading process outcome on the real networks. The expected force and ExF^M (orange shades) show strong performance, consistently outperforming the other metrics (k-shell, eigenvalue centrality, and accessibility when computed, blue-green shades). The epidemic outcome for SI processes is the time until half the network is infected. For SIS and SIR processes it is the probability that an epidemic is observed. The suffix "-D" indicates spreading processes simulated in discrete time.