Abstract
Networks portray a multitude of interactions through which people meet, ideas are spread and infectious diseases propagate within a society^{1,2,3,4,5}. Identifying the most efficient ‘spreaders’ in a network is an important step towards optimizing the use of available resources and ensuring the more efficient spread of information. Here we show that, contrary to common belief, there are plausible circumstances in which the best spreaders do not correspond to the most highly connected or the most central people^{6,7,8,9,10}. Instead, we find that the most efficient spreaders are those located within the core of the network as identified by the k-shell decomposition analysis^{11,12,13}, and that when multiple spreaders are considered simultaneously the distance between them becomes the crucial parameter that determines the extent of the spreading. Furthermore, we show that infections persist in the high-k shells of the network when recovered individuals do not develop immunity. Our analysis should provide a route to the optimal design of efficient dissemination strategies.
Main
Spreading is a ubiquitous process, which describes many important activities in society^{2,3,4,5}. Knowledge of the spreading pathways through the network of social interactions is crucial for developing efficient methods to either hinder spreading in the case of diseases, or accelerate spreading in the case of information dissemination. Indeed, people are connected according to the way they interact with one another in society, and the large heterogeneity of the resulting network strongly affects the efficiency and speed of spreading. In networks with a broad distribution of degree (the number of links per node)^{6}, it is believed that the most connected people (hubs) are the key players, being responsible for the largest scale of the spreading process^{6,7,8}. Furthermore, in the context of social network theory, the importance of a node for spreading is often associated with its betweenness centrality, a measure of how many shortest paths pass through the node, which is believed to determine who has more ‘interpersonal influence’ on others^{9,10}.
Here we argue that the topology of the network organization plays an important role, such that there are plausible circumstances under which the highly connected nodes or the highest-betweenness nodes have little effect on the range of a given spreading process. For example, if a hub lies at the end of a branch at the periphery of a network, it will have a minimal impact on the spreading process through the core of the network, whereas a less connected person who is strategically placed in the core of the network will have a significant effect, leading to dissemination through a large fraction of the population. To identify the core and the periphery of the network we use the k-shell (also called k-core) decomposition of the network^{11,12,13,14}. Examining this quantity in a number of real networks enables us to identify the best individual spreaders in the network when the spreading originates in a single node. For a spreading process originating in many nodes simultaneously, we show that the efficiency can be further improved by choosing spreading origins located at a suitable distance from one another.
We study real-world complex networks that represent archetypal examples of social structures. We investigate (1) the friendship network between 3.4 million members of the LiveJournal.com community^{15}, (2) the network of email contacts in the Computer Science Department of University College London (Zhou, S., private communication), (3) the contact network of inpatients (CNI) collected from hospitals in Sweden^{16} and (4) the network of actors who have co-starred in movies labelled by imdb.com as adult^{17} (see Supplementary Section SI for details).
To study the spreading process we apply the susceptible–infectious–recovered (SIR) and susceptible–infectious–susceptible (SIS) models^{2,3,18} to the above networks (see Methods). These models have been used to describe disease spreading as well as information and rumour spreading in social processes where an actor constantly needs to be reminded^{19}. We denote the probability that an infectious node will infect a susceptible neighbour as β. In our study we use relatively small values of β, so that the infected percentage of the population remains small. For large β values, where spreading can reach a large fraction of the population, the role of individual nodes is no longer important: spreading covers almost the entire network, regardless of where it originated.
The location of a node is defined using the k-shell decomposition analysis^{11,12,13}. This process assigns an integer index, or coreness, k_{S}, to each node, representing its location according to successive layers (k shells) in the network. The k_{S} index is a fairly robust measure, and the node ranking is not significantly affected when information about the network is incomplete (for details see Supplementary Fig. S6 in Section SII). Small values of k_{S} define the periphery of the network, and the innermost network core corresponds to large k_{S} (see Fig. 1a and Supplementary Section SII). Figure 1b–d illustrates that the size of the population infected in a spreading process (shown in this example for the CNI network) is not necessarily related to the degree k of the node where the spreading started. Spreading may differ widely even when it starts from hubs of similar degree, as shown by comparing Fig. 1b and c. Instead, the location of the spreading origin, given by its k_{S} index, predicts the size of the infected population more accurately. For instance, Fig. 1b and d show that nodes in the same k_{S} layer produce similar spreading areas even if they have different k (by definition, every node in layer k_{S} has degree k≥k_{S}, so a single layer can contain nodes of very different degrees).
The above example suggests that the position of a node relative to the organization of the network determines its spreading influence more than a local property of the node, such as its degree k. To quantify the influence of a given node i in an SIR spreading process, we study the average size M_{i} of the population infected in an epidemic originating at node i. We then average this quantity over all origins with the same (k_{S},k) values:
$$M(k_{S},k)=\sum_{i\in\Upsilon(k_{S},k)}\frac{M_{i}}{N(k_{S},k)},$$
where ϒ(k_{S},k) is the set of all N(k_{S},k) nodes with the given (k_{S},k) values.
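The class average M(k_{S},k) is a plain group-by mean. The sketch below assumes the per-origin values M_{i}, the shell indices and the degrees are already available as dictionaries keyed by node (for example, M_{i} estimated from repeated SIR runs); the function name `average_by_class` is hypothetical.

```python
from collections import defaultdict


def average_by_class(m, ks, k):
    """Average per-origin spreading M_i over all origins sharing (k_S, k).

    m, ks, k: dicts keyed by node id giving M_i, k_S(i) and the degree k(i).
    Returns {(k_S, k): M(k_S, k)}.
    """
    sums = defaultdict(float)  # running sum of M_i within each (k_S, k) class
    counts = defaultdict(int)  # N(k_S, k): number of origins in each class
    for i, mi in m.items():
        key = (ks[i], k[i])
        sums[key] += mi
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Plotting the returned values on a (k_{S},k) grid gives the kind of map analysed next; the same helper applies unchanged to any other per-node quantity, such as a persistence value.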
The analysis of M(k_{S},k) in the studied social networks reveals three general results (see Fig. 2). (1) For a fixed degree, M(k_{S},k) takes a wide range of values. In particular, there are many hubs located at the periphery of the network (large k, low k_{S}) that are poor spreaders. (2) For a fixed k_{S}, M(k_{S},k) is approximately independent of the degree of the nodes. This result is revealed in the vertically layered structure of M(k_{S},k), which suggests that origins located in the same k shell produce similar epidemic outbreaks M(k_{S},k), independent of the degree k of the origin. (3) The most efficient spreaders are located in the inner core of the network (large-k_{S} region), fairly independently of their degree. These results indicate that the k-shell index of a node is a better predictor of spreading influence than its degree. When an outbreak starts in the core of the network (large k_{S}), there are many pathways through which a virus can infect the rest of the network, regardless of the degree of the origin. The existence of these pathways implies that, during a typical epidemic outbreak from a random origin, nodes located in high-k_{S} layers are more likely to be infected, and are infected earlier, than other nodes (see Supplementary Section SIII). The neighbourhood of these nodes makes them more efficient at sustaining an infection in the early stages, thus enabling the epidemic to reach a critical mass from which it can fully develop. Similar results on the efficiency of high-k_{S} nodes are obtained from the analysis of M(k_{S},C_{B}) in Fig. 2, where C_{B} is the betweenness centrality of a node in the network^{9,10}: the value of C_{B} is not a good predictor of spreading efficiency.
To quantify the importance of k_{S} in spreading, we calculate the ‘imprecision functions’ ε_{k_{S}}(p), ε_{k}(p) and ε_{C_{B}}(p). For each of the three indicators k_{S}, k and C_{B}, these functions estimate how close the average spreading from the pN (0<p<1) origins chosen by that indicator comes to the optimal spreading (see Methods and Supplementary Section SIV). A strategy that predicts the spreading efficiency of a node from k_{S} is consistently more accurate than one based on k over the studied range of p (Fig. 3a). The C_{B}-based strategy gives poor results compared with the other two.
Our finding is not specific to the social networks shown in Fig. 2. In Supplementary Section SV we analyse the spreading efficiency in other networks that are not social in origin, such as the Internet at the router level^{20}, with similar conclusions. The key insight is that in the studied networks a large number of hubs are located in the peripheral low-k_{S} layers (Fig. 3b shows the location of the 25 largest hubs in the CNI; see also Supplementary Section SV) and therefore contribute poorly to spreading. The existence of hubs in the periphery is a consequence of the rich topological structure of real networks. In contrast, in a fully random network obtained by randomly rewiring a real network while preserving the degree of each node (such a random network corresponds to the configuration model^{21}; see Supplementary Section SVI), all the hubs are placed in the core of the network (see the red scatter plot in Fig. 3c) and they contribute equally well to spreading. In such a randomized structure the k-shell classification contains the same information as the degree classification, because there is a one-to-one relation between the two quantities, which is approximately linear, k_{S}∝k (Fig. 3c and Supplementary Fig. S13). Examples of real networks that are similar to a random structure are the network of product space of economic goods^{22} and the Internet at the AS level (analysed in Supplementary Section SV).
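One common way to obtain a degree-preserving randomized version of a network is repeated double-edge swaps. The sketch below is an illustration of that technique, not necessarily the procedure used in Supplementary Section SVI; the function name `rewire_preserving_degrees` and the retry cap are assumptions. Each accepted swap replaces edges (a,b),(c,d) with (a,d),(c,b), so every node keeps its degree.

```python
import random


def rewire_preserving_degrees(edges, n_swaps, seed=None):
    """Randomize an undirected simple graph by double-edge swaps.

    Swaps that would create a self-loop or a duplicate edge are rejected,
    so the result is again a simple graph with the same degree sequence.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]  # work on a copy
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    max_attempts = 100 * n_swaps  # hypothetical cap for graphs with few valid swaps
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible rewirings
        if len({a, b, c, d}) < 4:
            continue  # would create a self-loop or a degenerate swap
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # would create a multi-edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges
```

Running the k-shell decomposition on the rewired edge list is one way to check the near one-to-one relation between k_{S} and k described above.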
Our study highlights the importance of the relative location of a single spreading origin. Next, we address the extent of an epidemic that starts at multiple origins simultaneously. Figure 3d shows the extent of SIR spreading in the CNI network when the outbreak starts simultaneously from the n nodes with the highest degree k or the highest k_{S} index. Even though the high-k_{S} nodes are the best single spreaders, for multiple spreaders the nodes with highest degree are more efficient than those with highest k_{S}. We attribute this result to the overlap of the infected areas of the different spreaders: large-k_{S} nodes tend to be clustered close to one another, whereas hubs can be more spread out in the network and, in particular, need not be connected with one another. The step-like features in the curve for the highest-k_{S} nodes (red solid curve in Fig. 3d) indicate that the infected percentage remains constant as long as the added spreaders belong to the same k shell, whereas including just one node from a different k shell significantly increases the spreading. This suggests that a better strategy with n spreaders is to choose either the highest-k or the highest-k_{S} nodes with the requirement that no two of the n spreaders are directly linked to each other. This scheme provides the largest infected area of the network, as shown in Fig. 3d.
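The selection rule above (rank by k or k_{S}, but never pick two directly linked spreaders) can be sketched as a greedy pass over the ranking. The helper name `select_spreaders`, the adjacency-dict input and the tie-breaking by node id are assumptions for illustration; the text does not specify how ties or skipped nodes are handled.

```python
def select_spreaders(adj, score, n):
    """Pick up to n spreaders in decreasing order of score (e.g. k or k_S),
    skipping any candidate directly linked to an already chosen spreader.

    adj: dict mapping each node to the set of its neighbours.
    score: dict mapping each node to its ranking value.
    """
    chosen = []
    blocked = set()  # neighbours of spreaders chosen so far
    # Highest score first; ties broken by node id so the result is reproducible.
    for v in sorted(adj, key=lambda u: (-score[u], u)):
        if len(chosen) == n:
            break
        if v in blocked:
            continue
        chosen.append(v)
        blocked.update(adj[v])
    return chosen
```

For example, with `score` set to the degree of each node, the call returns hubs that are pairwise non-adjacent, which is the property the text identifies as reducing the overlap of infected areas.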
Many contagious infections, including most sexually transmitted infections^{23}, do not confer full immunity after infection as assumed in the SIR model, and are therefore better described by the SIS epidemic model, in which an infectious node returns to the susceptible state with probability λ. In an SIS epidemic the number of infectious nodes eventually reaches a dynamic-equilibrium ‘endemic’ state, in which as many infectious nodes become susceptible as susceptible nodes become infectious^{18}. In contrast to SIR, in the initial state of our SIS simulations 20% of the network nodes are already infected. The spreading efficiency of a given node i in SIS spreading is its persistence, ρ_{i}(t), defined as the probability that node i is infected at time t (ref. 7). In an endemic SIS state, ρ_{i}(t) becomes independent of t (see Supplementary Section SVII). Previous studies have shown that the largest persistence is found in the network hubs, which are frequently reinfected owing to their large number of neighbours^{7,24,25}. However, we find that this result holds only in randomized network structures. In the real network topologies studied here, viruses persist mainly in the high-k_{S} layers instead, almost irrespective of the degree of the nodes in the core.
In random networks, viruses are found to propagate to the entire network above an epidemic threshold given by β>β_{c}^{rand}≡λ⟨k⟩/⟨k^{2}⟩ (refs 24, 26). In real networks, such as the CNI network, the threshold β_{c} differs from β_{c}^{rand}. Furthermore, in real networks we find that viruses can survive locally even when β<β_{c}, but only within the high-k_{S} layers of the network, whereas virus persistence in peripheral k_{S} layers is negligible (Fig. 4a–c). As the k-shell structure depends on the network assortativity, the lower threshold is consistent with the observation that high positive assortativity^{27} may decrease the epidemic threshold.
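The random-network threshold β_{c}^{rand}=λ⟨k⟩/⟨k^{2}⟩ depends only on the first two moments of the degree sequence, so it is straightforward to evaluate for a given network. A minimal sketch (the function name is hypothetical; λ=0.8 is the value stated in Methods) follows; as the text notes, this mean-field estimate need not match the threshold of a real, correlated network.

```python
def sis_threshold_rand(degrees, lam=0.8):
    """Mean-field SIS threshold beta_c^rand = lam * <k> / <k^2>
    for an uncorrelated random network with the given degree sequence."""
    n = len(degrees)
    mean_k = sum(degrees) / n
    mean_k2 = sum(d * d for d in degrees) / n
    return lam * mean_k / mean_k2
```

For a regular network where every node has degree 3 and λ=0.9, this gives 0.9·3/9 = 0.3; broader degree distributions raise ⟨k^{2}⟩ and push the threshold down.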
The importance of high-k_{S} nodes in SIS spreading is confirmed when we analyse the asymptotic probability that nodes with given (k_{S},k) values will be infected. This probability is quantified by the persistence function
$$\rho(k_{S},k)=\sum_{i\in\Upsilon(k_{S},k)}\frac{\rho_{i}}{N(k_{S},k)},$$
where ρ_{i} is the time-independent endemic persistence of node i, shown as a function of (k_{S},k) at different β values (Fig. 4a and b). High-k_{S} layers in networks might be closely related to the concept of a core group in sexually transmitted infection research^{23}. Core groups are defined as subgroups of the general population characterized by a high partner turnover rate and extensive intergroup interaction^{23}.
Like a core group, the dense subnetwork formed by nodes in the innermost k shells helps the virus to survive locally in the inner-core area and to infect nodes adjacent to that area. These k shells sustain the virus, in contrast to, for example, isolated hubs at the periphery. Note that a virus cannot survive in the degree-preserving randomized version of the CNI network, owing to the absence of high-k shells.
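The endemic persistence ρ_{i} can be estimated as the fraction of time steps, after a transient, in which node i is infectious. The sketch below is a synchronous discrete-time SIS simulation under stated assumptions, not the authors' code: the transient and measurement lengths, the update order (a node that recovers in a step cannot be reinfected in that same step) and the name `sis_persistence` are all hypothetical. The 20% initial infection and λ=0.8 follow the text.

```python
import random


def sis_persistence(adj, beta, lam=0.8, init_frac=0.2,
                    t_transient=200, t_measure=800, seed=None):
    """Estimate rho_i as the fraction of measured steps node i is infectious."""
    rng = random.Random(seed)
    nodes = list(adj)
    infected = set(rng.sample(nodes, max(1, int(init_frac * len(nodes)))))
    counts = {v: 0 for v in nodes}
    for t in range(t_transient + t_measure):
        new_inf = set()
        for v in infected:
            for u in adj[v]:
                # each infectious node infects each susceptible neighbour w.p. beta
                if u not in infected and rng.random() < beta:
                    new_inf.add(u)
        # nodes infectious at the start of the step recover w.p. lam
        survivors = {v for v in infected if rng.random() >= lam}
        infected = survivors | new_inf
        if t >= t_transient:
            for v in infected:
                counts[v] += 1
        if not infected:
            break  # absorbing state: the infection has died out
    return {v: counts[v] / t_measure for v in nodes}
```

Grouping the returned ρ_{i} by (k_{S},k) and averaging within each class yields ρ(k_{S},k).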
The importance of the inner-core nodes in spreading does not depend on the value of the infection probability β. In both models, SIS and SIR, we find that the persistence ρ and the average infected fraction M, respectively, are systematically larger for nodes in inner k shells than for nodes in outer k shells, over the entire β range that we studied (Fig. 4c,d). Thus, the k-shell index is a robust indicator of the spreading efficiency of a node.
Finding the most accurate ranking of individual nodes for spreading in a population can influence the success of dissemination strategies. When spreading starts from a single node, the k_{S} value is sufficient for this ranking, whereas when there are many simultaneous origins, spreading is greatly enhanced if we additionally require that the spreaders with large degree or k_{S} are not directly linked to one another. For infections that do not confer immunity on recovered individuals, the core of the network in the large-k_{S} layers forms a reservoir in which the infection can survive locally.
Methods
The k-shell decomposition.
Nodes are assigned to k shells according to their remaining degree, which is obtained by successively pruning nodes whose degree is smaller than or equal to the k_{S} value of the current layer. We start by removing all nodes with degree k=1. After these are removed, some nodes may be left with at most one link, so we continue pruning the network iteratively until no node with remaining degree k≤1 is left. The removed nodes, along with the corresponding links, form the k shell with index k_{S}=1. In a similar fashion, we iteratively remove the next k shell, k_{S}=2, and continue removing higher k shells until all nodes have been removed. As a result, each node is associated with one k_{S} index, and the network can be viewed as the union of all k shells. The resulting classification of nodes can be very different from the one obtained when the degree k is used.
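The pruning procedure above can be sketched in a few lines of Python. This is an illustrative implementation, assuming an undirected graph without self-loops stored as a dict of neighbour sets; the name `k_shell` is hypothetical. It starts at k=0 only so that isolated nodes receive k_{S}=0; otherwise it follows the k=1, 2, … pruning described in the text, and it should agree with networkx's `core_number`.

```python
def k_shell(adj):
    """Return {node: k_S} by iterative pruning (k-shell / k-core decomposition).

    adj: dict mapping each node to the set of its neighbours.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}  # remaining degree
    remaining = set(adj)
    ks = {}
    k = 0  # k = 0 only catches isolated nodes; real pruning starts at k = 1
    while remaining:
        # prune every node whose remaining degree is <= k, repeating until none is left
        stack = [v for v in remaining if degree[v] <= k]
        while stack:
            v = stack.pop()
            if v not in remaining:
                continue  # already pruned through an earlier stack entry
            remaining.remove(v)
            ks[v] = k
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        stack.append(u)  # u now falls into the current shell
        k += 1
    return ks
```

For a triangle 0–1–2 with a pendant node 3 attached to node 0, the pendant gets k_{S}=1 while node 0, despite having the largest degree, shares k_{S}=2 with the other triangle nodes.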
The spreading models.
To study the spreading process we apply the SIR and SIS models. In the SIR model, all nodes are initially in the susceptible state (S) except for one node in the infectious state (I). At each time step, the I nodes infect their susceptible neighbours with probability β and then enter the recovered state (R), in which they are immune and cannot be infected again. The SIS model describes spreading processes that do not confer immunity on recovered individuals: infectious individuals infect their susceptible neighbours with probability β and then return to the susceptible state with probability λ (here we use λ=0.8), after which they can be reinfected at subsequent time steps; with probability 1−λ they remain infectious.
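A single-origin SIR run as described above, and the estimate of M_{i} as the mean outbreak size over independent runs, can be sketched as follows. The function names, the number of runs, and the choice to report the outbreak as a fraction of all nodes (origin included) are assumptions for illustration.

```python
import random


def sir_outbreak_size(adj, origin, beta, rng):
    """One discrete-time SIR run from a single origin; returns the fraction
    of nodes ever infected (all of them recovered when the outbreak ends)."""
    infectious = {origin}
    recovered = set()
    while infectious:
        new_inf = set()
        for v in infectious:
            for u in adj[v]:
                if u in infectious or u in recovered or u in new_inf:
                    continue  # u is not susceptible, or was already infected this step
                if rng.random() < beta:
                    new_inf.add(u)
        recovered |= infectious  # every infectious node recovers after one step
        infectious = new_inf
    return len(recovered) / len(adj)


def estimate_m(adj, origin, beta, runs=100, seed=None):
    """Estimate M_i as the mean outbreak size over independent SIR runs."""
    rng = random.Random(seed)
    total = sum(sir_outbreak_size(adj, origin, beta, rng) for _ in range(runs))
    return total / runs
```

Calling `estimate_m` for every node gives the per-origin M_{i} values that enter the (k_{S},k) averages and the imprecision function.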
The imprecision function.
The betweenness centrality, C_{B}(i), of a node i is defined as follows. Consider two nodes s and t and the set σ_{st} of all shortest paths between these two nodes. If the subset of this set containing the paths that pass through node i is denoted by σ_{st}(i), then the betweenness centrality of node i is given by
$$C_{B}(i)=\sum_{s\neq i\neq t}\frac{|\sigma_{st}(i)|}{|\sigma_{st}|},$$
where |·| denotes the number of paths in a set and the sum runs over all pairs of distinct nodes s and t in the network other than i.
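For unweighted networks, C_{B} is usually computed with Brandes' algorithm (one breadth-first search per source plus a backward accumulation), rather than by enumerating shortest paths. The sketch below is that standard algorithm, not necessarily the authors' implementation; it sums over ordered pairs (s,t), so each unordered pair is counted twice, which rescales all values equally and leaves the ranking unchanged. The function name `betweenness` is hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm: C_B(i) = sum over ordered pairs s != t (both != i)
    of sigma_st(i) / sigma_st, with sigma counting shortest paths."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # accumulate dependencies in order of non-increasing distance from s
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb
```

On a 4-cycle each node lies on one of the two shortest paths between its two neighbours, so every node receives 1/2 from each ordered pair, for a total of 1.0.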
The imprecision function ε(p) quantifies the difference between the average spreading of the pN nodes (0<p<1) with the highest k_{S}, k or C_{B} and the average spreading of the pN most efficient spreaders (N is the number of nodes in the network). It thus tests how well k_{S}, k and C_{B} identify the most efficient spreaders. For a given β value and a given fraction p of the system, we first identify the set of the pN most efficient spreaders as measured by M_{i} (we denote this set by ϒ_{eff}). Similarly, we identify the pN individuals with the highest k-shell index (ϒ_{k_{S}}). We define the imprecision of the k-shell identification as ε_{k_{S}}(p)≡1−M_{k_{S}}/M_{eff}, where M_{k_{S}} and M_{eff} are the infected percentages averaged over the ϒ_{k_{S}} and ϒ_{eff} groups of nodes, respectively. ε_{k} and ε_{C_{B}} are defined similarly to ε_{k_{S}}.
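The definition of ε(p) translates directly into code once the per-node M_{i} values and an indicator score are available. In the sketch below, the name `imprecision`, rounding pN down (with a minimum of one node) and breaking ties in the score by dictionary order are all assumptions; ties matter in practice for k_{S}, which takes few distinct values.

```python
def imprecision(m, score, p):
    """epsilon(p) = 1 - M_score / M_eff.

    m: dict node -> M_i; score: dict node -> indicator (k_S, k or C_B).
    M_score averages M_i over the p*N nodes with the highest score,
    M_eff over the p*N nodes with the highest M_i.
    """
    n_top = max(1, int(p * len(m)))  # assumed rounding of p*N
    top_eff = sorted(m, key=lambda v: m[v], reverse=True)[:n_top]
    top_score = sorted(m, key=lambda v: score[v], reverse=True)[:n_top]
    m_eff = sum(m[v] for v in top_eff) / n_top
    m_score = sum(m[v] for v in top_score) / n_top
    return 1.0 - m_score / m_eff
```

An indicator that exactly reproduces the ranking by M_{i} gives ε(p)=0; larger values mean the indicator picks weaker spreaders.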
References
1. Caldarelli, G. & Vespignani, A. (eds) Large Scale Structure and Dynamics of Complex Networks (World Scientific, 2007).
2. Anderson, R. M., May, R. M. & Anderson, B. Infectious Diseases of Humans: Dynamics and Control (Oxford Science Publications, 1992).
3. Diekmann, O. & Heesterbeek, J. A. P. Mathematical Epidemiology of Infectious Diseases: Model Building, Analysis and Interpretation (Wiley Series in Mathematical & Computational Biology, 2000).
4. Keeling, M. J. & Rohani, P. Modeling Infectious Diseases in Humans and Animals (Princeton Univ. Press, 2008).
5. Rogers, E. M. Diffusion of Innovations 4th edn (Free Press, 1995).
6. Albert, R., Jeong, H. & Barabási, A.-L. Error and attack tolerance of complex networks. Nature 406, 378–382 (2000).
7. Pastor-Satorras, R. & Vespignani, A. Epidemic spreading in scale-free networks. Phys. Rev. Lett. 86, 3200–3203 (2001).
8. Cohen, R., Erez, K., ben-Avraham, D. & Havlin, S. Breakdown of the Internet under intentional attack. Phys. Rev. Lett. 86, 3682–3685 (2001).
9. Freeman, L. C. Centrality in social networks: Conceptual clarification. Social Networks 1, 215–239 (1979).
10. Friedkin, N. E. Theoretical foundations for centrality measures. Am. J. Sociology 96, 1478–1504 (1991).
11. Bollobás, B. Graph Theory and Combinatorics: Proceedings of the Cambridge Combinatorial Conference in Honor of P. Erdös Vol. 35 (Academic, 1984).
12. Seidman, S. B. Network structure and minimum degree. Social Networks 5, 269–287 (1983).
13. Carmi, S., Havlin, S., Kirkpatrick, S., Shavitt, Y. & Shir, E. A model of Internet topology using k-shell decomposition. Proc. Natl Acad. Sci. USA 104, 11150–11154 (2007).
14. Ángeles-Serrano, M. & Boguñá, M. Clustering in complex networks. II. Percolation properties. Phys. Rev. E 74, 056116 (2006).
15. LiveJournal, http://www.livejournal.com.
16. Liljeros, F., Giesecke, J. & Holme, P. The contact network of inpatients in a regional healthcare system. A longitudinal case study. Math. Population Studies 14, 269–284 (2007).
17. The Internet Movie Database, http://www.imdb.com.
18. Hethcote, H. W. The mathematics of infectious diseases. SIAM Rev. 42, 599–653 (2000).
19. Castellano, C., Fortunato, S. & Loreto, V. Statistical physics of social dynamics. Rev. Mod. Phys. 81, 591–646 (2009).
20. Shavitt, Y. & Shir, E. DIMES: Let the internet measure itself. ACM SIGCOMM Comput. Commun. Rev. 35, 71–74 (2005).
21. Molloy, M. & Reed, B. A critical point for random graphs with a given degree sequence. Random Struct. Algorithms 6, 161–180 (1995).
22. Hidalgo, C. A., Klinger, B., Barabási, A.-L. & Hausmann, R. The product space conditions the development of nations. Science 317, 482–487 (2007).
23. Hethcote, H. & Yorke, J. A. Gonorrhea Transmission Dynamics and Control (Springer-Verlag, 1984).
24. Pastor-Satorras, R. & Vespignani, A. Immunization of complex networks. Phys. Rev. E 65, 036104 (2002).
25. Dezső, Z. & Barabási, A.-L. Halting viruses in scale-free networks. Phys. Rev. E 65, 055103 (2002).
26. Cohen, R., Erez, K., ben-Avraham, D. & Havlin, S. Resilience of the Internet to random breakdowns. Phys. Rev. Lett. 85, 4626–4630 (2000).
27. Newman, M. E. J. Assortative mixing in networks. Phys. Rev. Lett. 89, 208701 (2002).
28. Large Network visualization tool, http://xavier.informatics.indiana.edu/lanetvi/.
29. Alvarez-Hamelin, J. I., Dall'Asta, L., Barrat, A. & Vespignani, A. Large scale networks fingerprinting and visualization using the k-core decomposition. Adv. Neural Inform. Process. Systems 18, 41–51 (2006).
Acknowledgements
We thank NSF-SES, NSF-EF, ONR, DTRA, Epiwork and the Israel Science Foundation for support. F.L. is supported by Riksbankens Jubileumsfond. We thank L. Braunstein, J. Brujić, kc claffy, D. Krioukov and C. Song for discussions and S. Zhou for providing the email dataset. The use of the hospital dataset was approved by the Regional Ethical Review Board in Stockholm (Record 2004=5:8).
Author information
Contributions
All authors contributed equally to the work presented in this paper.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Information (PDF 1490 kb)
About this article
Cite this article
Kitsak, M., Gallos, L., Havlin, S. et al. Identification of influential spreaders in complex networks. Nature Phys 6, 888–893 (2010). https://doi.org/10.1038/nphys1746
DOI: https://doi.org/10.1038/nphys1746