Identification of influential spreaders in complex networks

Kitsak, Maksim; Gallos, Lazaros K.; Havlin, Shlomo; Liljeros, Fredrik; Muchnik, Lev; Stanley, H. Eugene; Makse, Hernán A.

doi:10.1038/nphys1746

Letter
Published: 29 August 2010

Nature Physics volume 6, pages 888–893 (2010)

Abstract

Networks portray a multitude of interactions through which people meet, ideas are spread and infectious diseases propagate within a society^1,2,3,4,5. Identifying the most efficient ‘spreaders’ in a network is an important step towards optimizing the use of available resources and ensuring the more efficient spread of information. Here we show that, in contrast to common belief, there are plausible circumstances where the best spreaders do not correspond to the most highly connected or the most central people^6,7,8,9,10. Instead, we find that the most efficient spreaders are those located within the core of the network as identified by the k-shell decomposition analysis^11,12,13, and that when multiple spreaders are considered simultaneously the distance between them becomes the crucial parameter that determines the extent of the spreading. Furthermore, we show that infections persist in the high-k shells of the network in the case where recovered individuals do not develop immunity. Our analysis should provide a route for an optimal design of efficient dissemination strategies.

Main

Spreading is a ubiquitous process, which describes many important activities in society^2,3,4,5. The knowledge of the spreading pathways through the network of social interactions is crucial for developing efficient methods to either hinder spreading in the case of diseases, or accelerate spreading in the case of information dissemination. Indeed, people are connected according to the way they interact with one another in society and the large heterogeneity of the resulting network greatly determines the efficiency and speed of spreading. In the case of networks with a broad degree distribution (number of links per node)⁶, it is believed that the most connected people (hubs) are the key players, being responsible for the largest scale of the spreading process^6,7,8. Furthermore, in the context of social network theory, the importance of a node for spreading is often associated with the betweenness centrality, a measure of how many shortest paths cross through this node, which is believed to determine who has more ‘interpersonal influence’ on others^9,10.

Here we argue that the topology of the network organization plays an important role such that there are plausible circumstances under which the highly connected nodes or the highest-betweenness nodes have little effect on the range of a given spreading process. For example, if a hub exists at the end of a branch at the periphery of a network, it will have a minimal impact in the spreading process through the core of the network, whereas a less connected person who is strategically placed in the core of the network will have a significant effect that leads to dissemination through a large fraction of the population. To identify the core and the periphery of the network we use the k-shell (also called k-core) decomposition of the network^11,12,13,14. Examining this quantity in a number of real networks enables us to identify the best individual spreaders in the network when the spreading originates in a single node. For the case of a spreading process originating in many nodes simultaneously, we show that we can further improve the efficiency by considering spreading origins located at a determined distance from one another.

We study real-world complex networks that represent archetypical examples of social structures. We investigate (1) the friendship network between 3.4 million members of the LiveJournal.com community¹⁵, (2) the network of email contacts in the Computer Science Department of University College London (Zhou, S., private communication), (3) the contact network of inpatients (CNI) collected from hospitals in Sweden¹⁶ and (4) the network of actors who have costarred in movies labelled by imdb.com as adult¹⁷ (see Supplementary Section SI for details).

To study the spreading process we apply the susceptible–infectious–recovered (SIR) and susceptible–infectious–susceptible (SIS) models^2,3,18 on the above networks (see Methods). These models have been used to describe disease spreading as well as information and rumour spreading in social processes where an actor constantly needs to be reminded¹⁹. We denote the probability that an infectious node will infect a susceptible neighbour as β. In our study we use relatively small values for β, so that the infected percentage of the population remains small. In the case of large β values, where spreading can reach a large fraction of the population, the role of individual nodes is no longer important and spreading would cover almost all the network, independently of where it originated from.

The location of a node is defined using the k-shell decomposition analysis^11,12,13. This process assigns an integer index or coreness, k_S, to each node, representing its location according to successive layers (k shells) in the network. The k_S index is a quite robust measure and the node ranking is not influenced significantly in the case of incomplete information (for details see Supplementary Fig. S6 in Section SII). Small values of k_S define the periphery of the network and the innermost network core corresponds to large k_S (see Fig. 1a and Supplementary Section SII). Figure 1b–d illustrates the fact that the size of the population infected in a spreading process (shown in this example in the CNI network) is not necessarily related to the degree of the node, k, where the spreading started. Spreading may be very different even when it starts from hubs of similar degrees, as shown by comparing Fig. 1b and c. Instead, the location of the spreading origin given by its k_S index predicts more accurately the size of the infected population. For instance, Fig. 1b and d show that nodes in the same k_S layer produce similar spreading areas even if they have different k (by definition, in a given layer there could be many nodes with k≥k_S).

**Figure 1: When the hubs may not be good spreaders.**

The above example suggests that the position of the node relative to the organization of the network determines its spreading influence more than a local property of a node, such as the degree k. To quantify the influence of a given node i in an SIR spreading process we study the average size of the population M_i infected in an epidemic originating at node i with a given (k_S,k). The infected population is averaged over all the origins with the same (k_S,k) values:

$$M(k_S,k)=\sum_{i\in\Upsilon(k_S,k)}\frac{M_i}{N(k_S,k)},$$

where ϒ(k_S,k) is the union of all N(k_S,k) nodes with (k_S,k) values.
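The averaging over origins that share the same (k_S,k) values can be sketched as follows. This is a minimal illustration, assuming that the per-origin outbreak sizes M_i, the k_S indices and the degrees are already available as dictionaries; the function name `average_by_class` and the toy values are hypothetical and not taken from the study.

```python
from collections import defaultdict

def average_by_class(M, k_shell, degree):
    """Average M_i over all origins i that share the same (k_S, k) pair."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for node, m in M.items():
        key = (k_shell[node], degree[node])
        sums[key] += m
        counts[key] += 1          # counts[key] plays the role of N(k_S, k)
    return {key: sums[key] / counts[key] for key in sums}

# Toy values (illustrative only): two core nodes and one peripheral hub.
M = {"a": 0.2, "b": 0.4, "c": 0.1}
k_shell = {"a": 2, "b": 2, "c": 1}
degree = {"a": 3, "b": 3, "c": 5}
M_avg = average_by_class(M, k_shell, degree)
```

Each key of `M_avg` is one (k_S,k) class, so the result can be plotted directly as a two-dimensional map of the kind described for Fig. 2.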

The analysis of M(k_S,k) in the studied social networks reveals three general results (see Fig. 2): (1) For a fixed degree, there is a wide spread of M(k_S,k) values. In particular, there are many hubs located at the periphery of the network (large k, low k_S) that are poor spreaders. (2) For a fixed k_S, M(k_S,k) is approximately independent of the degree of the nodes. This result is revealed in the vertically layered structure of M(k_S,k), suggesting that infected nodes located in the same k shell produce similar epidemic outbreaks M(k_S,k) independent of the value of k of the infection origin. (3) The most efficient spreaders are located in the inner core of the network (large k_S region), fairly independently of their degree. These results indicate that the k-shell index of a node is a better predictor of spreading influence. When an outbreak starts in the core of the network (large k_S) there exist many pathways through which a virus can infect the rest of the network; this result is valid regardless of the node degree. The existence of these pathways implies that, during a typical epidemic outbreak from a random origin, nodes located in high-k_S layers are more likely to be infected and they will be infected earlier than other nodes (see Supplementary Section SIII). The neighbourhood of these nodes makes them more efficient in sustaining an infection in the early stages, thus enabling the epidemic to reach a critical mass such that it can fully develop. Similar results on the efficiency of high-k_S nodes are obtained from the analysis of M(k_S,C_B) in Fig. 2, where C_B is the betweenness centrality of a node in the network^9,10: the value of C_B is not a good predictor for spreading efficiency.

**Figure 2: The k-shell index predicts the outcome of spreading more reliably than the degree k or the betweenness centrality C_B.**

To quantify the importance of k_S in spreading we calculate the ‘imprecision functions’ ε_{k_S}(p), ε_k(p) and ε_{C_B}(p). These functions estimate, for each of the three indicators k_S, k and C_B, how close the average spreading of the pN (0<p<1) chosen origins is to the optimal spreading (see Methods and Supplementary Section SIV). The strategy to predict the spreading efficiency of a node based on k_S is consistently more accurate than a method based on k in the studied p range (Fig. 3a). The C_B-based strategy gives poor results compared with the other two strategies.

**Figure 3: k-shell structure of the CNI network.**

Our finding is not specific to the social networks shown in Fig. 2. In Supplementary Section SV we analyse the spreading efficiency in other networks not social in origin, such as the Internet at the router level²⁰, with similar conclusions. The key insight of our finding is that in the studied networks a large number of hubs are located in the peripheral low-k_S layers (Fig. 3b shows the location of the 25 largest hubs in the CNI; see also Supplementary Section SV) and therefore contribute poorly to spreading. The existence of hubs in the periphery is a consequence of the rich topological structure of real networks. In contrast, in a fully random network obtained by randomly rewiring a real network preserving the degree of each node (such a random network corresponds to the configuration model²¹; see Supplementary Section SVI) all the hubs are placed in the core of the network (see the red scatter plot in Fig. 3c) and they contribute equally well to spreading. In such a randomized structure the same information is contained in the k shell as in the degree classification because there is a one-to-one relation between the two quantities, which is approximately linear, k_S∝k (Fig. 3c and Supplementary Fig. S13). Examples of real networks that are similar to a random structure are the network of product space of economic goods²² and the Internet at the AS level (analysed in Supplementary Section SV).
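One common way to randomize a network while preserving the degree of each node is repeated double-edge swapping. The sketch below illustrates that technique under stated assumptions: the input is a simple undirected edge list without duplicates, and the function names, the swap count and the retry limit are hypothetical choices. The exact rewiring procedure used in the study is described in the Supplementary Information and may differ.

```python
import random
from collections import Counter

def degree_sequence(edges):
    """Degree of every node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def degree_preserving_rewire(edges, n_swaps, seed=0):
    """Swap edge endpoints (a,b),(c,d) -> (a,d),(c,b), rejecting self-loops and multi-edges."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps           # assumed cap so the loop always ends
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:             # randomize the orientation of the second edge
            c, d = d, c
        if len({a, b, c, d}) < 4:          # shared node: swap would create a self-loop
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set:
            continue                       # swap would create a duplicate edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list

# Toy graph: a triangle (a, b, c), a pendant node d and a peripheral hub h with four leaves.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d"), ("a", "h"),
         ("h", "l1"), ("h", "l2"), ("h", "l3"), ("h", "l4")]
rewired = degree_preserving_rewire(edges, n_swaps=20, seed=1)
```

Every accepted swap leaves each node with the same number of links, so only the placement of the hubs relative to the core changes.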

Our study highlights the importance of the relative location of a single spreading origin. Next, we address the question of the extent of an epidemic that starts at multiple origins simultaneously. Figure 3d shows the extent of SIR spreading in the CNI network when the outbreak simultaneously starts from the n nodes with the highest degree k or the highest k_S index. Even though the high-k_S nodes are the best single spreaders, in the case of multiple spreading the nodes with highest degree are more efficient than those with highest k_S. This result is attributed to the overlap of the infected areas of the different spreaders: large-k_S nodes tend to be clustered close to one another, whereas hubs can be more spread in the network and, in particular, they need not be connected with one another. Clearly, the step-like features in the plot of highest-k_S nodes (red solid curve in Fig. 3d) suggest that the infected percentage remains constant as long as the infected nodes belong in the same k shell. Including just one node from a different k shell results in a significantly increased spreading. This result suggests that a better spreading strategy using n spreaders is to choose either the highest-k or k_S nodes with the requirement that no two of the n spreaders are directly linked to each other. This scheme then provides the largest infected area of the network, as shown in Fig. 3d.
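The requirement that no two of the n spreaders are directly linked can be illustrated with a simple greedy selection. This is only a sketch under assumptions: the greedy order, the tie handling and the name `pick_separated_spreaders` are hypothetical, and the study does not state how the non-adjacent set was constructed.

```python
def pick_separated_spreaders(adj, score, n):
    """Take nodes in decreasing score, skipping any node linked to one already chosen."""
    chosen = []
    for node in sorted(adj, key=lambda v: score[v], reverse=True):
        if all(node not in adj[c] for c in chosen):
            chosen.append(node)
            if len(chosen) == n:
                break
    return chosen

# Toy graph: a triangle (a, b, c), a pendant node d and a hub h with four leaves.
adj = {
    "a": {"b", "c", "d", "h"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"},
    "h": {"a", "l1", "l2", "l3", "l4"},
    "l1": {"h"}, "l2": {"h"}, "l3": {"h"}, "l4": {"h"},
}
degree = {v: len(nbrs) for v, nbrs in adj.items()}
chosen = pick_separated_spreaders(adj, degree, n=2)
```

In the toy graph the second-highest-degree node `a` is skipped because it is linked to the hub `h`. Passing k_S values instead of degrees as `score` gives the k_S-based variant of the same selection.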

Many contagious infections, including most sexually transmitted infections²³, do not confer full immunity after infection as assumed in the SIR model, and therefore are suitably described by the SIS epidemic model, where an infectious node returns to the susceptible state with probability λ. In an SIS epidemic the number of infectious nodes eventually reaches a dynamic-equilibrium ‘endemic’ state, where as many infectious individuals become susceptible as susceptible nodes become infectious¹⁸. In contrast to SIR, in the initial state of our SIS simulations 20% of the network nodes are already infected. The spreading efficiency of a given node i in SIS spreading is the persistence, ρ_i(t), defined as the probability that node i is infected at time t (ref. 7). In an endemic SIS state, ρ_i(t) becomes independent of t (see Supplementary Section SVII). Previous studies have shown that the largest persistence is found in the network hubs, which are re-infected frequently owing to the large number of neighbours^7,24,25. However, we find that this result holds only in randomized network structures. In the real network topologies studied here, we find that viruses persist mainly in high-k_S layers instead, almost irrespective of the degree of the nodes in the core.
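The persistence of each node can be estimated numerically as the fraction of time steps, after an initial transient, in which the node is infected. The sketch below assumes a synchronous update, a burn-in period and a run length chosen for illustration; these choices, the function name `sis_persistence` and the toy graph are assumptions and are not specified in the text.

```python
import random

def sis_persistence(adj, beta, lam=0.8, steps=2000, burn_in=500,
                    init_frac=0.2, seed=0):
    """Estimate rho_i as the fraction of post-burn-in steps in which node i is infected."""
    rng = random.Random(seed)
    nodes = list(adj)
    n_init = max(1, int(init_frac * len(nodes)))
    infected = set(rng.sample(nodes, n_init))    # 20% infected initially by default
    hits = {v: 0 for v in nodes}
    for t in range(steps):
        nxt = set()
        for i in nodes:                          # fixed iteration order
            if i not in infected:
                continue
            for j in adj[i]:
                if j not in infected and rng.random() < beta:
                    nxt.add(j)                   # susceptible neighbour becomes infected
            if rng.random() >= lam:
                nxt.add(i)                       # stays infectious with probability 1 - lam
        infected = nxt
        if t >= burn_in:
            for v in infected:
                hits[v] += 1
    return {v: hits[v] / (steps - burn_in) for v in nodes}

# Toy graph: a triangle (a, b, c) with a pendant node d.
toy = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}
rho = sis_persistence(toy, beta=0.3)
```

On a graph this small the infection can die out by chance, so meaningful estimates require larger networks and averaging over several seeds.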

In the case of random networks, it is found that viruses propagate to the entire network above an epidemic threshold given by β>β_c^rand≡λ〈k〉/〈k²〉 (refs 24, 26). In real networks, such as the CNI network, the threshold β_c is different from β_c^rand. Furthermore, in real networks, we find that viruses can survive locally even when β<β_c, but only within the high-k_S layers of the network, whereas virus persistence in peripheral k_S layers is negligible (Fig. 4a–c). As the k-shell structure depends on the network assortativity, the lower threshold is in agreement with the observation that high positive assortativity²⁷ may decrease the epidemic threshold.
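The random-network threshold quoted above depends only on the first two moments of the degree distribution, so it can be evaluated directly from a degree sequence. The following sketch uses hypothetical degree lists purely as an arithmetic illustration; the function name is an assumption.

```python
def random_network_threshold(degrees, lam):
    """Evaluate beta_c^rand = lam * <k> / <k^2> for a given degree sequence."""
    n = len(degrees)
    mean_k = sum(degrees) / n
    mean_k2 = sum(k * k for k in degrees) / n
    return lam * mean_k / mean_k2

# Illustrative degree sequences (not data from the study).
homogeneous = [2, 2, 2, 2]          # <k> = 2,   <k^2> = 4
heterogeneous = [1, 1, 1, 1, 4]     # <k> = 1.6, <k^2> = 4
beta_c_hom = random_network_threshold(homogeneous, lam=0.8)
beta_c_het = random_network_threshold(heterogeneous, lam=0.8)
```

The heterogeneous sequence has the smaller threshold because its lower mean degree is not matched by a lower second moment, which is the sense in which broad degree distributions favour spreading in random networks.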

**Figure 4: SIS spreading in the CNI network and β dependence for SIS and SIR.**

The importance of high-k_S nodes in SIS spreading is confirmed when we analyse the asymptotic probability that nodes of given (k_S,k) values will be infected. This probability is quantified by the persistence function

$$\rho(k_S,k)\equiv\sum_{i\in\Upsilon(k_S,k)}\frac{\rho_i}{N(k_S,k)},$$

as a function of (k_S,k) at different β values (Fig. 4a and b). High-k_S layers in networks might be closely related to the concept of a core group in sexually transmitted infection research²³. The core groups are defined as subgroups in the general population characterized by high partner turnover rate and extensive intergroup interaction²³.

Similar to the core group, the dense subnetwork formed by nodes in the innermost k shells helps the virus to consistently survive locally in the inner-core area and infect other nodes adjacent to the area. These k shells preserve the existence of a virus, in contrast to, for example, isolated hubs at the periphery. Note that a virus cannot survive in the degree-preserving randomized version of the CNI network, owing to the absence of high-k shells.

The importance of the inner-core nodes in spreading is not influenced by the infection probability values, β. In both models, SIS and SIR, we find that the persistence ρ or the average infected fraction M, respectively, is systematically larger for nodes in inner k shells compared with nodes in outer k shells, over the entire β range that we studied (Fig. 4c,d). Thus, the k-shell measure is a robust indicator for the spreading efficiency of a node.

Finding the most accurate ranking of individual nodes for spreading in a population can influence the success of dissemination strategies. When spreading starts from a single node the k_S value is enough for this ranking, whereas in the case of many simultaneous origins spreading is greatly enhanced when we additionally repel the spreaders with large degree or k_S. In the case of infections that do not confer immunity on recovered individuals, the core of the network in the large-k_S layers forms a reservoir where infection can survive locally.

Methods

The k-shell decomposition.

Nodes are assigned to k shells according to their remaining degree, which is obtained by successive pruning of nodes with degree smaller than the k_S value of the current layer. We start by removing all nodes with degree k=1. After removing all the nodes with k=1, some nodes may be left with one link, so we continue pruning the system iteratively until there is no node left with k=1 in the network. The removed nodes, along with the corresponding links, form a k shell with index k_S=1. In a similar fashion, we iteratively remove the next k shell, k_S=2, and continue removing higher-k shells until all nodes are removed. As a result, each node is associated with one k_S index, and the network can be viewed as the union of all k shells. The resulting classification of a node can be very different from the one obtained when the degree k is used.
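The pruning procedure can be sketched in a few lines. This is a minimal illustration, assuming an undirected edge list as input; the function name `k_shell_decomposition` and the toy graph are hypothetical, and the straightforward rescanning used here is chosen for readability rather than speed.

```python
from collections import defaultdict

def k_shell_decomposition(edges):
    """Return {node: k_S} by iteratively pruning nodes of remaining degree <= k."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:                          # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 1
    while remaining:
        to_prune = [n for n in remaining if degree[n] <= k]
        while to_prune:                     # keep pruning until no such node is left
            for n in to_prune:
                shell[n] = k
                remaining.discard(n)
                for m in adj[n]:
                    if m in remaining:
                        degree[m] -= 1      # neighbours lose one remaining link
            to_prune = [n for n in remaining if degree[n] <= k]
        k += 1
    return shell

# Toy graph: a triangle (a, b, c), a pendant node d and a hub h with four leaves.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d"), ("a", "h"),
         ("h", "l1"), ("h", "l2"), ("h", "l3"), ("h", "l4")]
shell = k_shell_decomposition(edges)
```

In the toy graph the hub `h` has the largest degree (k=5) but ends up in the outermost shell (k_S=1), because removing its leaves leaves it with a single link, whereas the triangle nodes form the k_S=2 core.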

The spreading models.

To study the spreading process we apply the SIR and SIS models. In the SIR model, all nodes are initially in the susceptible state (S) except for one node in the infectious state (I). At each time step, the I nodes infect their susceptible neighbours with probability β and then enter the recovered state (R), where they become immunized and cannot be infected again. The SIS model aims to describe spreading processes that do not confer immunity on recovered individuals: infected individuals still infect their neighbours with probability β, but at each time step they either return to the susceptible state with probability λ (here we use λ=0.8), after which they can be reinfected, or remain infectious with probability 1−λ.
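The SIR dynamics described here can be sketched as a short simulation. The code below is an illustration under assumptions: the outbreak size counts the origin itself, the number of runs is arbitrary, and the function names and toy graph are hypothetical.

```python
import random

def sir_outbreak_size(adj, origin, beta, rng):
    """One SIR run: infectious nodes try each susceptible neighbour once, then recover."""
    ever_infected = {origin}
    infectious = [origin]
    while infectious:
        newly = []
        for i in infectious:
            for j in adj[i]:
                if j not in ever_infected and rng.random() < beta:
                    ever_infected.add(j)
                    newly.append(j)
        infectious = newly              # the previous generation is now recovered
    return len(ever_infected)

def mean_outbreak_fraction(adj, origin, beta, runs=1000, seed=0):
    """Average infected fraction over repeated runs from the same origin."""
    rng = random.Random(seed)
    total = sum(sir_outbreak_size(adj, origin, beta, rng) for _ in range(runs))
    return total / (runs * len(adj))

# Toy graph: a triangle (a, b, c) with a pendant node d.
toy = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}
M_a = mean_outbreak_fraction(toy, "a", beta=0.3)
M_d = mean_outbreak_fraction(toy, "d", beta=0.3)
```

Because each infectious node recovers after a single step, one run is equivalent to spreading along links that are each open with probability β, which is why the outbreak size depends strongly on where the origin sits.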

The imprecision function.

The betweenness centrality, C_B(i), of a node i is defined as follows: Consider two nodes s and t and the set σ_{st} of all possible shortest paths between these two nodes. If the subset of this set that contains the paths that pass through the node i is denoted by σ_{st}(i), then the betweenness centrality of this node is given by

$$C_B(i)=\sum_{s\neq t}\frac{\sigma_{st}(i)}{\sigma_{st}},$$

where the sum runs over all nodes s and t in the network and the ratio compares the numbers of paths in the two sets.
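For small graphs the betweenness centrality can be computed directly from shortest-path counts obtained by breadth-first search. The sketch below assumes an unweighted, undirected graph and sums over ordered pairs (s,t), so each unordered pair contributes twice; this normalization and the function names are assumptions. The cost grows as the cube of the number of nodes, so faster algorithms are needed for networks of the size studied here.

```python
from collections import deque

def bfs_counts(adj, s):
    """Distances from s and the number of shortest paths from s to every reachable node."""
    dist = {s: 0}
    sigma = {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]    # every shortest path to u extends to v
    return dist, sigma

def betweenness(adj):
    """C_B(i): sum over ordered pairs s != t of the fraction of shortest paths through i."""
    info = {s: bfs_counts(adj, s) for s in adj}
    cb = {i: 0.0 for i in adj}
    for s in adj:
        dist_s, sig_s = info[s]
        for t in dist_s:
            if t == s:
                continue
            for i in dist_s:
                if i == s or i == t:
                    continue
                dist_i, sig_i = info[i]
                # i lies on a shortest s-t path iff the two legs add up exactly
                if t in dist_i and dist_s[i] + dist_i[t] == dist_s[t]:
                    cb[i] += sig_s[i] * sig_i[t] / sig_s[t]
    return cb

# Toy graphs: a three-node path and a four-node ring.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
ring = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
cb_path = betweenness(path)
cb_ring = betweenness(ring)
```

In the ring, opposite nodes are joined by two shortest paths, so each intermediate node receives half of the weight for that pair.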

The imprecision function ε(p) quantifies the difference between the average spreading of the pN nodes (0<p<1) with the highest k_S, k or C_B and the average spreading of the pN most efficient spreaders (N is the number of nodes in the network). Thus, it tests the merit of using the k-shell index, k and C_B to identify the most efficient spreaders. For a given β value and a given fraction of the system p, we first identify the set of the pN most efficient spreaders as measured by M_i (we designate this set by ϒ_eff). Similarly, we identify the pN individuals with the highest k-shell index (ϒ_{k_S}). We define the imprecision of k-shell identification as ε_{k_S}(p)≡1−M_{k_S}/M_eff, where M_{k_S} and M_eff are the infected percentages averaged over the ϒ_{k_S} and ϒ_eff groups of nodes, respectively. ε_k and ε_{C_B} are defined similarly to ε_{k_S}.
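The imprecision of any ranking can be evaluated once the per-origin spreading M_i is known. The sketch below is illustrative only: ties in the ranking are broken by the order in which nodes are stored, at least one node is always selected, and the function name and toy values are hypothetical.

```python
def imprecision(M, score, p):
    """epsilon(p) = 1 - M_score / M_eff for the top pN nodes by score and by M itself."""
    n = max(1, int(p * len(M)))
    top_by_score = sorted(M, key=lambda i: score[i], reverse=True)[:n]
    top_by_M = sorted(M, key=lambda i: M[i], reverse=True)[:n]
    m_score = sum(M[i] for i in top_by_score) / n
    m_eff = sum(M[i] for i in top_by_M) / n
    return 1.0 - m_score / m_eff

# Toy values (illustrative only): c is a peripheral hub that spreads poorly.
M = {"a": 0.5, "b": 0.4, "c": 0.1, "d": 0.1}
k_shell = {"a": 2, "b": 2, "c": 1, "d": 1}
degree = {"a": 3, "b": 2, "c": 5, "d": 1}
eps_ks = imprecision(M, k_shell, p=0.5)
eps_k = imprecision(M, degree, p=0.5)
```

A value of 0 means that the indicator selects nodes that spread as well as the best pN spreaders, and larger values indicate a poorer selection. In the toy data the degree ranking is penalized for selecting the peripheral hub.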

References

1. Caldarelli, G. & Vespignani, A. (eds) Large Scale Structure and Dynamics of Complex Networks (World Scientific, 2007).
2. Anderson, R. M., May, R. M. & Anderson, B. Infectious Diseases of Humans: Dynamics and Control (Oxford Science Publications, 1992).
3. Diekmann, O. & Heesterbeek, J. A. P. Mathematical Epidemiology of Infectious Diseases: Model Building, Analysis and Interpretation (Wiley Series in Mathematical & Computational Biology, 2000).
4. Keeling, M. J. & Rohani, P. Modeling Infectious Diseases in Humans and Animals (Princeton Univ. Press, 2008).
5. Rogers, E. M. Diffusion of Innovations 4th edn (Free Press, 1995).
6. Albert, R., Jeong, H. & Barabási, A-L. Error and attack tolerance of complex networks. Nature 406, 378–382 (2000).
7. Pastor-Satorras, R. & Vespignani, A. Epidemic spreading in scale-free networks. Phys. Rev. Lett. 86, 3200–3203 (2001).
8. Cohen, R., Erez, K., ben-Avraham, D. & Havlin, S. Breakdown of the Internet under intentional attack. Phys. Rev. Lett. 86, 3682–3685 (2001).
9. Freeman, L. C. Centrality in social networks: Conceptual clarification. Social Networks 1, 215–239 (1979).
10. Friedkin, N. E. Theoretical foundations for centrality measures. Am. J. Sociology 96, 1478–1504 (1991).
11. Bollobás, B. Graph Theory and Combinatorics: Proceedings of the Cambridge Combinatorial Conference in Honor of P. Erdős Vol. 35 (Academic, 1984).
12. Seidman, S. B. Network structure and minimum degree. Social Networks 5, 269–287 (1983).
13. Carmi, S., Havlin, S., Kirkpatrick, S., Shavitt, Y. & Shir, E. A model of Internet topology using k-shell decomposition. Proc. Natl Acad. Sci. USA 104, 11150–11154 (2007).
14. Ángeles-Serrano, M. & Boguñá, M. Clustering in complex networks. II. Percolation properties. Phys. Rev. E 74, 056116 (2006).
15. LiveJournal, http://www.livejournal.com.
16. Liljeros, F., Giesecke, J. & Holme, P. The contact network of inpatients in a regional healthcare system. A longitudinal case study. Math. Population Studies 14, 269–284 (2007).
17. The Internet Movie Database, http://www.imdb.com.
18. Hethcote, H. W. The mathematics of infectious diseases. SIAM Rev. 42, 599–653 (2000).
19. Castellano, C., Fortunato, S. & Loreto, V. Statistical physics of social dynamics. Rev. Mod. Phys. 81, 591–646 (2009).
20. Shavitt, Y. & Shir, E. DIMES: Let the internet measure itself. ACM SIGCOMM Comput. Commun. Rev. 35, 71–74 (2005).
21. Molloy, M. & Reed, B. A critical point for random graphs with a given degree sequence. Random Struct. Algorithms 6, 161–180 (1995).
22. Hidalgo, C. A., Klinger, B., Barabási, A-L. & Hausmann, R. The product space conditions the development of nations. Science 317, 482–487 (2007).
23. Hethcote, H. W. & Yorke, J. A. Gonorrhea Transmission Dynamics and Control (Springer-Verlag, 1984).
24. Pastor-Satorras, R. & Vespignani, A. Immunization of complex networks. Phys. Rev. E 65, 036104 (2002).
25. Dezső, Z. & Barabási, A-L. Halting viruses in scale-free networks. Phys. Rev. E 65, 055103 (2002).
26. Cohen, R., Erez, K., ben-Avraham, D. & Havlin, S. Resilience of the Internet to random breakdowns. Phys. Rev. Lett. 85, 4626–4630 (2000).
27. Newman, M. E. J. Assortative mixing in networks. Phys. Rev. Lett. 89, 208701 (2002).
28. Large Network visualization tool, http://xavier.informatics.indiana.edu/lanet-vi/.
29. Alvarez-Hamelin, J. I., Dall'Asta, L., Barrat, A. & Vespignani, A. Large scale networks fingerprinting and visualization using the k-core decomposition. Adv. Neural Inform. Process. Systems 18, 41–51 (2006).

Acknowledgements

We thank NSF-SES, NSF-EF, ONR, DTRA, Epiwork and the Israel Science Foundation for support. F.L. is supported by Riksbankens Jubileumsfond. We thank L. Braunstein, J. Brujić, kc claffy, D. Krioukov and C. Song for discussions and S. Zhou for providing the email dataset. The use of the hospital dataset was approved by the Regional Ethical Review Board in Stockholm (Record 2004=5:8).

Author information

Authors and Affiliations

Center for Polymer Studies and Physics Department, Boston University, Boston, Massachusetts 02215, USA
Maksim Kitsak & H. Eugene Stanley
Cooperative Association for Internet Data Analysis (CAIDA), University of California-San Diego, La Jolla, California 92093, USA
Maksim Kitsak
Levich Institute and Physics Department, City College of New York, New York, New York 10031, USA
Lazaros K. Gallos & Hernán A. Makse
Minerva Center and Department of Physics, Bar-Ilan University, Ramat Gan, Israel
Shlomo Havlin
Department of Sociology, Stockholm University, Stockholm, S-10691, Sweden
Fredrik Liljeros
Operations and Management Sciences Department, Information, Stern School of Business, New York University, New York, New York 10012, USA
Lev Muchnik

Contributions

All authors contributed equally to the work presented in this paper.

Corresponding author

Correspondence to Hernán A. Makse.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

About this article

Cite this article

Kitsak, M., Gallos, L., Havlin, S. et al. Identification of influential spreaders in complex networks. Nature Phys 6, 888–893 (2010). https://doi.org/10.1038/nphys1746

Received: 21 January 2010
Accepted: 07 July 2010
Published: 29 August 2010
Issue Date: November 2010
DOI: https://doi.org/10.1038/nphys1746
