Abstract
Although computer viruses cause tremendous economic loss, defence mechanisms fail to adapt to their rapid evolution. Previous immunization strategies have been static and centralized, which has made virus containment difficult or even impossible. We suggest, instead, propagating the immunization agent as an epidemic. The main problem with epidemic vaccine propagation is that it is bound to lag behind the virus. We suggest giving the vaccine an advantage over the virus by allowing it to leapfrog through a separate, overlapping, partially correlated network. This enables the antivirus to contain the epidemic efficiently. We systemize this concept with a ‘honeypot’ architecture that achieves both early virus discovery and rapid antivirus dissemination. We present analytic, as well as simulation, results for a set of realistic topologies that illustrate the effectiveness of this approach.
Main
The realization that network models possess nontrivial properties^{1,2,3}, such as a diameter that grows logarithmically with network size^{4} and a nonexistent percolation threshold^{5}, implies that for predominant epidemic models the epidemic will not stop by immunizing any finite subset of nodes.
However, current immunization strategies^{6,7,8,9,10,11} focus on removing nodes from the network a priori by immunizing them before the epidemic outburst. In the absence of complete knowledge of the network topology, these strategies are confined to a random character. Thus, these strategies require, in most cases, the removal of almost all of the nodes, and in all cases^{7} the removal of at least a quarter of the nodes.
In contrast, we introduce a dynamic distributed immunization strategy, where the vaccine development and immunization processes depend on, and interact with, the virus dissemination process itself, thus creating a codependency between virus dissemination and immunization.
In the context of traditional biological epidemiology, there was little sense in considering dynamic, distributed immunization strategies. This is mainly owing to the fact that the timescale gap between epidemic outburst and vaccine creation is very large, and that there is no ‘infectious’ delivery mechanism available for biological vaccines.
The world of computer viruses has characteristics that are diametrically different. First, new viruses emerge at an increasing pace. Second, computer viruses are much less complex than their biological counterparts, and are much easier to analyse and to characterize^{12,13}. Thus, vaccine development can be achieved on a timescale comparable to that of the infection process. On the negative side, however, the viral process possesses an inherent lead-time advantage: it appears before the vaccine, as a new vaccine can be created only after the new virus has started percolating the network. This fact, in itself, imposes strong constraints^{14} on the usability of dynamic approaches. However, as we will demonstrate below, one can devise design principles that compensate for the lead-time advantage of the virus and that support the deployment of efficient dynamic immunization systems.
We discuss the concrete example of the email network. In this network, an email account constitutes a node whereas the directed edges of the network are the entries in the account’s address book. The virus spreads through the account address book with a timescale that ranges from several hours to a number of days. This timescale is an upper bound of the vaccine generation timescale for any effective epidemic containment solution. Studies^{9,15} show that this network’s degree distribution (that is, the distribution, P(k), which governs the probability that a node will have degree k, or, in other words, have k edges attached to it) is very broad and can be modelled by a scale-free network, that is, a network with a power-law degree distribution. We verified this through a survey of 502 individuals, which we also used to calibrate the parameters of our simulations.
In the past few years, virus spreading on such networks has been studied intensively using the static percolation framework^{11}. The dynamics that we introduce stem from this framework and allow for a richer set of effects.
Distributed immunization
In the present article we define a framework for the study of immunization strategies that react in real time to the emergence and propagation pattern of a virus. The objective is to find the strategies that minimize the size of the infected cluster. The size of the virus cluster is the portion of infected nodes after a time period that we take to infinity. The size of the immunized clusters is defined consistently as the aggregate number of immunized nodes. The underlying assumptions of our model are as follows. (a) As in the susceptible, infected, removed (SIR) epidemiologic model^{16,17}, a node can be in one of three modes with respect to a specific virus: susceptible, infected or immunized (removed).
However, unlike the SIR model, a node cannot change its mode once it is either infected or immunized during the relevant timescale. This model is in close agreement with the behaviour observed on the Internet today, as an increasing number of viruses shut down security-related software on infecting a new node. (b) An infected node releases the virus to all of its neighbours with a delay time that is either deterministic or stochastic. The virus is transmitted to all neighbours and not a stochastic subset, which makes fighting the virus harder. (c) Some nodes, in accordance with a given probability function, may recognize their own infection, identify its characteristics and create an immunization agent, as the infection process progresses^{13,18}. The agent then spreads to all neighbours and immunizes the susceptible ones. In addition, we define: (i) the average infection delay, also known as the disease-generation time, t_{inf}, as the average time required by a virus in a given node to infect a neighbouring node; (ii) the average immunization delay, t_{imm}, as the average time required by an immunization agent in a given node to immunize a neighbouring node; (iii) the average development delay, t_{dev}, as the average time required by an infected node to develop an immunization agent.
In essence, the described dynamics involve a competition between two types of branching process on a network^{19}, where the first type creates a connected virus cluster and the second creates a collection of immunized clusters. Unlike centralized approaches, this approach nullifies the need for a global knowledge of the topology. We consider the deterministic case, where the various delays associated with the infection, agent creation and immunization are all constant, and where all neighbouring nodes become infected or immunized simultaneously. The resulting dynamics show a sharp transition at the point where t_{inf}=t_{imm}+t_{dev}. In the deterministic case, when t_{inf} is below this threshold, the virus infects the entire network, whereas when above it, the dynamics are governed by the agent development pace and the network topology characteristics. This sharp transition around the critical value also remains true when the delays are stochastic variables. In the discrete-time simulations we present below, all of the time parameters equal one time step, which in turn gives the virus a head start of one time step. As presented in Fig. 1, this difference is enough to let the virus infect the entire network when the virus and immunization agent spread on the same network.
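The threshold at t_{inf}=t_{imm}+t_{dev} can be illustrated with a small event-driven sketch of the deterministic case on a single network. It is not the simulation code used for the figures. The function names `race` and `infected_count` are hypothetical, and two simplifications are assumed: every infected node develops an agent, and a tie between virus and agent arriving at the same time goes to the virus.

```python
import heapq

def race(adj, seed, t_inf, t_imm, t_dev):
    """Deterministic virus/agent race on a single network.

    adj maps each node to its neighbours. Assumptions of this sketch:
    every infected node develops an agent, and ties go to the virus.
    """
    state = {}                            # node -> "infected" | "immunized"
    events = [(0, 0, "virus", seed)]      # (time, tie-break, kind, node)
    while events:
        time, _, kind, node = heapq.heappop(events)
        if node in state:                 # a node never changes mode once set
            continue
        state[node] = "infected" if kind == "virus" else "immunized"
        for nb in adj[node]:
            if kind == "virus":
                heapq.heappush(events, (time + t_inf, 0, "virus", nb))
                # the agent is ready after t_dev, then needs t_imm to reach nb
                heapq.heappush(events, (time + t_dev + t_imm, 1, "agent", nb))
            else:
                heapq.heappush(events, (time + t_imm, 1, "agent", nb))
    return state

def infected_count(state):
    """Number of nodes that ended up infected."""
    return sum(1 for mode in state.values() if mode == "infected")

if __name__ == "__main__":
    # A 20-node ring; t_imm = t_dev = 1, so the threshold sits at t_inf = 2.
    ring = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
    for t_inf in (1, 2, 3):
        print(t_inf, infected_count(race(ring, 0, t_inf, 1, 1)))
```

When the virus is at least as fast as development plus immunization, it always reaches a neighbour first and takes the whole ring; once t_{inf} exceeds t_{imm}+t_{dev}, the agent overtakes it at the first hop.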
Partially correlated networks
To unleash the potential of the immunization system, we offer a slight modification to the problem by introducing a relatively small number of edges to the network topology. These immunization edges, which are used exclusively by the immunization agents, have a dramatic effect on the ability of a dynamic scheme to contain the virus by offering access to a parallel network with identical nodes and almost identical edges as the original network. These edges connect the node that produced the immunization agent to nodes that are beyond its immediate neighbourhood as defined by the initial network. In our example, the parallel network is the phonebook network, which is strongly correlated with the email network.
The study of networks that connect the same nodes but have different sets of links is only in its infancy^{20,21}; however, even now it is clear that such networks are qualitatively different from each of their components taken separately. This is owing to the complete change in the topology and the metrics that are induced in each of the networks through their interaction. In practical terms, this means that the immunization agents are effectively deployed ‘behind enemy lines’, unconstrained by the boundaries of the surrounding virus cluster. Once in this position, they can alter the topology of the space remaining at the virus’s disposal by immunizing nodes that otherwise would have belonged to the infected cluster. In Fig. 1 we illustrate the difference between a network with no extra immunization edges and a network that does possess several of these edges. The difference in the dynamics is further illustrated in Supplementary Information, Video S1.
The effect of introducing extra immunization edges, along with the original network, amounts to the generation of a pair of partially correlated networks^{22,23}, which we define as follows: two given networks G_{1}=(V,E_{1}) and G_{2}=(V,E_{2}) are partially correlated with overlap p if
$$p=\frac{|E_{1}\cap E_{2}|}{|E_{1}\cup E_{2}|}$$
is greater than zero. Here, V represents the set of nodes in a network and E its set of edges, with |E| the number of edges.
Starting with our initial network G_{1}, we create a new network G_{2} for the immunizing agent by adding to G_{1} a set of edges e_{1} that do not belong to G_{1}. Using the relative edge addition, q=|e_{1}|/|E_{1}|, the overlap will be
$$p=\frac{|E_{1}|}{|E_{1}|+|e_{1}|}=\frac{1}{1+q}.$$
We then alter the distributed immunization dynamics in the following way. The virus spreads through the original network G_{1}, whereas the immunizing agent is deployed through the partially correlated network G_{2}, which is obtained by randomly adding q|E_{1}| edges to G_{1}. By doing so, we enable the immunizing agent to break through the virus cluster and to immunize the network.
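The altered dynamics can be sketched as a synchronous, discrete-time simulation in which all delays equal one step. This is a minimal illustration under stated assumptions, not the code behind Fig. 2: a simple preferential-attachment generator stands in for the calibrated email topology, every infected node develops an agent, and the virus moves first within a step. The names `scale_free_graph`, `add_random_edges` and `simulate` are hypothetical.

```python
import random

def scale_free_graph(n, m, rng):
    """Preferential attachment: each new node links to m existing nodes."""
    adj = {i: set() for i in range(n)}
    targets, repeated = list(range(m)), []   # repeated: nodes listed once per edge end
    for new in range(m, n):
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
        repeated.extend(targets)
        repeated.extend([new] * m)
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return adj

def add_random_edges(g1, q, rng):
    """Return G2: a copy of G1 plus round(q * |E1|) random immunization edges."""
    g2 = {v: set(nbs) for v, nbs in g1.items()}
    n_edges = sum(len(nbs) for nbs in g1.values()) // 2
    nodes, added = list(g1), 0
    while added < round(q * n_edges):
        u, v = rng.sample(nodes, 2)
        if v not in g2[u]:
            g2[u].add(v)
            g2[v].add(u)
            added += 1
    return g2

def simulate(g1, g2, seed):
    """Virus spreads on g1, agent on g2; returns (infected, immunized)."""
    infected, immunized = {seed}, set()
    virus_front = {seed}   # infected in the previous step
    agent_front = set()    # immunized in the previous step
    ready = set()          # infected two steps ago: agent developed, sent now
    while virus_front or agent_front or ready:
        new_inf = {nb for v in virus_front for nb in g1[v]} - infected - immunized
        infected |= new_inf                      # virus moves first
        new_imm = {nb for a in agent_front | ready for nb in g2[a]}
        new_imm -= infected | immunized
        immunized |= new_imm
        ready, virus_front, agent_front = virus_front, new_inf, new_imm
    return infected, immunized

if __name__ == "__main__":
    rng = random.Random(0)
    g1 = scale_free_graph(2000, 2, rng)
    for q in (0.0, 0.01, 0.05, 0.2):
        g2 = add_random_edges(g1, q, rng)
        infected, immunized = simulate(g1, g2, seed=rng.randrange(2000))
        print(q, len(infected), len(immunized))
```

With q=0 the agent travels on the same edges as the virus and always arrives one step late, so the whole connected network is infected; any immunized node that appears for q>0 was reached through an added edge.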
In the Methods section we show analytically that for the discrete-time deterministic model the relative size of the infected cluster (that is, the ratio of infected to immunized clusters), as a function of the relative edge addition q, has a power-law upper bound with a −1 power exponent.
In addition, we have studied the problem through network simulations. In Fig. 2 we present simulation results that show a power-law ratio dependence with an exponent that approaches −4/3.
Thus, we can conclude that dynamic immunization, which is used over partially correlated networks, can reduce the size of a virus cluster considerably with negligible costs.
Honey pots
To systemize and improve our scheme we present the honeypot architecture^{13} (the name originating from their function as traps). This architecture has two main benefits over the random solution. First, it is much more realistic and technically feasible. Second, it is considerably more efficient than a random deployment of immunizing edges, and, given the same immunization edge budget, it minimizes the virus cluster to sizes that can be as small as a fourth of the respective cluster in the random-edge case. These features make this architecture an attractive alternative to current immune systems.
The aim of this architecture is to introduce a virtual superhub that transforms the shortcomings of a scale-free network, which is considerably impaired when its largest hubs are removed^{3}, into an advantage.
The honeypot architecture is constructed in the following manner. We exclusively implant the ability to develop an immunizing agent into a set of nodes called honey pots. The honey pots are embedded randomly within the network such that any virus that spreads through the network will be likely to reach them promptly. Finally, all honey pots are connected in a complete graph topology using special edges that only allow the immunizing agents to traverse along them.
Initially, the virus spreads freely, until it infects the first honey pot and thus triggers an immunization agent development process. By this time, the expected size of the virus cluster equals the size of the network divided by the number of honey pots. As the virus continues to spread, all honey pots are informed of the new virus, and each honey pot then begins to function as the root of a separate infectious immunization process. The honey pots have the effect of a superhub, with a degree that is the sum of the degrees of the separate honey pots.
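The construction and the resulting dynamics can be sketched in a few lines, again with all delays equal to one time step. This is an illustrative sketch only: the function name `honeypot_race` is hypothetical, the virus is assumed to move first within a step, and only infected honey pots develop an agent, as described above.

```python
def honeypot_race(g1, honeypots, seed):
    """Virus spreads on g1; the agent spreads on g1 plus a complete graph
    of agent-only edges among the honey pots. Returns (infected, immunized)."""
    honeypots = set(honeypots)
    g2 = {v: set(nbs) for v, nbs in g1.items()}
    for h in honeypots:
        g2[h] |= honeypots - {h}          # special edges, closed to the virus
    infected, immunized = {seed}, set()
    virus_front = {seed}   # infected in the previous step
    agent_front = set()    # immunized in the previous step
    ready = set()          # honey pots infected two steps ago: agent developed
    while virus_front or agent_front or ready:
        new_inf = {nb for v in virus_front for nb in g1[v]} - infected - immunized
        infected |= new_inf                # virus moves first
        new_imm = {nb for a in agent_front | ready for nb in g2[a]}
        new_imm -= infected | immunized
        immunized |= new_imm
        ready = virus_front & honeypots    # only honey pots develop agents
        virus_front, agent_front = new_inf, new_imm
    return infected, immunized

if __name__ == "__main__":
    # A 12-node path with honey pots at nodes 3 and 9; the virus starts at node 0.
    path = {i: set() for i in range(12)}
    for i in range(11):
        path[i].add(i + 1)
        path[i + 1].add(i)
    infected, immunized = honeypot_race(path, {3, 9}, seed=0)
    print(sorted(infected), sorted(immunized))
```

In the path example the agent developed at node 3 cannot outrun the virus along the path itself, but it jumps to node 9 over the special edge and immunizes the far end before the virus arrives.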
In the Methods section we calculate an upper bound for the relative virus cluster using the honeypot architecture. We show that if the number of honey pots, as a function of the network size N, f(N), grows faster than √N, the size of the virus cluster will approach a zero portion of the network, as the network size approaches infinity. In the case where f(N)=β√N (where β is some finite constant), we get
$$\lim_{t\to\infty}\frac{V_{t}(N)}{A_{t}(N)}=\frac{\alpha-1}{\beta^{2}},$$
where α is a characteristic constant of the topology, V_{t}(N) is the size of the virus cluster and A_{t}(N) is the aggregate size of the immunized clusters after time step t, as a function of network size N. When f(N)∝√N, the relative special edge addition due to the honey pots is kept constant in the infinite size limit. This analytic estimation is validated through simulations, and is presented in Figs 3 and 4 and illustrated in Supplementary Information, Video S2. Comparing the random architecture with the honeypot architecture, we observe in Fig. 5 a significant improvement owing to the honey pots that grows with network size.
In Fig. 6 we address the question of robustness of these approaches to different topology characteristics, presenting an analysis of the effect of varying the degree distribution power exponent on the virus cluster; we show that it is minor compared with the dependence on the immunization edge density.
Deployment and feasibility
Faced with the systematic defeats in the war against computer viruses, a paradigm shift may be required. We propose such a shift from the current, static, centralized immunization strategies to a dynamic, distributed immune-system approach. We demonstrate the effectiveness of such an architecture in protecting large networks, whether built randomly or designed artificially. Although the presentation of a practical system design is outside the scope of this article, such a system is certainly deployable. In the past, it has been shown that distributed systems that monitor the Internet in real time^{24,25} are not only feasible but are also very effective. Shifting the focus of an antivirus system from cleaning a single machine to containing the epidemic enables the introduction of accurate automatic triggering within a timescale of less than a minute, which allows such a system to surpass the t_{inf}=t_{imm}+t_{dev} barrier. This enables the system to compete with and defeat the spread of the epidemic. The architectures we have presented constitute a starting point that can be further improved, for example, by designing algorithms for the placement of the honey pots^{7}.
Methods
Analysis of the random-edge effect
Given a network with degree distribution P(k), let us calculate an upper bound on the rate of growth of the virus cluster, V_{t}(N), where N is the network’s size and t is the time index. Let us examine the portion of the t+1 time layer, l_{t+1}, with degree k:
where m is the average node degree, k′ is a summation index over all degrees and C holds the topological clustering properties of the network, which reduce the number of effective neighbours. Even though, in general, C may be a complicated expression, and may also depend on k, in our mean-field approximation we refer to it as a constant of the topology. As the sum does not depend on k, we can calculate it independently and call it a_{t+1}. Then, l_{t+1}(k)=a_{t+1}k P(k). Substituting in (2) gives us
We call the outcome of the new sum α. As it does not depend on k, we find that l_{t+1}=l_{t}·α. If α is larger than 1 we get an exponential growth. If N is large enough so that finite-size effects are irrelevant we get
Let us turn to the immunized cluster(s). Define A_{t}(N) to be the aggregate size of the immunized clusters at time step t as a function of N. Given a relative edge addition q and an average node degree m, the expected number of immune specific edges is q m, and each may initiate an immunized cluster. Once started, the immunized clusters also grow with ratio α. Thus, A_{t} when N is large enough is
which can be compacted to
The ratio of the size of the virus cluster to that of the immunized clusters is
from which we get an upper-bound (as all our assumptions were in favour of the virus cluster), power-law dependence with exponent −1 on q for the discrete-time, deterministic model.
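The layer recursion described in this subsection can be written out as a mean-field sketch. The specific form below is an assumption chosen to be consistent with the surrounding definitions, not a transcription of the displayed equations: it takes each node of degree k′ in layer t to expose k′−1 further neighbours, treats C as a multiplicative constant, and starts from a single seed (l_{0}=1).

```latex
\begin{align}
l_{t+1}(k) &= \frac{k P(k)}{m}\, C \sum_{k'} (k'-1)\, l_{t}(k'), \\
l_{t}(k') = a_{t}\, k' P(k') \;\Rightarrow\;
l_{t+1}(k) &= a_{t}\, k P(k)\,
  \underbrace{\frac{C}{m} \sum_{k'} k'(k'-1)\, P(k')}_{\alpha}
  = \alpha\, l_{t}(k), \\
V_{t}(N) &\approx \sum_{i=0}^{t} \alpha^{i} = \frac{\alpha^{t+1}-1}{\alpha-1}.
\end{align}
```

Under these assumptions α=C(⟨k²⟩−⟨k⟩)/⟨k⟩, with m=⟨k⟩. Because the number of immunized-cluster seeds is proportional to q and each cluster grows with the same ratio α, A_{t} is linear in q in this picture, which is where the exponent −1 comes from.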
Analysis of the honeypot architecture effect
We would like to calculate an upper bound for the ratio V_{t}(N)/A_{t}(N) for very large N and when t approaches infinity.
Let us assume that there are f(N) honey pots distributed randomly in the network, all connected in a complete graph using immunization edges. Then, clearly, the expected virus cluster size when a honey pot is infected with a virus for the first time is N/f(N). At that time, the boundary outside the virus cluster will have (N/f(N))(α−1) nodes. At the next time step, f(N) nodes will be ‘infected’ with the immunization agent. From this point forward, we assume that (in the deterministic case) the virus cluster and the immunized clusters grow as an uninterrupted geometric series. Then, with increasing t, their ratio approaches
$$\frac{V_{t}(N)}{A_{t}(N)}\to\frac{(N/f(N))(\alpha-1)}{f(N)}=\frac{N(\alpha-1)}{f(N)^{2}}.$$
From this equation we can see that whenever f(N) grows faster than √N the size of the virus cluster will approach a zero portion of the network, as the network size approaches infinity. In the case where f(N)=β√N, we get
$$\lim_{t\to\infty}\frac{V_{t}(N)}{A_{t}(N)}=\frac{\alpha-1}{\beta^{2}},$$
which means that we have a power-law relation between this ratio and the relative amount of honeypot nodes, β, with an exponent equal to −2. This result is not surprising, as f(N)=β√N is the function for which the relative special edge addition due to the honey pots is kept constant in the infinite size limit.
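The geometric-growth argument can be checked numerically. The sketch below follows the assumptions stated in this subsection (virus cluster N/f(N) at first contact, boundary layer (N/f(N))(α−1), f(N) immunized roots one step later, both sides growing with ratio α without interruption) and takes f(N)=β√N, the scaling for which the honeypot edge addition stays constant. The function name `honeypot_ratio` and the parameter values are hypothetical.

```python
import math

def honeypot_ratio(n, f, alpha, steps):
    """V_t/A_t after `steps` steps of uninterrupted geometric growth."""
    virus = n / f                        # virus cluster when the first honey pot is hit
    virus_layer = virus * (alpha - 1)    # boundary layer just outside that cluster
    immun, immun_layer = 0.0, float(f)   # f honey pots start immunizing next step
    for _ in range(steps):
        virus += virus_layer
        immun += immun_layer
        virus_layer *= alpha
        immun_layer *= alpha
    return virus / immun

if __name__ == "__main__":
    n, alpha = 10**6, 3.0
    for beta in (1, 2, 4):
        f = beta * math.sqrt(n)
        # second value is the closed form (alpha - 1) / beta**2 for comparison
        print(beta, honeypot_ratio(n, f, alpha, steps=60), (alpha - 1) / beta**2)
```

Doubling β divides the ratio by four, which is the exponent −2 stated above. The sketch deliberately ignores the finite size of the network, as the geometric-series assumption does.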
References
1. Watts, D. J. & Strogatz, S. H. Collective dynamics of ‘small-world’ networks. Nature 393, 440–442 (1998).
2. Barabasi, A.-L. & Albert, R. Emergence of scaling in random networks. Science 286, 509–512 (1999).
3. Albert, R., Jeong, H. & Barabasi, A.-L. Error and attack tolerance of complex networks. Nature 406, 378–382 (2000).
4. Chung, F. & Lu, L. The average distances in random graphs with given expected degrees. Proc. Natl Acad. Sci. USA 99, 15879–15882 (2002).
5. Cohen, R., Erez, K., ben-Avraham, D. & Havlin, S. Resilience of the internet to random breakdowns. Phys. Rev. Lett. 85, 4626–4628 (2000).
6. Pastor-Satorras, R. & Vespignani, A. Immunization of complex networks. Phys. Rev. E 65, 036104 (2002).
7. Cohen, R., Havlin, S. & ben-Avraham, D. Efficient immunization strategies for computer networks and populations. Phys. Rev. Lett. 91, 247901 (2003).
8. Dezso, Z. & Barabasi, A.-L. Halting viruses in scale-free networks. Phys. Rev. E 65, 055103 (2002).
9. Newman, M., Forrest, S. & Balthrop, J. Email networks and the spread of computer viruses. Phys. Rev. E 66, 035101 (2002).
10. Zou, C. C., Gong, W. & Towsley, D. in The 9th ACM Conference on Computer and Communications Security 138–147 (ACM, Washington, 2002).
11. Pastor-Satorras, R. & Vespignani, A. Epidemic spreading in scale-free networks. Phys. Rev. Lett. 86, 3200–3203 (2001).
12. Kephart, J., Sorkin, G., Swimmer, M. & White, S. Blueprint For a Computer Immune System Ch. 13, 242–261 (Springer, New York, 1999).
13. Kreibich, C. & Crowcroft, J. Honeycomb: creating intrusion detection signatures using honeypots. Comput. Commun. Rev. 34, 51–56 (2004).
14. Moore, D., Shannon, C., Voelker, G. & Savage, S. in IEEE Infocom 2003 (IEEE, Piscataway, New Jersey, 2003).
15. Ebel, H., Mielsch, L.-I. & Bornholdt, S. Scale-free topology of email networks. Phys. Rev. E 66, 035103 (2002).
16. Newman, M. Spread of epidemic disease on networks. Phys. Rev. E 66, 016128 (2002).
17. May, R. M. & Lloyd, A. L. Infection dynamics on scale-free networks. Phys. Rev. E 64, 066112 (2001).
18. Kephart, J. & Arnold, W. C. in The 4th Virus Bulletin International Conference 1994 179–194 (Virus Bulletin, Abingdon, 1994).
19. Huang, Z.-F. Self-organized model of information spread in financial markets. Eur. Phys. J. B 16, 379–385 (2000).
20. Erez, T., Hohnisch, M. & Solomon, S. in Economics: Complex Windows (eds Salzano, M. & Kirman, A.) 201–216 (Springer, New York, 2005).
21. Palla, G., Derényi, I., Farkas, I. & Vicsek, T. Uncovering the overlapping community structure of complex networks in nature and society. Nature 435, 814–818 (2005).
22. Malarz, K. Social phase transition in Solomon network. Int. J. Mod. Phys. C 14, 561–565 (2003).
23. Chen, L.-C. & Carley, K. M. The impact of countermeasure propagation on the prevalence of computer viruses. IEEE Trans. Syst. Man Cybernet. B 34, 823–833 (2004).
24. Buchanan, M. Databots chart the internet. Science 308, 813 (2005).
25. Shavitt, Y. & Shir, E. DIMES: Let the internet measure itself. ACM Comput. Commun. Rev. 35, 71–74 (2005).
Acknowledgements
This work was supported by a grant from the Israel Science Foundation. E.S. was partially supported by the ‘Yeshaya Horowitz Association through the Center for Complexity Science’.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Cite this article
Goldenberg, J., Shavitt, Y., Shir, E. et al. Distributive immunization of networks against viruses using the ‘honeypot’ architecture. Nature Phys 1, 184–188 (2005). https://doi.org/10.1038/nphys177