Abstract
One of the most important advances in theoretical epidemiology has been the development of methods that account for realistic host population structure. The central finding is that heterogeneity in contact networks, such as the presence of ‘superspreaders’, accelerates infectious disease spread in real epidemics. Disease control is also complicated by the continuous evolution of pathogens in response to changing environments and medical interventions. It remains unclear, however, how population structure influences these adaptive processes. Here we examine the evolution of infectious diseases in empirical and theoretical networks. We show that the heterogeneity in contact structure that facilitates the spread of a single disease surprisingly renders a resident strain more resilient to invasion by new variants. Our results suggest that many host contact structures suppress invasion of new strains and may slow disease adaptation. These findings are important for the natural history of disease evolution and the spread of drug-resistant strains.
Introduction
Our ability to understand and control the spread of infectious diseases has historically relied on insights gained through mathematical modelling^{1}. A large body of literature has examined the effects of both contact structure and pathogen evolution on disease spread. On the one hand, recent models that account for realistic host population structure have shown that heterogeneity in host contact networks strongly influences the epidemiology of an infectious disease^{2,3,4,5}. The presence of a few individuals with a disproportionately large number of contacts has been shown to significantly increase disease spread^{3,4,5}, and these ‘superspreaders’ have been shown to be important drivers in many real disease outbreaks^{6}. On the other hand, the constant and rapid evolution of pathogens in response to changing environments^{7,8} and medical interventions^{9} complicates disease control.
The intersection of these two fields, however, has received much less attention, although there are several reasons to think that contact structure can influence disease evolution. First, population structure can either amplify or suppress selection in simple population-genetic models^{10}, but it is unclear to what extent these effects can be generalized to more complex infection dynamics. Second, models that have looked at the successive spread of two strains on a heterogeneous contact network have shown that the spread of the first strain modifies the network in a manner that may affect the spread of the second strain^{11,12,13,14}. Third, local contact heterogeneity arising from spatial structure has been shown to affect the evolution of pathogen virulence both in theoretical^{15,16,17} and experimental investigations^{18}. Given these findings, it is important to scrutinize in greater detail how contact structure influences the evolution of infectious diseases and, moreover, whether there are particular contact networks that promote or hinder the invasion of new disease strains.
To this end, we use analytical and simulation methods to explore disease evolution in both empirical and idealized contact networks. We show that the heterogeneity in the host contact network that facilitates the spread of a single disease in turn lowers the fixation probability of an invading strain. Thus, many host contact structures may suppress the invasion of new disease strains and may slow disease evolution and adaptation.
Results
Simulation of disease evolution on networks
We gathered a set of empirically observed contact networks from different populations. Details of the networks, including sources, statistics and generative models, are given in Table 1 and the Methods. In brief, we use (i) a physical proximity network for students in a US elementary school; (ii) data from patients and healthcare workers in a US hospital; (iii) a survey of the number of sexual contacts from the United Kingdom National Survey of Sexual Attitudes and Lifestyles (NATSAL); and (iv) a social network consisting of friends, family and coworkers from the US Framingham Heart Study (FHS). We contrasted these networks with a set of well-characterized theoretical networks: uniform random, Erdős–Rényi, scale-free and small-world. We characterize these networks using standard summary statistics from graph theory. An individual is said to have degree k if it is connected to k other individuals in the population, and the distribution of individuals’ degrees is given by the degree distribution, p(k).
We simulated epidemics on these networks using a stochastic susceptible–infected–susceptible (SIS) model, which is a simplified representation of an endemic disease without lasting immunity^{1,19}. We examine a series of steps in an epidemic caused by an evolving disease. In the first step (Fig. 1a), a single infected individual appears in the population due to, for example, migration from another population or infection from an external reservoir. Susceptible neighbours are infected with rate β_{1}, and infected individuals recover with rate γ. The disease will either spread and reach an endemic equilibrium or go extinct. The probability that the disease does not immediately go extinct is the ‘emergence probability’, which is generally zero below threshold values of infectivity, β_{1}, infectious period, 1/γ, and contact density. In the second step (Fig. 1b), the disease reaches an endemic equilibrium, where the prevalence remains approximately constant for long periods of time. The particular structure of the contact network is a strong determinant of prevalence patterns, such as the number and degree distribution of infected individuals. In the third step (Fig. 1c), a second strain of the disease appears in a random infected individual. We assume that the mutation rate is sufficiently low that the resident disease has time to reach an endemic level before a new mutant appears. The second strain infects susceptible neighbours with rate β_{2}, and individuals infected with it recover with rate γ. We assume that the competing strains induce perfect cross-immunity, such that there is no coinfection or superinfection. We are interested in the likelihood that a new strain takes over the population, causing the resident strain to go extinct. The second and third steps may repeat continually over the course of evolution.
Selection exponents for empirical and theoretical networks
We found that the threshold infectivity required for the spread of a single strain differed strongly depending on the network structure (Fig. 2c,e). Heterogeneity in degree distribution facilitated disease spread, in agreement with previous work. The empirical networks generally had lower thresholds than the theoretical networks, apart from the scale-free network. We then analysed the ability of new strains to invade these networks (Fig. 2d,f). We define the fixation probability, P_{fix}, as the proportion of simulations in which the new strain invades and drives the resident strain to extinction. Again, we observe large differences between networks, in this case in the dependence of fixation probability on the selective advantage, r=β_{2}/β_{1}. Contact structure lowered the fixation probabilities and thus inhibited the emergence of new disease strains. This finding is surprising given that all networks apart from the small-world network facilitated the spread of an initial strain compared with a well-mixed population, by lowering the invasion threshold. This trend can also be observed in individual comparisons of two networks, but it is not universal. For example, the uniform and small-world networks have different invasion thresholds but similar fixation probabilities, while the school and hospital networks have similar invasion thresholds but very different fixation probabilities. The fixation probability of the second strain can be written as a function of the selective advantage r, which is amplified or suppressed by a selection exponent α, such that P_{fix}=1−1/r^{α} (see Methods and Lieberman et al.^{10}). For uniform random networks, we expect α=1. Networks can thus be ranked by their ability to promote selection of new strains by estimating a best-fit α from the curves in Fig. 2d,f. Interestingly, all the tested networks have α<1 and thus suppress selection of beneficial mutants compared with the well-mixed or uniform case (Fig. 2g, Supplementary Table 1).
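A best-fit α can be estimated from simulated (r, P_{fix}) pairs in several ways. The sketch below is one minimal option, assuming a least-squares fit of the linearised relation −ln(1−P_{fix}) = α ln r, which is a straight line through the origin. The text does not state which fitting procedure was used, and the function name `fit_selection_exponent` is hypothetical.

```python
import math


def fit_selection_exponent(r_values, p_fix_values):
    """Least-squares estimate of alpha in P_fix = 1 - 1/r**alpha.

    Uses the linearisation -ln(1 - P_fix) = alpha * ln(r), i.e. a straight
    line through the origin, so alpha = sum(x*y) / sum(x*x).
    """
    num = den = 0.0
    for r, p in zip(r_values, p_fix_values):
        if r <= 1.0 or not 0.0 < p < 1.0:
            continue  # r <= 1 is outside the fitted range; log undefined at p = 0 or 1
        x = math.log(r)
        y = -math.log(1.0 - p)
        num += x * y
        den += x * x
    if den == 0.0:
        raise ValueError("need at least one point with r > 1 and 0 < P_fix < 1")
    return num / den
```

Because the linearisation reweights the errors, a direct nonlinear fit of P_{fix}=1−1/r^{α} may give a slightly different α on noisy simulation data.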
Effect of degree heterogeneity on the fixation probability
Our goal is to understand the specific network properties that cause such disparate behaviours in disease evolution, but this is complicated by the fact that these example networks differ in many structural properties (Table 1). We therefore examined classes of networks in which single properties can be tuned.
Previous studies have identified individual variation in number of contacts as an important determinant of disease spread^{1,3,20}. To examine the influence of degree heterogeneity on disease evolution, we constructed a series of networks with the same mean degree but tunable variance (Fig. 3a). We then compared various aspects of the simulation results with analytical approximations. In line with previous work, the threshold disease transmissibility required for emergence is lower for networks with higher variance in connectivity^{3,20}. The emergence probabilities observed in the simulations (Fig. 3b) are well approximated by a continuous-time multi-type branching process (detailed in Methods), where individuals are divided into types according to their degree. The probability of being infected at equilibrium as a function of the degree (Fig. 3c) is well described by a system of differential equations tracking pairs of individuals (see Methods and House and Keeling^{21}).
Next, we examined the probability, P_{fix}, that a novel strain, which appears at endemic equilibrium, displaces the resident strain. We find that this fixation probability depends strongly on the network structure, and that it is markedly lower for networks with high variance in degree (Fig. 3d). Lower fixation probability results in slower adaptation when mutations are rare. Hence, heterogeneous contact structure acts to suppress selection for infectious diseases, despite facilitating initial spread.
We develop a combined analytical technique to approximate the invasion of a new disease strain into a population with a resident endemic disease, without the need for large-scale simulations (Methods). We first obtain the fraction of individuals with degree k who are susceptible at equilibrium, using a deterministic pairwise approximation^{21}. We then calculate the invasion probability of the second strain using a branching process approach. This calculation is similar to the single-strain case, but incorporates the probability that an individual of degree k is susceptible at equilibrium. This combined pairwise deterministic and multi-type branching process approximation is in excellent agreement with simulation results (Fig. 3d).
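The branching-process step can be illustrated for a configuration-model network. This is a simplified, hypothetical sketch rather than the authors' derivation: it assumes that a node reached along an edge has degree k with probability k p(k)/‹k› and is susceptible with probability S_k independently of its neighbours (ignoring the pair correlations that the pairwise approximation captures), that a node infected along an edge has k−1 onward contacts, and that the focal node is a random individual for a single strain or a random infected individual for a mutant. With all S_k=1 it reduces to the single-strain emergence probability. The function name and arguments are illustrative.

```python
def invasion_probability(p, beta, gamma, susceptible=None, tol=1e-13, max_iter=1_000_000):
    """Branching-process estimate of the probability that a strain escapes early extinction.

    p[k]           : degree distribution, indexed by degree k
    susceptible[k] : probability that a degree-k node is susceptible
                     (None: fully susceptible population, single-strain case)
    """
    K = len(p)
    mean_k = sum(k * p[k] for k in range(K))
    S = [1.0] * K if susceptible is None else susceptible
    # w[k]: probability that an edge leads to a susceptible node of degree k
    w = [k * p[k] / mean_k * S[k] for k in range(K)]
    W = sum(w)
    # A node infected along an edge has k - 1 onward contacts; its lineage dies out
    # with probability q_k = gamma / (gamma + beta * (k - 1) * (W - u)),
    # where u = sum_k w[k] * q_k is found by fixed-point iteration from u = 0.
    u = 0.0
    for _ in range(max_iter):
        u_new = sum(w[k] * gamma / (gamma + beta * (k - 1) * (W - u)) for k in range(1, K))
        converged = abs(u_new - u) < tol
        u = u_new
        if converged:
            break
    # Focal node: a random individual (single strain) or a random infected one (mutant).
    focal = p if susceptible is None else [p[k] * (1 - S[k]) for k in range(K)]
    q0 = [gamma / (gamma + beta * k * (W - u)) for k in range(K)]
    return 1 - sum(f * q for f, q in zip(focal, q0)) / sum(focal)
```

For a network in which every node has degree 4 and β=γ=1, this gives 8/11 ≈ 0.727 when all nodes are susceptible and 0.4 when each node is susceptible with probability 0.5, illustrating how a resident strain lowers the invasion probability.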
We also examined how the fixation probability depends on the degree of the focal individual in which the new strain arises. The fixation probability is positively correlated with the degree of the focal individual and negatively correlated with the average degree of this individual’s neighbours.
Effect of local clustering on the fixation probability
We next investigated the effect of local clustering on disease evolution. To this end, we constructed a series of small-world networks with fixed degree and tunable global clustering coefficient, φ (see Methods for definition and details). In brief, individuals in the network are initially connected to a local neighbourhood, after which a rewiring procedure is applied that introduces shortcuts in the network^{22}. The emergence probability and endemic equilibrium for a single strain decrease with increasing clustering (Fig. 3e,f), in agreement with previous work^{22,23}. Surprisingly, however, we find that the fixation probability of the second strain is completely independent of clustering (Fig. 3g).
Our results suggest that degree heterogeneity influences disease evolution more strongly than local clustering does. The intuition behind these results is illustrated in Fig. 4. For a first strain spreading in a fully susceptible population, high variance in degree facilitates spread due to the presence of easily accessible hubs with high connectivity. These hubs, once infected, markedly reduce the extinction probability as they are surrounded by many susceptible individuals. Star-like graphs are an extreme example of this situation (Fig. 4a). When a novel mutant arises in an endemically infected population, however, the hub individuals are the most likely to already be infected by the resident strain, and hence are unlikely to be available for infection by the mutant. The higher the variance, the stronger this hub-holding effect, and the lower the average degree of the remaining susceptible individuals. As a star graph has many peripheral nodes but only one centre node, a randomly introduced new strain is most likely to appear in a peripheral node (Fig. 4b). To spread, however, the new strain must first infect the centre node, which is very likely already infected with the resident strain. If the centre node recovers before the mutant goes extinct, the mutant has the possibility to infect it; however, it must do so before the centre is reinfected by the resident strain from one of the many other infected peripheral nodes (Fig. 4c).
To understand the lack of influence of local clustering, we consider a small-world network (Fig. 4d–f). When the first strain spreads in a fully susceptible population, patient zero has the possibility to infect all of its neighbours. Strong local clustering implies strong overlap in neighbours of two connected individuals. As the epidemic progresses, the neighbours of subsequently infected nodes are likely to already be infected. Consequently, small-world networks with large clustering coefficients tend to decrease the probability that a disease will emerge. In fully susceptible populations, shortcuts facilitate disease spread by allowing the strain to jump to new areas of the network where most individuals are still susceptible. These shortcuts do not help the spread of a second strain, as jumping to a different part of the network is not beneficial when the resident disease is endemic throughout the network. Hence, the fixation probability of new strains is independent of the rewiring probability.
This intuitive argument can also be formulated in terms of the effect of the first strain on the degree distribution of susceptible individuals. As individuals with high degree are more likely to be infected at equilibrium (Fig. 3c), the mean degree in the residual network of susceptibles^{5,24} decreases with increasing variance (Fig. 4g,h). For uniform networks, the variance in degrees can only increase between initial and residual networks, while for more heterogeneous networks the residual network could have lower variance. Note that, in the SIS model, as recovery and reinfection are constantly occurring, the residual network is a dynamic concept: the actual nodes in it may change while the average properties remain constant.
Discussion
Throughout this paper, we have considered one particular two-step model of competition between different strains of an infectious disease spreading on a static contact network. A single pathogen strain obeying SIS dynamics spreads in a host population until it reaches endemic equilibrium. The probability of successful spread increases with increasing degree heterogeneity of the host population. In the endemic equilibrium, a new strain with complete cross-immunity, differing only in its transmission rate, appears in a single infected individual. In contrast to the initial spread, the probability that this new strain can successfully invade the host population decreases with increasing degree heterogeneity. Such a model is a good description of an endemic disease in which the transmission of de novo mutants is a rare event. To understand in what way the results may generalize to other models of disease spread, it is important to discuss some of the implications of the model assumptions.
The SIS model used in this paper is the simplest mathematical model of an endemic disease^{1}. Endemic diseases are at an increased risk of continual evolution when compared with single-wave epidemics, the latter being better described by variations of the susceptible–infected–removed (SIR) model. The type of analysis presented here is not appropriate for such single-wave outbreaks, due to the absence of an endemic state. The effect of spatial structure on two subsequent epidemic waves of new strains in SIR-type models has been considered elsewhere^{11,12,14}. For an endemic disease with temporary immunity that can be described by a susceptible–infected–removed–susceptible (SIRS) model, we hypothesize that the same general trends we see for the SIS model apply. Temporarily recovered individuals are not available for reinfection by either strain. From the point of view of the invading mutant strain, the hub-holding effect of infected individuals also applies to temporarily recovered individuals, thus hindering fixation in populations with high variance in degree.
In this study we considered beneficial mutant strains with increased transmissibility, β. Alternatively, a longer infectious period (smaller γ) would also confer a benefit in well-mixed populations^{16}. We repeated our analysis for a second strain with a smaller recovery rate, and found that the general trends with regard to degree heterogeneity were identical, as expected from the analytical results (Supplementary Fig. 1). However, we also observed that the fixation probability and selection coefficient were consistently higher when γ_{2} was varied as opposed to β_{2}, keeping the ratio constant. This effect is likely related to our previous results that, in small, well-mixed populations, when both transmissibility and recovery can vary independently, the direction of selection is shifted towards decreasing the recovery rate, as opposed to simply increasing R_{0} (ref. 25). Network structure may skew selection in a similar way to small population size. Further complications arise in models for the evolution of virulence, where β and γ are not independent. Previous work on the evolution of virulence has shown that the evolutionarily optimal virulence level is different in structured populations as compared with well-mixed populations^{15,16,17,26,27}. This work, however, has not investigated the specific network properties that modulate these trends. Future work will be needed to fully understand all the factors influencing disease evolution in more realistic scenarios.
We chose to model the introduction of a new disease strain by randomly choosing a single individual who was infected with the first strain during the endemic phase, and instantaneously switching their status to infected with the second disease strain. With this procedure, we aim to simulate the situation where within-host evolution of a pathogen leads to a novel strain emerging. Alternatively, new disease strains could be introduced into a population from an external reservoir or another disconnected population. In this case, it may be more realistic to consider introduction into a random susceptible individual, or any random individual. As susceptible and infected individuals are spatially clustered within a contact network at endemic equilibrium^{23}, changing these initial conditions can affect the emergence probability. We verified that our results are qualitatively unchanged under this alternative initial condition, and can similarly be well approximated analytically, with the appropriate updates to equations (18) and (23).
In our simulations we consider perfect cross-immunity, or competitive exclusion within a host. That is, being infected with one strain protects against infection by the other strain. Alternatively, hosts can be simultaneously infected with multiple strains (coinfection), or the strain currently infecting a particular host can be displaced by infection with another strain (superinfection). The effect of imperfect cross-immunity will strongly depend on the way in which the two strains influence each other’s infection and recovery rates, and will vary depending on the particular real-world disease considered. Perfect cross-immunity implies that infection with one strain completely blocks infection by the other strain. Conversely, complete lack of cross-immunity implies that the two strains do not influence each other’s infection rates. In this case, the invasion dynamics of the second strain will be equivalent to the case where the first strain is absent. The effect of contact structure on the invasion of a second strain with partial cross-immunity is thus expected to lie between these two extremes.
In this paper, we have modelled population structure as a static, unweighted network. Contacts between individuals in many real-world situations, however, are dynamic. For example, infected individuals may stay at home or be quarantined, or move from home or the workplace to a hospital. If contacts between individuals are updated in a manner that is independent of disease status, we expect to find qualitatively similar results as in the static case. Such rewiring may dampen the effect of population structure, as it either changes the instantaneous degree distribution of the network or maintains the degree distribution but reassorts neighbours. The effect of such rewiring will also depend on the timescale at which contacts are updated in comparison with the timescale of disease spread. Previous work by Cross et al.^{28} demonstrated that, for a metapopulation with intergroup migration, the spread of a single disease depended critically on the relative timescales of recovery and migration, and it is likely that disease evolution in this context is similarly influenced by these parameters.
By considering competition between at most two strains, and requiring that the second strain arise only after the first is at a quasi-steady state, we have implicitly assumed that there is a separation of timescales between the epidemic and evolutionary processes. In our analysis, we focused only on the ultimate probability of fixation, and not on the time required to reach fixation, which may be very long for certain population structures (those with more local and fewer global connections, and thus higher clustering coefficients) and when strains are close in fitness. If we relaxed the separation-of-timescales assumption and instead allowed mutations to occur at a constant rate in each infected individual, we might observe situations where multiple strains coexist for very long periods of time. Other work has focused on the role of population structure in facilitating pathogen diversity in this regime^{29,30}.
In conclusion, we show that heterogeneity in contact structure suppresses disease evolution by lowering the fixation probability of any newly arising disease strains. This finding is surprising in two ways. First, the suppressive effect on evolution is in contrast to the well-established finding that contact heterogeneity otherwise facilitates the initial spread of a disease^{2,3,4,5}. However, the effect makes sense in light of the earlier finding that the initial strain modifies the residual network of susceptibles^{11,12,13,14}. Second, the suppressive effect is also in contrast to the earlier finding that certain network structures can amplify selection^{10}. This discrepancy arises from the differences in the underlying population dynamic model used to consider competition between two genotypes. Previous work used the Moran process model of reproduction and death, which considers only two types of individuals, while we use an infectious disease model that requires tracking susceptibles along with two types of infecteds. Our results highlight the fact that findings from the Moran model, such as the universality of fixation probabilities in isothermal graphs^{10,31}, may have little bearing on infectious disease dynamics. Despite the inherent challenges, understanding the interaction between disease emergence, evolution and contact structure is highly relevant to infectious disease epidemiology, as continual evolution is a major barrier to control, and interventions that target contact structure are increasingly popular.
Methods
Simulation details
All simulations were implemented using the Gillespie next-reaction method. For single-disease simulations, the infection is introduced into one random individual, and the simulation is run until the disease goes extinct, reaches a quasi-steady state (QSS), or reaches t_{max}=1,000, whichever occurs first. QSS is reached when there is <2% difference between the average prevalence over the last third of the total simulation time and that over the middle third, after an initial burn-in period of t_{burn}=100/γ. For the small-world network with no rewiring, it was necessary to increase t_{max} to 10,000 and t_{burn} to 1,000/γ. The emergence probability is calculated as the fraction of runs in which the disease does not go extinct. At least 7,000 runs were simulated for each parameter value.
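The QSS stopping rule can be written as a small check. This sketch assumes that prevalence has been recorded at the time points in `times` (for example, on a regular grid), reads "<2% difference" as a relative difference with respect to the middle-third average, and uses plain rather than time-weighted averages; these choices and the name `reached_qss` are assumptions, not details given above.

```python
def reached_qss(times, prevalence, t_burn, tol=0.02):
    """Return True once the run counts as being at quasi-steady state.

    times, prevalence : prevalence recorded at increasing time points
    t_burn            : burn-in period (e.g. 100 / gamma)
    tol               : maximum relative difference (0.02 = 2%)
    """
    t_now = times[-1]
    if t_now < t_burn:
        return False
    # Middle and last thirds of the total simulation time so far.
    middle = [x for t, x in zip(times, prevalence) if t_now / 3 < t <= 2 * t_now / 3]
    last = [x for t, x in zip(times, prevalence) if t > 2 * t_now / 3]
    if not middle or not last:
        return False
    mean_mid = sum(middle) / len(middle)
    mean_last = sum(last) / len(last)
    if mean_mid == 0.0:
        return False  # extinction is handled as a separate stopping condition
    return abs(mean_last - mean_mid) / mean_mid < tol
```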
For the two-strain invasion simulations, the resident strain is first introduced at a high level to avoid early extinction and allowed to reach a QSS (waiting at least t_{burn}). Then a single infected individual is randomly chosen to be infected with the mutant strain. The fixation probability is calculated as the fraction of invasion attempts in which the resident strain goes extinct while the invading strain still remains. Runs in which both disease strains remained after t_{max} were rare and were not included in the reported results. New networks were randomly generated for each simulation run, resulting in at least 6,000 invasion attempts per parameter value.
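The invasion procedure can be sketched with a plain Gillespie simulation. This is a hypothetical, minimal version for small networks: it uses the direct method (recomputing every event rate at each step) rather than the next-reaction method used for the reported results, introduces the resident "at a high level" by infecting every node, uses a fixed burn-in time instead of the QSS test, and returns None for discarded runs. In the paper a new network is generated for every run; here the network `adj` is passed in. All names are illustrative.

```python
import random

SUSCEPTIBLE, RESIDENT, MUTANT = 0, 1, 2


def run_two_strain_sis(adj, state, beta, gamma, t_max, rng):
    """Direct-method Gillespie simulation of SIS dynamics with up to two strains.

    adj   : list of neighbour lists for a static, unweighted network
    state : list of node states (modified in place)
    beta  : dict mapping RESIDENT / MUTANT to their transmission rates
    Perfect cross-immunity: only susceptible nodes can be infected.
    Stops at t_max, or as soon as a strain present at the start has died out.
    """
    present = {s for s in state if s != SUSCEPTIBLE}
    t = 0.0
    while True:
        # Enumerate every possible event with its rate: O(edges) per step,
        # acceptable for a sketch but too slow for large networks.
        events, rates = [], []
        for node, s in enumerate(state):
            if s == SUSCEPTIBLE:
                continue
            events.append((node, SUSCEPTIBLE))  # recovery
            rates.append(gamma)
            for nb in adj[node]:
                if state[nb] == SUSCEPTIBLE:
                    events.append((nb, s))  # transmission of strain s to nb
                    rates.append(beta[s])
        total = sum(rates)
        if total == 0.0:
            return t  # nothing can happen any more
        dt = rng.expovariate(total)
        if t + dt > t_max:
            return t_max
        t += dt
        node, new_state = rng.choices(events, weights=rates)[0]
        state[node] = new_state
        if not present <= {s for s in state if s != SUSCEPTIBLE}:
            return t  # a strain that was present has gone extinct


def invasion_attempt(adj, beta1, beta2, gamma, t_burn, t_max, rng):
    """True if the mutant fixes, False if it dies out, None if the run is discarded."""
    beta = {RESIDENT: beta1, MUTANT: beta2}
    state = [RESIDENT] * len(adj)  # resident introduced at a high level
    run_two_strain_sis(adj, state, beta, gamma, t_burn, rng)
    infected = [v for v, s in enumerate(state) if s == RESIDENT]
    if not infected:
        return None  # resident went extinct during burn-in
    state[rng.choice(infected)] = MUTANT  # mutant appears in a random infected host
    run_two_strain_sis(adj, state, beta, gamma, t_max, rng)
    if MUTANT not in state:
        return False
    if RESIDENT not in state:
        return True
    return None  # both strains still present at t_max: excluded


def fixation_probability(adj, beta1, beta2, gamma, attempts, t_burn, t_max, rng):
    outcomes = [invasion_attempt(adj, beta1, beta2, gamma, t_burn, t_max, rng)
                for _ in range(attempts)]
    decided = [o for o in outcomes if o is not None]
    return sum(decided) / len(decided) if decided else float("nan")
```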
The value of β_{1} at which the mutant strain is introduced was chosen so that the QSS level was approximately equal for all networks. For Fig. 2d,f (empirical and theoretical networks) and Fig. 3g (small-world networks), we used β_{1}(‹k›−1)/γ_{1}=3, and for Figs 3d and 4h (gamma-distributed networks), we used β_{1}(‹k›−1)/γ_{1}=1.5. Changing these values did not change the trends observed unless the QSS level was very different between networks.
Clustering coefficient
The clustering coefficient, φ, is also known as the ‘global clustering coefficient’ or ‘transitivity’^{2,23}. It is defined as the ratio of three times the number of triangles in the network (sets of three nodes that are all connected to each other) to the number of connected triplets (sets of three nodes with at least two connections between them). If A is the adjacency matrix of the network, then

$$\varphi = \frac{\operatorname{tr}\left(A^{3}\right)}{\sum_{i \neq j} \left(A^{2}\right)_{ij}}.$$
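For small networks, φ can be computed directly from the adjacency matrix as tr(A^{3})/Σ_{i≠j}(A^{2})_{ij}: tr(A^{3}) counts each triangle six times, and Σ_{i≠j}(A^{2})_{ij} counts each connected triplet twice. The pure-Python sketch below (the name `global_clustering` is hypothetical) uses O(N³) matrix products, which is fine for illustration but would be replaced by a sparse or neighbour-set implementation for networks the size of the FHS network.

```python
def global_clustering(A):
    """Transitivity tr(A^3) / sum_{i != j} (A^2)_ij for a symmetric 0/1 adjacency matrix."""
    n = len(A)
    # A2[i][j] = number of common neighbours of i and j
    A2 = [[sum(A[i][m] * A[m][j] for m in range(n)) for j in range(n)] for i in range(n)]
    closed = sum(A2[i][j] * A[j][i] for i in range(n) for j in range(n))  # tr(A^3)
    connected = sum(A2[i][j] for i in range(n) for j in range(n) if i != j)
    return closed / connected if connected else 0.0
```

A triangle gives φ=1, a three-node path gives φ=0, and a triangle with one extra pendant node gives 3×1/5 = 0.6.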
Network generation algorithms
For the uniform network, all individuals have the same degree. We use the configuration model, expressed with a stub-connection algorithm, to create random graphs with a specified degree distribution. By randomly connecting individuals we reduce higher-order structure^{32}. For each node we first assign a degree k, and then create a set of k stubs representing its edges, each with only a single end attached to the node. We repeat this for all nodes and then combine these stubs into a master set. This set is then randomly divided in half, and each stub in one subset is matched to a stub in the other subset, forming a complete edge. If there is an odd number of stubs, a random individual is given an extra stub. We do not allow self-connections or duplicate edges between nodes.
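A minimal sketch of the stub-connection procedure follows, assuming the network is returned as a set of undirected edges. How self-connections and duplicate edges are avoided in the original implementation is not specified; here offending stub pairs are simply discarded, which slightly lowers the realised degree of the affected nodes. The name `configuration_model` is illustrative.

```python
import random


def configuration_model(degrees, rng):
    """Stub-connection sketch: returns a set of undirected edges (u, v) with u < v."""
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    if len(stubs) % 2 == 1:
        stubs.append(rng.randrange(len(degrees)))  # odd total: one node gets an extra stub
    rng.shuffle(stubs)
    half = len(stubs) // 2
    edges = set()
    # Match the i-th stub of the first half with the i-th stub of the second half.
    for u, v in zip(stubs[:half], stubs[half:]):
        if u == v:
            continue  # drop self-connection
        edges.add((min(u, v), max(u, v)))  # the set silently drops duplicate edges
    return edges
```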
For the gamma-distributed-degree network, each individual is assigned a degree drawn from a discretized version of the gamma distribution with mean degree ‹k› and s.d. σ_{k}. The gamma distribution was chosen because it allows the mean and variance to be varied independently, with any variance between zero and infinity possible. Discretization was performed by first drawing a random number from a continuous gamma distribution with mean ‹k›−1 and s.d. σ_{k}, rounding to the nearest integer, and then adding 1. It was confirmed numerically that this created a distribution with the desired properties over the required range of ‹k› and σ_{k} values. The network is then created using the stub-connection algorithm (see above).
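The discretized draw reduces to a shape/scale conversion: a gamma distribution with mean μ=‹k›−1 and s.d. σ_{k} has shape μ²/σ_{k}² and scale σ_{k}²/μ. The sketch below uses Python's standard `random.gammavariate(alpha, beta)`, where `beta` is the scale; the function name is hypothetical, and ‹k›>1 and σ_{k}>0 are assumed.

```python
import random


def gamma_degrees(n, mean_k, sd_k, rng):
    """Degrees from a gamma distribution with mean mean_k - 1 and s.d. sd_k, rounded, plus 1."""
    mu = mean_k - 1.0
    shape = (mu / sd_k) ** 2  # mean = shape * scale, variance = shape * scale**2
    scale = sd_k ** 2 / mu
    return [int(round(rng.gammavariate(shape, scale))) + 1 for _ in range(n)]
```

Adding 1 after rounding guarantees a minimum degree of 1 while leaving the s.d. unchanged; the resulting sequence would then be passed to the stub-connection step.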
For the random network, we use the Erdős–Rényi/Gilbert model^{33,34}. An edge is constructed between each pair of individuals with probability p, independently of the existence of other edges. The resulting degree distribution is binomial, with mean degree ‹k›=p(N−1).
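The Erdős–Rényi/Gilbert construction is a single pass over all pairs. The sketch below (function name hypothetical) is O(N²), which is adequate for networks of a few thousand nodes; for large sparse graphs, faster samplers that skip over absent edges are usually preferred.

```python
import random


def erdos_renyi(n, p, rng):
    """G(N, p): include each of the N(N-1)/2 possible edges independently with probability p."""
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]
```

With N=500 and p=0.02, the expected mean degree is p(N−1) ≈ 9.98.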
For the small-world network, we use the method described by Santos et al.^{22} Individuals are first arranged in a ring, and each is connected to its m=‹k›/2 nearest neighbours on either side. Each edge of every node is then rewired with probability p. Rewiring involves disconnecting from the distal node and connecting to another random non-self, non-neighbour node, such that duplicate edges are avoided and the uniform degree of the network is preserved.
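The ring-plus-rewiring construction can be sketched as follows. Because moving only one end of an edge would change the degrees of two nodes, this sketch assumes a degree-preserving variant: a selected edge (a, b) swaps endpoints with a randomly chosen edge (c, d) to give (a, d) and (c, b), and swaps that would create self-connections or duplicate edges are rejected. This is one standard way to keep every degree at 2m; it may differ in detail from the procedure of Santos et al., and all names are illustrative. It assumes N > 2m.

```python
import random


def ring_lattice(n, m):
    """Ring where each node links to its m nearest neighbours on either side (needs n > 2m)."""
    return [(i, (i + d) % n) for i in range(n) for d in range(1, m + 1)]


def rewire_preserving_degree(edges, p, rng):
    """With probability p per edge, swap endpoints with another random edge."""
    edges = list(edges)
    present = {frozenset(e) for e in edges}
    for idx in range(len(edges)):
        if rng.random() >= p:
            continue
        jdx = rng.randrange(len(edges))
        a, b = edges[idx]
        c, d = edges[jdx]
        if len({a, b, c, d}) < 4:
            continue  # same edge or shared node: swap would create a self-connection
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # swap would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[idx], edges[jdx] = (a, d), (c, b)  # every node keeps its degree
    return edges
```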
For the scale-free network, we use the Barabási–Albert model of preferential attachment^{35}. The network starts as a fully connected group of m=‹k›/2 nodes. Each new node is added to the network and connected to m other individuals, each chosen with a probability proportional to that individual’s current degree. This creates a network with a degree distribution following a power law, p(k)∝k^{−ν}, with exponent ν=3 and average degree ‹k›.
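Preferential attachment is commonly implemented by keeping a list in which every node appears once per incident edge, so that a uniform draw from the list is a draw proportional to current degree. The sketch below uses that approach (names hypothetical); it requires m ≥ 2 so that the initial fully connected group has edges, and it draws the m targets of each new node without replacement.

```python
import random


def barabasi_albert(n, m, rng):
    """Preferential attachment starting from a complete graph on m nodes."""
    if m < 2:
        raise ValueError("m must be at least 2")
    edges = [(i, j) for i in range(m) for j in range(i + 1, m)]
    pool = [v for e in edges for v in e]  # node appears once per incident edge
    for new in range(m, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(pool))  # probability proportional to degree
        for t in targets:
            edges.append((t, new))
            pool.extend((t, new))
    return edges
```

The network has m(m−1)/2 + (N−m)m edges, so its mean degree approaches 2m=‹k› for large N.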
Empirical networks
For the FHS social contact network, we used a previously described network of social contacts that was collected as part of the Framingham Heart Study^{36,37}. Individuals participating in the study were connected to family members, coworkers and self-reported friends who were also enrolled in the study. This network was available for seven examinations between 1971 and 2000, and we chose the earliest time point, when the network was the largest. This network represented 5,253 individuals who were connected to at least one other individual. The average degree was ‹k›=6.5, the s.d. in degree was σ_{k}=6.8, and the clustering coefficient was φ=0.68.
For the school contact network, we worked with a publicly available contact network observed among students and teachers at an elementary school over a single school day^{38}. Participants wore electronic sensors that detected close physical proximity and recorded the times over which these contacts occurred. This network is therefore either dynamic (if we consider an edge to exist between individuals at time t provided they are in close proximity at that point) or weighted (if we sum up the total time two individuals spent within close proximity over the whole day). To simplify analysis and facilitate comparison with other example networks, we sampled a static, unweighted subnetwork from the full network by connecting each pair of individuals with a probability proportional to their total contact time. From this sampled network, we chose the giant component to ensure the population was a single connected graph. With this method the average degree is a free parameter determined by scaling the probability of each edge, and we chose it to agree with the FHS social network. The result was a network with 740 individuals, with an average degree ‹k›=6.5, s.d. in degree σ_{k}=3.3, and clustering coefficient φ=0.04.
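One way to implement the sampling step is sketched below. The text specifies only that each edge is kept with a probability proportional to the total contact time, scaled to reach the chosen average degree; this sketch assumes the probability is c·w_{ij} capped at 1, chooses c by bisection so that the expected mean degree of the sampled network (before the giant component is extracted) matches the target, and then keeps the largest connected component. The weight format (a dict of node pairs to contact times) and all function names are hypothetical.

```python
import random


def edge_probabilities(weights, n, target_mean_degree, iters=200):
    """Map {(i, j): contact time} to keep-probabilities min(1, c * w), fitting c by bisection."""
    positive = [w for w in weights.values() if w > 0]
    if 2 * len(positive) / n < target_mean_degree:
        raise ValueError("target mean degree is not reachable")

    def expected_mean_degree(c):
        return 2.0 * sum(min(1.0, c * w) for w in positive) / n

    lo, hi = 0.0, 1.0 / min(positive)  # at c = hi every positive weight gives probability 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected_mean_degree(mid) < target_mean_degree:
            lo = mid
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    return {e: min(1.0, c * w) for e, w in weights.items()}


def giant_component(edges, n):
    """Return (nodes, edges) of the largest connected component, found by depth-first search."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, best = set(), set()
    for start in range(n):
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best, [(u, v) for u, v in edges if u in best]


def sample_static_network(weights, n, target_mean_degree, rng):
    probs = edge_probabilities(weights, n, target_mean_degree)
    edges = [e for e, pr in probs.items() if rng.random() < pr]
    return giant_component(edges, n)
```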
For the hospital contact network, we used data collected in a hospital setting to create a contact network of healthcare workers and patients, which is freely available online at http://www.sociopatterns.org^{39}. As for the school network, participants wore electronic sensors, and instances of close physical proximity were recorded over 5 days, resulting in a dynamic/weighted network. We again sampled this network to generate a static, unweighted network with a single giant component. The result was a network with 68 individuals, with an average degree ‹k›=6.5, s.d. in degree σ_{k}=5.3, and clustering coefficient φ=0.29.
For the NATSAL sexual contact network, we used the results of the United Kingdom National Survey of Sexual Attitudes and Lifestyles (NATSAL), which is freely available online at http://www.natsal.ac.uk/ and has been published previously^{40}. This survey collected the number of sexual partners over the last 5 years for a population of around 30,000 individuals (combining the NATSAL-1 and NATSAL-2 studies from 1990 and 2000). This degree distribution is well fit by a power-law function^{41}, p(k) ∝ k^{−ν}, with exponent ν=2.5, k_{min}=1 and an upper degree cutoff k_{max}. We then generated a degree sequence for N=10,000 nodes that follows a power law with exponent ν and maximum degree k_{max}, and created a network from this degree sequence using the stub-connect algorithm described above for random networks. We extracted the giant component, resulting in a random network with, on average, 7,578 individuals, mean degree ⟨k⟩=2.7, mean s.d. in degree σ_{k}=4.9 and clustering coefficient φ=0.002.
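A minimal Python sketch of this construction follows. It assumes that degrees are drawn with `random.choices`, that the total degree is made even by adjusting one node, and that self-loops and repeated edges are discarded during stub pairing (the text does not say how the stub-connect algorithm treats them). Because k_{max} is not given here, the value 100 is a placeholder, and all function names are hypothetical.

```python
import random


def power_law_degree_sequence(n, nu, k_min, k_max, seed=0):
    """Draw n degrees with P(k) proportional to k**(-nu) for k_min <= k <= k_max."""
    rng = random.Random(seed)
    ks = list(range(k_min, k_max + 1))
    degrees = rng.choices(ks, weights=[k ** (-nu) for k in ks], k=n)
    if sum(degrees) % 2:  # stubs are paired, so the total degree must be even
        i = rng.randrange(n)
        degrees[i] += 1 if degrees[i] < k_max else -1
    return degrees


def stub_connect(degrees, seed=0):
    """Pair edge stubs uniformly at random (configuration model).
    Self-loops and repeated pairs are discarded in this sketch."""
    rng = random.Random(seed)
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    edges = set()
    for a, b in zip(stubs[::2], stubs[1::2]):
        if a != b:
            edges.add((min(a, b), max(a, b)))
    return edges


def giant_component(n, edges):
    """Nodes of the largest connected component, found with union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    roots = [find(v) for v in range(n)]
    sizes = {}
    for r in roots:
        sizes[r] = sizes.get(r, 0) + 1
    biggest = max(sizes, key=sizes.get)
    return {v for v, r in zip(range(n), roots) if r == biggest}


# k_max = 100 is a placeholder: its value is not given in the text.
degrees = power_law_degree_sequence(10_000, nu=2.5, k_min=1, k_max=100)
edges = stub_connect(degrees)
giant = giant_component(len(degrees), edges)
giant_edges = [(a, b) for a, b in edges if a in giant]
print(len(giant), 2 * len(giant_edges) / len(giant))
```

With a different k_{max}, the size and mean degree of the giant component will differ from the values reported above.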
Branching process calculations for disease emergence in networks
In the early stages of infection, when the number of infected individuals is very low, the SIS model in a heterogeneous host population can be approximated by a multitype branching process. The particular stochastic process we chose to describe the epidemic is related to the ‘continuous offspring production’ model discussed in the viral dynamics literature^{42} and has previously been used to study disease emergence^{43}. Each individual of type i has a constant rate, r_{ij}, of producing infected individuals of type j and a constant rate of recovery, γ, which plays the role of death with immediate replacement in other models. Other stochastic processes used to describe the initial phase of epidemics include ‘burst models’, in which offspring distributions are specified a priori (such as Kronecker delta^{42} or Poisson^{44,45}), percolation models^{46} and independent infection probabilities^{47}. The continuous offspring production model considered here mimics what occurs in most simulation algorithms, including ours, and may more closely represent biological reality. The offspring distribution is then obtained a posteriori: the total number of offspring produced before recovery is geometric and, given that total, the numbers of offspring of each type are multinomial.
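To make the continuous offspring production model concrete, the Python sketch below simulates it event by event and estimates the extinction probability of a lineage. It is an illustrative sketch, not the simulation code used here: treating a lineage that reaches a fixed number of simultaneous infections as having escaped extinction is an assumption, and `simulate_extinction` is a hypothetical name.

```python
import random


def simulate_extinction(r, gamma, start_type, n_runs=2000, cap=100, seed=0):
    """Monte Carlo estimate of the probability that a lineage of the multitype
    continuous offspring production process, started by one infected
    individual of type start_type, goes extinct.

    r[i][j] is the rate at which a type-i infected individual produces type-j
    infections and gamma is the recovery rate. Only the order of events affects
    extinction, so event times are not tracked. Reaching `cap` simultaneously
    infected individuals is treated as escape (an assumption of this sketch).
    """
    rng = random.Random(seed)
    n = len(r)
    out_rate = [sum(row) for row in r]  # total infection rate of each type
    extinct = 0
    for _ in range(n_runs):
        counts = [0] * n
        counts[start_type] = 1
        total = 1
        while 0 < total < cap:
            # The next event belongs to type i with probability proportional
            # to counts[i] * (out_rate[i] + gamma).
            rates = [counts[t] * (out_rate[t] + gamma) for t in range(n)]
            i = rng.choices(range(n), weights=rates)[0]
            if rng.random() < gamma / (out_rate[i] + gamma):
                counts[i] -= 1  # recovery
                total -= 1
            else:
                j = rng.choices(range(n), weights=r[i])[0]
                counts[j] += 1  # new infection of type j
                total += 1
        if total == 0:
            extinct += 1
    return extinct / n_runs


# One type with R_0 = r / gamma = 2.
print(simulate_extinction([[2.0]], gamma=1.0, start_type=0))
```

For a single type with R_{0}=2, the estimate should lie close to 1/R_{0}=0.5, the extinction probability of the corresponding well-mixed branching process.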
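An important quantity to calculate is the probability generating function (PGF), F_{i}(s), for the numbers of secondary infections of each type, with s=(s_{1}, s_{2}, …, s_{n}), caused by a single infected individual of type i. For this process, we derive
$$F_i(\mathbf{s})=\sum_{j=0}^{\infty}\frac{\gamma}{\gamma+\lambda_i}\left(\frac{\lambda_i}{\gamma+\lambda_i}\right)^{j}\sum_{j_1+\cdots+j_n=j}\frac{j!}{j_1!\cdots j_n!}\prod_{m=1}^{n}\left(\frac{r_{im}\,s_m}{\lambda_i}\right)^{j_m}=\frac{\gamma}{\gamma+\sum_{m=1}^{n}r_{im}(1-s_m)},$$
where j and (j_{1},j_{2},…,j_{n}) are indices, n is the total number of different types of individuals, and λ_{i}=∑_{m}r_{im} is the sum of all infection rates of a type-i individual.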
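Using the definition of the basic reproductive ratio, R_{0}, as the average number of secondary infections produced by a single infected individual, we can define multitype reproductive ratios^{44,45}, the expected number of type-j infections caused by a type-i individual over its mean infectious period, 1/γ:
$$R_{0,ij}=\frac{r_{ij}}{\gamma}.$$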
This allows us to write the PGF in terms of R_{0,ij} as
$$F_i(\mathbf{s})=\frac{1}{1+\sum_{j=1}^{n}R_{0,ij}\,(1-s_j)}.$$
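The equality between the geometric-times-multinomial series for F_{i}(s) and the closed form 1/(1+∑_{j}R_{0,ij}(1−s_{j})) can be checked numerically. The sketch below assumes a made-up two-type example, truncates the series at a finite number of offspring, and uses hypothetical function names.

```python
from math import factorial, prod


def compositions(j, n):
    """Yield every tuple (j_1, ..., j_n) of non-negative integers summing to j."""
    if n == 1:
        yield (j,)
        return
    for first in range(j + 1):
        for rest in compositions(j - first, n - 1):
            yield (first,) + rest


def pgf_series(r_i, gamma, s, j_max=80):
    """Truncated series for F_i(s): a geometric total number of offspring j,
    split among the n types by a multinomial distribution."""
    lam = sum(r_i)  # total infection rate of a type-i individual
    total = 0.0
    for j in range(j_max + 1):
        p_total = gamma / (gamma + lam) * (lam / (gamma + lam)) ** j
        split = 0.0
        for counts in compositions(j, len(r_i)):
            coeff = factorial(j) // prod(factorial(c) for c in counts)
            split += coeff * prod((r * x / lam) ** c for r, x, c in zip(r_i, s, counts))
        total += p_total * split
    return total


def pgf_closed(r_i, gamma, s):
    """Closed form 1 / (1 + sum_j R_0ij (1 - s_j)), with R_0ij = r_ij / gamma."""
    return 1.0 / (1.0 + sum(r / gamma * (1.0 - x) for r, x in zip(r_i, s)))


r_i, gamma, s = [1.5, 0.5], 2.0, [0.3, 0.7]  # made-up two-type example
print(pgf_series(r_i, gamma, s), pgf_closed(r_i, gamma, s))
```

The series converges geometrically with ratio ∑_{m}r_{im}s_{m}/(γ+λ_{i}), so a modest truncation already agrees with the closed form to machine precision.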
We now consider a network-structured population, in which individuals are classified according to their degree k. Individuals of type i are those who are connected to exactly k_{i} other individuals, and the frequency of type-i individuals is given by p(i). Following Yates et al.^{45}, we can break down R_{0,ij} into disease factors and structural factors,
$$R_{0,ij}=\frac{\beta}{\gamma}\,\pi_{ij}\,\nu_j,$$
where β is the per-contact transmissibility of the disease, π_{ij} is the average number of type-j contacts of a type-i individual (the mixing matrix), and ν_{j} is the susceptibility of a type-j individual (1=fully susceptible, 0=fully resistant to infection). If we assume that the network is constructed by the configuration model, that is, edges are joined at random and there is no correlation between the degrees of the individuals at either end of an edge, then
$$\pi_{ij}=\frac{k_i\,k_j\,p(j)}{\langle k\rangle},\qquad \langle k\rangle=\sum_j k_j\,p(j).$$
As we are interested in a fixed network structure, we immediately encounter a problem that does not occur when considering heterogeneous but randomly mixing populations. For all individuals other than the very first one infected, the actual number of susceptible contacts will be one less than that given by π_{ij}, because the contact from whom the infection originated cannot be reinfected. For these secondary infections, we must consider the modified mixing matrix, based on the concept of ‘excess degree’^{48},
$$\pi'_{ij}=\frac{(k_i-1)\,k_j\,p(j)}{\langle k\rangle},$$
and hence a modified reproductive ratio, R′_{0,ij}=(β/γ)π′_{ij}ν_{j}.
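The configuration-model mixing matrices π_{ij} and π′_{ij} can be computed directly from a degree distribution. A minimal Python sketch, with a made-up two-class network and hypothetical function names, is shown below.

```python
def mixing_matrices(degrees, probs):
    """Configuration-model mixing matrices for degree classes k_i with
    frequencies p(i): pi[i][j] = k_i k_j p(j) / <k> counts all contacts,
    pi_prime[i][j] = (k_i - 1) k_j p(j) / <k> excludes the infector's edge."""
    mean_k = sum(k * p for k, p in zip(degrees, probs))
    pi = [[ki * kj * pj / mean_k for kj, pj in zip(degrees, probs)] for ki in degrees]
    pi_prime = [[(ki - 1) * kj * pj / mean_k for kj, pj in zip(degrees, probs)]
                for ki in degrees]
    return pi, pi_prime


def reproductive_ratios(pi, beta, gamma, nu):
    """R_0ij = (beta / gamma) * pi_ij * nu_j."""
    return [[beta / gamma * pij * nuj for pij, nuj in zip(row, nu)] for row in pi]


# Made-up two-class network: 80% of people have 2 contacts, 20% have 10.
degrees, probs = [2, 10], [0.8, 0.2]
pi, pi_prime = mixing_matrices(degrees, probs)
R0 = reproductive_ratios(pi, beta=0.3, gamma=1.0, nu=[1.0, 1.0])
R0_prime = reproductive_ratios(pi_prime, beta=0.3, gamma=1.0, nu=[1.0, 1.0])
print([sum(row) for row in pi], [sum(row) for row in pi_prime])
```

Each row of π sums to k_{i} and each row of π′ to k_{i}−1, which is a quick consistency check on the excess-degree correction.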
We want to calculate the probability that a disease introduced into a population causes an epidemic, as opposed to going extinct. In a randomly mixing population, standard branching process theory gives the ultimate extinction probability for an infection originating in a type-i individual as the smallest solution in [0,1]^{n} of x_{i}=F_{i}(x). Taking into account the difference between those infected in the first and in later generations, we must first find the extinction probability x′ for lineages started by individuals infected in later generations, and then calculate the ultimate extinction probability starting from a single infection^{47,49},
$$x'_i=F'_i(\mathbf{x}'),\qquad x^{0}_i=F_i(\mathbf{x}'),$$
where F′_{i} is the PGF with R′_{0,ij} in place of R_{0,ij}.
The emergence probability is then given by
$$P_{\mathrm{emerge}}=1-\sum_i p(i)\,x^{0}_i.$$
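Numerically, the extinction probabilities can be found by fixed-point iteration. The Python sketch below assumes the geometric-offspring PGF F_{i}(s)=1/(1+∑_{j}R_{0,ij}(1−s_{j})); because each F_{i} is increasing in every argument, iterating from x′=0 converges to the smallest solution in [0,1]^{n}. The function name and the test values are illustrative only.

```python
def extinction_and_emergence(R0, R0_prime, p, tol=1e-13, max_iter=1_000_000):
    """Return (x_prime, x0, P_emerge) for the geometric-offspring PGF
    F_i(s) = 1 / (1 + sum_j R_ij (1 - s_j)).

    x_prime solves x'_i = F'_i(x') (later generations, matrix R0_prime),
    x0_i = F_i(x') uses R0 for the first case, and
    P_emerge = 1 - sum_i p(i) x0_i. Iterating from x' = 0 increases
    monotonically to the smallest fixed point in [0, 1]^n.
    """
    n = len(p)

    def F(R, x):
        return [1.0 / (1.0 + sum(R[i][j] * (1.0 - x[j]) for j in range(n)))
                for i in range(n)]

    x_prime = [0.0] * n
    for _ in range(max_iter):
        x_next = F(R0_prime, x_prime)
        change = max(abs(a - b) for a, b in zip(x_next, x_prime))
        x_prime = x_next
        if change < tol:
            break
    x0 = F(R0, x_prime)
    return x_prime, x0, 1.0 - sum(w * x for w, x in zip(p, x0))


# A 5-regular network with beta / gamma = 0.5: R_0 = 2.5 for the first case
# and R'_0 = 2.0 for later generations.
_, _, p_net = extinction_and_emergence([[2.5]], [[2.0]], [1.0])
# Well-mixed population: no excess-degree correction, so R'_0 = R_0 = 2.
_, _, p_mixed = extinction_and_emergence([[2.0]], [[2.0]], [1.0])
print(p_net, p_mixed)
```

For the 5-regular example the result is P_{emerge}=5/9, below the well-mixed value 1−1/2.5=0.6 for the same R_{0}, while the well-mixed case with R_{0}=2 recovers 1−1/R_{0}=0.5.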
For a homogeneous, well-mixed population, this calculation reduces to
$$P_{\mathrm{emerge}}=\max\left(0,\;1-\frac{1}{R_0}\right).$$
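For a homogeneous population with a fixed network structure of degree k, there is only a single type, k, such that π_{ij}≡π=k and π′=k−1; in a fully susceptible population (ν=1) this gives R_{0}=βk/γ and R′_{0}=β(k−1)/γ, and the calculation reduces to
$$P_{\mathrm{emerge}}=1-\frac{1}{1+R_0\left(1-1/R'_0\right)}=\frac{R_0-R_0/R'_0}{1+R_0-R_0/R'_0}\qquad(R'_0>1).$$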
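We can see from the first expression for P_{emerge} above that, for a homogeneous fixed network structure, P_{emerge}⩽1−1/R_{0}, because R_{0}/R′_{0}=k/(k−1)>1 and hence R_{0}(1−1/R′_{0})<R_{0}−1, and that P_{emerge}=0 when R′_{0}⩽1 (in which case x′=1 and therefore x^{0}=F(x′)=1).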
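Under the same assumptions (a fully susceptible population and a single degree class), solving x′=1/R′_{0} and x^{0}=F(x′) by hand gives a closed form that is easy to tabulate. A minimal Python sketch, with a hypothetical function name:

```python
def p_emerge_regular(beta, gamma, k):
    """Emergence probability on a k-regular random network in a fully
    susceptible population, from x' = 1/R'_0 and x0 = F(x')."""
    R0 = beta * k / gamma  # the first case has k susceptible contacts
    R0_prime = beta * (k - 1) / gamma  # later cases have k - 1
    if R0_prime <= 1.0:
        return 0.0  # later-generation lineages die out with certainty
    x_prime = 1.0 / R0_prime
    x0 = 1.0 / (1.0 + R0 * (1.0 - x_prime))
    return 1.0 - x0


beta, gamma = 0.5, 1.0
for k in (3, 5, 10, 50):
    # network value versus the well-mixed value 1 - 1/R_0 with R_0 = beta k / gamma
    print(k, p_emerge_regular(beta, gamma, k), 1.0 - gamma / (beta * k))
```

Because x^{0}=1/(R_{0}−1/(k−1)), the gap to the well-mixed value 1−1/R_{0} shrinks as k grows.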
There are certain limitations to this technique for estimating the emergence probability of diseases in networks. First, we assume an infinitely large random network. Second, host heterogeneity is modelled by dividing individuals into groups based on their degree. Hence, higher-order structure is ignored, and networks that contain high levels of assortativity or clustering may not be well represented by this method. The presence of clustering will decrease P_{emerge}, whereas assortativity could either increase or decrease it. This method also ignores the possibility that, in the SIS model (as opposed to the often-considered SIR model), a recovered individual could become reinfected during early emergence, which would increase P_{emerge}. Hence, the branching process might underestimate the true probability of emergence for the SIS model, even in a well-mixed population.
Pairwise equations for equilibrium disease behaviour in networks
Branching process calculations can tell us about the probability of disease emergence by capturing stochastic effects that are important when disease levels are low, but do not accurately capture the dynamics as prevalence levels become significant. For this task, deterministic models that track both infected and susceptible individuals are appropriate.
We use the method of pairwise equations^{23} to describe disease dynamics in a network-structured population. We start with the full SIS pairwise equations^{50},
where [A_{k}] is the number of individuals with degree k that are in state A, [A_{k}B_{m}] is the number of pairs of individuals in which one member has degree k and is in state A and the other member has degree m and is in state B, and [A_{k}B_{m}C_{l}] is defined analogously for triples, with B being the middle member. As the total size and structure of the population are constant, we can use [I_{k}]=[k]−[S_{k}], where [k]=p(k)N is the total number of individuals with degree k. These equations are exact, but they are not closed: a complete description of the system requires equations for ever higher-order groups of individuals, which makes them intractable. We make the following series of common approximations (detailed in House and Keeling^{21}) to close the equations:
(i) triple closure,
(ii) deconvolution of pairs,
(iii) detailed balance,
(iv) deconvolution of individuals,
where N is the total population size and E is the total number of edges. A disease state written without a subscript denotes the sum over all degrees. We then arrive at a simplified set of equations,
[SI] is the number of edges between susceptible and infected individuals. These equations can be used to describe the time course of the infection among individuals of each degree and the equilibrium state.
Deriving a closed and reduced set of pairwise equations requires making approximations about the higher-order correlations between connected individuals that are introduced by the epidemic. Triple closure and deconvolution of pairs and individuals are examples of such approximations. It is difficult to state a priori exactly when these assumptions hold, but previous studies have shown that they usually agree very well with simulations^{21}. In contrast, the detailed balance approximation depends only on the network structure and is guaranteed in a configuration model. In networks with other methods of edge creation, such as preferential attachment, this simplification may fail. As stated above, these approximations assume that the network clustering, φ, is zero, although corrections can be made to account for non-zero values^{21}.
Combining techniques to approximate invasion of a second disease
We derive a combined analytic technique to approximate the invasion of a second disease {β_{2},γ_{2}} in a population infected with a resident disease {β_{1},γ_{1}} at endemic equilibrium. We first solve for the steady state of the pairwise equations for {β_{1},γ_{1}} (equation (17)), which gives both the total fraction infected with the first disease, f_{I}=1−∑_{k}[S_{k}]/N, and the fraction of degree-k individuals remaining susceptible, ν_{k}=[S_{k}]/(Np(k)). We then use this ν, together with {β_{2},γ_{2}}, to determine the effective basic reproductive ratios for the second disease, R_{0,ij}=(β_{2}/γ_{2})π_{ij}ν_{j} and R′_{0,ij}, the same quantity with k_{i} replaced by k_{i}−1 in π_{ij}; these can then be used in the branching process calculation (equations (4)–(9)) to determine the emergence probability. Using this procedure, the resulting emergence probability, P_{emerge}, is equivalent to the fixation probability, P_{fix}, of the second disease. However, because our simulations only allow the second disease to arise in an individual who is already infected with the first disease, we modified equation (9) to
$$P_{\mathrm{fix}}=1-\sum_k p_I(k)\,x^{0}_k,$$
where p_{I}(k)=[I_{k}]/[I]=p(k)(1−ν_{k})/f_{I} is the probability that a randomly chosen infected individual has degree k.
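The reweighting by p_{I}(k) can be illustrated with a short Python sketch. The inputs below are hypothetical numbers standing in for ν_{k} from the pairwise equilibrium and x^{0}_{k} from the branching-process calculation, and the function name is illustrative.

```python
def fixation_probability(p, nu, x0):
    """P_fix = 1 - sum_k p_I(k) x0_k with p_I(k) = p(k) (1 - nu_k) / f_I.

    p[k]  -- degree distribution p(k)
    nu[k] -- fraction of degree-k individuals still susceptible to the resident
    x0[k] -- extinction probability of an invader lineage that starts at degree k
    """
    f_I = sum(pk * (1.0 - nk) for pk, nk in zip(p, nu))  # total fraction infected
    p_I = [pk * (1.0 - nk) / f_I for pk, nk in zip(p, nu)]
    return 1.0 - sum(w * x for w, x in zip(p_I, x0))


# Hypothetical inputs for two degree classes (low, high).
p, nu, x0 = [0.8, 0.2], [0.7, 0.2], [0.9, 0.5]
print(fixation_probability(p, nu, x0))  # invader starts in a resident-infected host
print(1.0 - sum(pk * x for pk, x in zip(p, x0)))  # weighting by p(k), for comparison
```

Here the high-degree class, which has the larger infected fraction, receives weight 0.4 under p_{I}(k) instead of 0.2 under p(k).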
Another method of combining deterministic and stochastic approaches was recently derived independently to study disease adaptation during emergence in well-mixed populations^{51}.
We found that this first-order approach consistently underestimated the fixation probability of the invading disease, which we hypothesized was due to assuming that an individual’s susceptibility, ν_{j}, was independent of the fact that the connected individual who might infect them was also susceptible. To derive a second-order approximation, we took the number of susceptible neighbours directly from the pairwise equations and used equations (14)–(16) to arrive at
The terms [SI] and [S_{j}] can be determined numerically from the equilibrium of equation (17), and by definition .
Recall that in the one-disease case, the mixing matrix π_{ij} was,
where X_{j} signifies an individual of type j in either state S or I. We can then write the new mixing matrix in terms of π_{ij},
where ν_{j}=[S_{j}]/[X_{j}] is the fraction of susceptible individuals of type j, is the number of edges from a susceptible individual to any individual and is the total number of edges. From the last line we can see that (πν)_{ij} modifies π_{ij}ν_{j} to account for clustering of susceptible individuals: the fraction of a given susceptible individual’s contacts that are still susceptible (top fraction) may be higher than the no-clustering expectation, and may therefore more than compensate for the fact that the individuals who remain susceptible have fewer contacts on average (bottom fraction).
This value of (πν)_{ij} cannot be used directly. For secondary infections, we must again consider a modified value, (πν)′_{ij}, in which k_{i} is replaced by k_{i}−1 in equation (19) to take into account the fact that the neighbour from whom the infection originated cannot be reinfected. The resulting expression is
We also need to take into account that the individual who is first infected with the second strain was already infected with the first disease, and so for primary infections the mixing term becomes
where is the number of edges from an infected individual to any individual.
Therefore, to produce the analytical approximations for the fixation probability of an invading strain displayed in Fig. 3d, we first derived (πν)′_{ij} and (πν)^{0}_{ij} from equations (19) and (23), using the solutions to the pairwise equations for the first disease (equation (17) with {β_{1},γ_{1}}). From these we obtain R′_{0,ij}=(β_{2}/γ_{2})(πν)′_{ij} and R_{0,ij}=(β_{2}/γ_{2})(πν)^{0}_{ij}, respectively. These reproductive ratios were then substituted into equations (4) and (8) to obtain x, which was then used in equation (18) to determine the fixation probability of the second disease.
Analytical solution for the uniform network
The quasi-steady-state distribution of individuals infected with the first disease at endemic equilibrium can be estimated using the pairwise equation (17). For a homogeneous population with a fixed random network structure of degree k, [S_{k}]≡[S], and these equations reduce to
The non-zero equilibrium states for this system, [S]^{∞} and [SI]^{∞}, are
To calculate the fixation probability of the second disease, we need to derive (πν)^{0} and (πν)′ for the uniform network. Substituting the equilibrium conditions into equations (23) and (22), we arrive at
Finally, the fixation probability can be found by solving equations (8) and (9) with R_{0}=(β_{2}/γ_{2})(πν)^{0} and R′_{0}=(β_{2}/γ_{2})(πν)′,
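The reduced pairwise equations for the uniform network are not reproduced in this text, so the Python sketch below assumes the standard homogeneous pairwise SIS model^{23} with the closure [ABC]≈((k−1)/k)[AB][BC]/[B], integrated to its endemic equilibrium with a classical fourth-order Runge–Kutta scheme. Under this assumed closure, setting both derivatives to zero gives [S]^{∞}/N=γ(k−1)/(βk(k−1)−γ) and [SI]^{∞}=γ(N−[S]^{∞})/β; these may differ in form from the expressions intended above. Function names and parameter values are hypothetical.

```python
def pairwise_sis_rhs(state, beta, gamma, k, N):
    """Homogeneous pairwise SIS model on a k-regular random network with the
    triple closure [ABC] = ((k - 1) / k) [AB][BC] / [B].

    state = ([S], [SI]); the other quantities follow from exact counting:
    [I] = N - [S], [SS] = k[S] - [SI], [II] = k[I] - [SI].
    """
    S, SI = state
    I = N - S
    SS = k * S - SI
    II = k * I - SI
    ratio = (k - 1) / k
    SSI = ratio * SS * SI / S
    ISI = ratio * SI * SI / S
    dS = gamma * I - beta * SI
    dSI = gamma * (II - SI) + beta * (SSI - ISI - SI)
    return dS, dSI


def integrate_pairwise(beta, gamma, k, N=1.0, eps=0.01, dt=0.01, t_end=300.0):
    """Classical fourth-order Runge-Kutta from a small, randomly placed infection."""
    state = (N * (1 - eps), k * N * (1 - eps) * eps)  # [SI] = k[S][I]/N at t = 0

    def shifted(base, slope, h):
        return tuple(x + h * d for x, d in zip(base, slope))

    for _ in range(int(round(t_end / dt))):
        d1 = pairwise_sis_rhs(state, beta, gamma, k, N)
        d2 = pairwise_sis_rhs(shifted(state, d1, dt / 2), beta, gamma, k, N)
        d3 = pairwise_sis_rhs(shifted(state, d2, dt / 2), beta, gamma, k, N)
        d4 = pairwise_sis_rhs(shifted(state, d3, dt), beta, gamma, k, N)
        state = tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                      for x, a, b, c, d in zip(state, d1, d2, d3, d4))
    return state


beta, gamma, k = 0.3, 1.0, 6
S_eq, SI_eq = integrate_pairwise(beta, gamma, k)
# Equilibrium of this assumed closure, valid when beta (k - 1) > gamma.
print(S_eq, gamma * (k - 1) / (beta * k * (k - 1) - gamma))
```

Under this closure the endemic state exists only when β(k−1)>γ, the same condition R′_{0}>1 that governs emergence on a uniform network.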
Deriving the selection exponent
For non-uniform networks, there is no general analytic expression that allows us to directly quantify the relationship between network heterogeneity and the fixation probability of an invading strain. While the method can be implemented numerically, we also chose to use an empirical function to model the trends observed in Fig. 2. We replace r=(β_{2}/γ_{2})/(β_{1}/γ_{1}) in equation (26) with r^{α}, where α, which we term the selection exponent, is predicted to be 1 for homogeneous networks. This gives,
This idea is inspired by work on the simpler Moran process, where an analytic approximation demonstrates that r becomes r^{α} in structured populations^{10}.
We fit the data to this function by nonlinear least squares to determine α, using the nls function in R (ref. 52). The function fit well to most networks, and confidence intervals on α were too narrow to be visible on the graphs. Supplementary Table 1 reports the α-values, confidence intervals and sums of squared errors for the fits.
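The fitting step can be illustrated without R. The Python sketch below replaces nls with a golden-section search over the single parameter α and, because the network fixation-probability expression is not reproduced here, uses the large-population Moran form 1−r^{−α} (for r>1) as a stand-in model^{10}. The data are synthetic and all names are hypothetical.

```python
import math
import random


def model(r, alpha):
    """Stand-in fixation curve for r > 1: the large-population Moran result
    1 - 1/r with r replaced by r**alpha."""
    return 1.0 - r ** (-alpha)


def sse(alpha, rs, ps):
    """Sum of squared errors between observed ps and the model at alpha."""
    return sum((p - model(r, alpha)) ** 2 for r, p in zip(rs, ps))


def fit_alpha(rs, ps, lo=0.0, hi=5.0, tol=1e-10):
    """Golden-section search for the alpha minimising sse on [lo, hi];
    a one-parameter substitute for nonlinear least squares with R's nls."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sse(c, rs, ps) < sse(d, rs, ps):
            b = d
        else:
            a = c
    alpha = (a + b) / 2.0
    return alpha, sse(alpha, rs, ps)


# Synthetic data generated with alpha = 1.6 plus small Gaussian noise.
rng = random.Random(0)
rs = [1.1 + 0.1 * i for i in range(20)]
ps = [model(r, 1.6) + rng.gauss(0.0, 0.005) for r in rs]
alpha_hat, err = fit_alpha(rs, ps)
print(alpha_hat, err)
```

A one-dimensional search suffices because α is the only free parameter; confidence intervals, which nls reports, would need an additional step such as profiling the sum of squared errors.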
Additional information
How to cite this article: Leventhal, G. E. et al. Evolution and emergence of infectious diseases in theoretical and real-world networks. Nat. Commun. 6:6101 doi: 10.1038/ncomms7101 (2015).
References
1. Anderson, R. & May, R. Infectious Diseases of Humans: Dynamics and Control Oxford University Press (1991).
2. Watts, D. J. & Strogatz, S. H. Collective dynamics of ‘small-world’ networks. Nature 393, 440–442 (1998).
3. May, R. M. & Lloyd, A. L. Infection dynamics on scale-free networks. Phys. Rev. E 64, 066112 (2001).
4. Pastor-Satorras, R. & Vespignani, A. Epidemic dynamics and endemic states in complex networks. Phys. Rev. E 63, 066117 (2001a).
5. Newman, M. E. J. Spread of epidemic disease on networks. Phys. Rev. E 66, 016128 (2002).
6. Lloyd-Smith, J. O., Schreiber, S. J., Kopp, P. E. & Getz, W. M. Superspreading and the effect of individual variation on disease emergence. Nature 438, 355–359 (2005).
7. Levin, B. R., Lipsitch, M. & Bonhoeffer, S. Population biology, evolution, and infectious disease: convergence and synthesis. Science 283, 806–809 (1999).
8. Morens, D. M., Folkers, G. K. & Fauci, A. S. The challenge of emerging and re-emerging infectious diseases. Nature 430, 242–249 (2004).
9. Infectious Diseases Society of America (IDSA). Combating antimicrobial resistance: policy recommendations to save lives. Clin. Infect. Dis. 52, S397–S428 (2011).
10. Lieberman, E., Hauert, C. & Nowak, M. Evolutionary dynamics on graphs. Nature 433, 312–316 (2005).
11. Newman, M. E. J. Threshold effects for two pathogens spreading on a network. Phys. Rev. Lett. 95, 108701 (2005).
12. Karrer, B. & Newman, M. E. J. Competing epidemics on complex networks. Phys. Rev. E 84, 036106 (2011).
13. Bansal, S. & Meyers, L. A. The impact of past epidemics on future disease dynamics. J. Theor. Biol. 309, 176–184 (2012).
14. Miller, J. C. Co-circulation of infectious diseases on networks. Phys. Rev. E 87, 060801 (2013).
15. Boots, M. & Sasaki, A. ‘Small worlds’ and the evolution of virulence: infection occurs locally and at a distance. Proc. Biol. Sci. 266, 1933–1938 (1999).
16. Boots, M., Hudson, P. J. & Sasaki, A. Large shifts in pathogen virulence relate to host population structure. Science 303, 842–844 (2004).
17. Caraco, T., Glavanakov, S., Li, S., Maniatty, W. & Szymanski, B. K. Spatially structured superinfection and the evolution of disease virulence. Theor. Popul. Biol. 69, 367–384 (2006).
18. Boots, M. & Mealor, M. Local interactions select for lower pathogen infectivity. Science 315, 1284–1286 (2007).
19. Diekmann, O. & Heesterbeek, J. A. P. Mathematical Epidemiology of Infectious Diseases: Model Building, Analysis and Interpretation John Wiley & Sons (2000).
20. Pastor-Satorras, R. & Vespignani, A. Epidemic spreading in scale-free networks. Phys. Rev. Lett. 86, 3200–3203 (2001b).
21. House, T. & Keeling, M. J. Insights from unifying modern approximations to infections on networks. J. R. Soc. Interface 8, 67–73 (2011).
22. Santos, F. C., Rodrigues, J. F. & Pacheco, J. M. Epidemic spreading and cooperation dynamics on homogeneous small-world networks. Phys. Rev. E 72, 056128 (2005).
23. Keeling, M. J. The effects of local spatial structure on epidemiological invasions. Proc. R. Soc. B 266, 859–867 (1999).
24. Bansal, S., Grenfell, B. T. & Meyers, L. A. When individual behaviour matters: homogeneous and network models in epidemiology. J. R. Soc. Interface 4, 879–891 (2007).
25. Humplik, J., Hill, A. L. & Nowak, M. A. Evolutionary dynamics of infectious diseases in finite populations. J. Theor. Biol. 360C, 149–162 (2014).
26. Haraguchi, Y. & Sasaki, A. The evolution of parasite virulence and transmission rate in a spatially structured population. J. Theor. Biol. 203, 85–96 (2000).
27. Lion, S. & Boots, M. Are parasites ‘prudent’ in space? Ecol. Lett. 13, 1245–1255 (2010).
28. Cross, P. C., Lloyd-Smith, J. O., Johnson, P. L. F. & Getz, W. M. Duelling timescales of host movement and disease recovery determine invasion of disease in structured populations. Ecol. Lett. 8, 587–595 (2005).
29. Buckee, C. O., Koelle, K., Mustard, M. J. & Gupta, S. The effects of host contact network structure on pathogen diversity and strain structure. Proc. Natl Acad. Sci. USA 101, 10839–10844 (2004).
30. Buckee, C., Danon, L. & Gupta, S. Host community structure and the maintenance of pathogen diversity. Proc. R. Soc. B 274, 1715–1721 (2007).
31. Adlam, B. & Nowak, M. A. Universality of fixation probabilities in randomly structured populations. Sci. Rep. 4, 6692 (2014).
32. Albert, R. & Barabási, A.-L. Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47–97 (2002).
33. Erdös, P. & Rényi, A. On random graphs. Publ. Math. Debrecen 6, 290–297 (1959).
34. Gilbert, E. N. Random graphs. Ann. Math. Stat. 30, 1141–1144 (1959).
35. Barabási, A.-L. & Albert, R. Emergence of scaling in random networks. Science 286, 509–512 (1999).
36. Christakis, N. A. & Fowler, J. H. The spread of obesity in a large social network over 32 years. N. Engl. J. Med. 357, 370 (2007).
37. Hill, A. L., Rand, D. G., Nowak, M. A. & Christakis, N. A. Infectious disease modeling of social contagion in networks. PLoS Comput. Biol. 6, e1000968 (2010).
38. Salathé, M. et al. A high-resolution human contact network for infectious disease transmission. Proc. Natl Acad. Sci. USA 107, 22020–22025 (2010).
39. Vanhems, P. et al. Estimating potential infection transmission routes in hospital wards using wearable proximity sensors. PLoS ONE 8, e73970 (2013).
40. Johnson, A. M. et al. Sexual behaviour in Britain: partnerships, practices, and HIV risk behaviours. Lancet 358, 1835–1842 (2001).
41. Robinson, K., Cohen, T. & Colijn, C. The dynamics of sexual contact networks: effects on disease spread and control. Theor. Popul. Biol. 81, 89–96 (2012).
42. Pearson, J. E., Krapivsky, P. & Perelson, A. S. Stochastic theory of early viral infection: continuous versus burst production of virions. PLoS Comput. Biol. 7, e1001058 (2011).
43. Iwasa, Y., Michor, F. & Nowak, M. A. Evolutionary dynamics of invasion and escape. J. Theor. Biol. 226, 205–214 (2004).
44. Antia, R., Regoes, R. R., Koella, J. C. & Bergstrom, C. T. The role of evolution in the emergence of infectious diseases. Nature 426, 658–661 (2003).
45. Yates, A., Antia, R. & Regoes, R. R. How do pathogen evolution and host heterogeneity interact in disease emergence? Proc. R. Soc. B 273, 3075–3083 (2006).
46. Moore, C. & Newman, M. E. J. Epidemics and percolation in small-world networks. Phys. Rev. E 61, 5678–5682 (2000).
47. Alexander, H. K. & Day, T. Risk factors for the evolutionary emergence of pathogens. J. R. Soc. Interface 7, 1455–1474 (2010).
48. Newman, M. E. J. Mixing patterns in networks. Phys. Rev. E 67, 026126 (2003).
49. Brauer, F. An introduction to networks in epidemic modeling. In Mathematical Epidemiology (eds Brauer, F., van den Driessche, P. & Wu, J.) 129–142 (Springer, 2008).
50. Eames, K. T. D. & Keeling, M. J. Modeling dynamic and network heterogeneities in the spread of sexually transmitted diseases. Proc. Natl Acad. Sci. USA 99, 13330–13335 (2002).
51. Hartfield, M. & Alizon, S. Epidemiological feedbacks affect evolutionary emergence of pathogens. Am. Nat. 183, E105–E117 (2014).
52. R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, Austria, 2014). URL http://www.R-project.org/.
Acknowledgements
We thank Jan Humplik, Marcel Salathé, Nicholas Christakis and Jonas Liechti for helpful discussions. S.B. thanks the Swiss National Science Foundation (133129) and the European Research Council (PBDR268540). M. N. received support from the John Templeton Foundation. M. N. and A. H. received support from the Foundational Questions in Evolutionary Biology Fund.
Author information
Contributions
G.E.L., A.L.H. and S.B. conceived the project; G.E.L. and A.L.H. performed the simulations and calculations; all authors interpreted the results and produced the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Information
Supplementary Figure 1 and Supplementary Table 1.
Rights and permissions
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
About this article
Cite this article
Leventhal, G., Hill, A., Nowak, M. et al. Evolution and emergence of infectious diseases in theoretical and real-world networks. Nat Commun 6, 6101 (2015). https://doi.org/10.1038/ncomms7101