New approaches to epidemic modeling on networks

In this article, we develop two independent and new approaches to model epidemic spread in a network. Contrary to the most studied models, those developed here allow for contacts with different probabilities of transmitting the disease (transmissibilities). We then examine each of these models using some mean field type approximations. The first model looks at the late-stage effects of an epidemic outbreak and allows for the computation of the probability that a given vertex was infected. This computation is based on a mean field approximation and only depends on the number of contacts and their transmissibilities. This approach shares many similarities with percolation models in networks. The second model we develop is a dynamic model which we analyze using a mean field approximation which highly reduces the dimensionality of the system. In particular, the original system which individually analyses each vertex of the network is reduced to one with as many equations as different transmissibilities. Perhaps the greatest contribution of this article is the observation that, in both these models, the existence and size of an epidemic outbreak are linked to the properties of a matrix which we call the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathcal {R}}$$\end{document}R-matrix. This is a generalization of the basic reproduction number which more precisely characterizes the main routes of infection.


Context.
A very natural way to model the spread of a human-to-human transmissible infectious disease is to encode each individual as the vertex of a graph whose edges model the interactions through which the disease can propagate. See [1][2][3][4][5][6][7][8][9] and references therein for the vast literature of epidemic modeling, including on networks. We also refer the reader to some very interesting related work in [10][11][12][13][14] .
However, despite the large body of work, there are substantial difficulties in implementing such methods, the most obvious of which being the difficulty in inferring a realistic network and in analyzing the very high dimensional resulting system of ordinary differential equations. Furthermore, due to such difficulties most models make the additional simplifying assumption that all interactions have the same probability of transmitting the disease.
In fact, extending the theory in order to incorporate interactions with different probability of transmitting the disease, dealing with heterogeneity, developing approximation schemes, and understanding network based interventions are all listed as some of the main challenges facing network epidemic modeling as stated in 7 .
These challenges are in fact all linked as, for instance, understanding which interactions are the most responsible for the epidemic spread would allow for better insight on which kind of interventions are the most effective in controlling an outbreak. This is therefore a fundamental research direction which ought to be pursued with more intensity in the future.
In this work we shall use a method that at the same time deals with the two difficulties mentioned above while at the same time incorporating the possibility of different types of interactions. Our results are insightful and our techniques tractable enough so that they can be effectively used in the future in a large amount of situations.

Summary of results.
We shall now summarize our approach and main results. Consider a large number of individuals N ∈ N interacting with each other through n ∈ N different types of interactions which have probabilities T 1 , . . . , T n ∈ [0, 1] of transmitting the disease (typically 1 ≤ n ≪ N ). To encode the network we use n different graphs G 1 , . . . , G n whose edges represent the different interactions and G = G 1 ∪ · · · ∪ G n is the total graph. For example, an edge of the graph G i encodes an interaction which has a probability T i of transmitting the disease.
In reality, we shall require relatively little information on the specific properties of the network encoding the interactions. Namely, we will only need to know the degree distributions for the several types of interactions. Said A comment on computational methods. Before embracing in proving the main results we want to make one further comment. There are computational tools which can be used to implement epidemics spreading on networks such as the EoN and SEIRSplus Python packages. While these do exemplify the well known epidemic behaviors that we describe, our main contribution is to rigorously mathematically demonstrate the mentioned results using the very general models we consider (recall that we allow for n ≥ 1 different transmissibilities). This goes beyond the current state of the art as these results had only been rigorously established in the case n = 1.

Generating functions
This section reviews some basics of generating functions, including multivariate generating functions. For some fascinating early and varied applications of the method of generating functions we recommend 16 . Graphs and their generating functions. Let G be a graph whose vertices v(G) encode individuals and whose edges e(G) encode interactions through which the disease can spread. We consider a family of n ∈ N subgraphs G 1 , . . . , G n ⊂ G with v(G i ) = v(G) , e(G i ) ⊂ e(G) for i = 1, . . . , n and e(G 1 ) ∪ · · · ∪ e(G n ) = e(G) . For each i = 1, . . . , n , let {P i (k)} k∈N 0 be the degree distribution of the graph G i meaning that a randomly chosen vertex has degree k with probability P i (k) . The corresponding generating functions are given by www.nature.com/scientificreports/ and we define the joint multivariate generating function by Remark 1 Notice that G ′ i (x) = +∞ k=1 kP i (k)x k−1 and so the average degree of G i , denoted E(k i ) , may be computed to be Similarly, using the fact that G i (1) = 1 , we find the total average degree to be Remark 2 The idea of having the n subgraphs G 1 , . . . , G n is that of modeling different types of contacts between individuals which have unequal probability of transmitting the disease. Thus, we associate a probability of transmission T i to each i = 1, . . . , n and assume with no loss of generality that T 1 > · · · > T n .
Excess degree distributions. Consider a randomly chosen individual which may have been infected following a randomly chosen transmission. Considering it as a vertex in G i , we define its excess degree (or ramification) as the number of extra edges emanating from it. The joint probability that such a vertex has ramification (k 1 , . . . , k n ) along the graphs G 1 , . . . , G n can be computed directly from the degree distributions P i (·) of the graphs G i as in 17 . Indeed, given that fixing the subgraph G i there are k i + 1 ways of arriving at a vertex with degree k i + 1 (ramification k i ) we find that This will be referred to as the excess degree, or ramification, joint distribution.

Remark 3
Here we are working with the excess degree distribution for a randomly chosen individual rather than than the excess degree distribution for a randomly chosen infected individual.
The associated multivariate generating function which as shown in 17 can be written in terms of the G i (x i ) for i = 1, . . . , n as follows and recall that E(k) = n l=1 G ′ l (1).

Remark 4
Let m 1 , . . . , m n ∈ N 0 with m 1 + · · · + m n > 0 and consider a randomly chosen individual v and assume that the remaining population is all susceptible. Using the distribution for the excess ramification, the probability that, if infected, v will infect m i individuals along G i for each i Then, the generating function for the random variable M given by the number of infections caused by a randomly chosen individual, if infected, is (1) G(x 1 , . . . , x n ) := k 1 ,...,k nP (k 1 , . . . , k n )x k 1 1 . . . x k n n ,  www.nature.com/scientificreports/ Remark 5 Recall that, for a disease starting to propagate in an otherwise completely susceptible population, the basic reproduction numberR 0 is given by R 0 = G ′ M (1) , which from the above formula can be computed to be Specific excess degree distributions. We now consider the excess degree distribution by following an edge of a specific graph G i , we define its available ramification as the number of excess edges emanating from it. Then, as before, the joint probability that such a vertex has excess degree (k 1 , . . . , k n ) is Therefore, its multivariate generating function G i (x 1 , . . . , x n ) = k 1 ,...,k nP i (k 1 , . . . , k n )x k 1 1 . . . x k n n can be computed to be . . , k n ) . Using either this fact or the previous formulas for the generating functions we find that G can be computed from the G i as follows Definition 1 Let i = 1, . . . , n and G i (x 1 , . . . , x x ) be as in (5). Define the RG-matrix as the n × n matrix whose (i, j) entries are Furthermore, given s (k 1 ,...k n ) ∈ [0, 1] for all (k 1 , . . . k n ) ∈ N n 0 and i = 1, . . . , n we shall define the generating function It will prove useful to also define a R-matrix associated with these generating functions as follows. . , x x ) be as in (6). Then, we define the n × n matrix whose (i, j) entries are which we shall call the RH-matrix.

Remark 7
In the case when s (k 1 ,...,k n ) = 1 we have H i = x iGi and so the RH-matrix turns into Remark 8 We can define a quantity R 0 (i) which yields the average number of infections caused by the individuals which were themselves infected from an interaction of G i . In formulas, such a quantity is given by or, in terms of the first reproduction matrix, as the sum of all the entries in the i-th line, i.e. R 0 (i) = n j=1 RG ij . It is easy to check that .

Prevalence of infection and percolation (the late stages of an outbreak)
Recall that for i = 1, . . . , n the transmissibilities T i denote the probability that an interaction, encoded by an edge of G i , will transmit the disease if one of its ends is infected. We point out that this approach is in different from that of multiplex networks as in that case it is the different nodes that have different transmissibilities, see for example [18][19][20] . In our case each node can have several T i and so our approach can be interpreted as a generalization of the multiplex networks case. We also point out that we shall work in full generality and our results are widely applicable. At this point we follow a well known trick first introduced in 1 and 21 in the case when there is only one transmissibility. In order to implement this trick in the case when there is more than one transmissibility we introduce n quantities, called q 1 , . . . , q n : Each q i corresponds to the average probability that a vertex is not infected through a specific interaction (edge) of G i . For this to happen, either: • the infection is not transmitted (independently of whether the individual in the other end of this interaction is infected or not) which has probability 1 − T i , or • the infection would be transmitted by the interaction, with probability T i , but the other individual was not infected, which happens with probability n j=1 q k j j if it has excess degree (k 1 , . . . , k n ).
Hence, on average we have given by the right hand side of Eq. (8).
Then, the average probability that a randomly chosen vertex does not get infected is given by

Remark 9
Notice that the probability a vertex with degree (k 1 , . . . , k n ) does not get infected is given by and so P is simply the average of these probabilities. So, we see that this mean field type approximation also implies the weaker approximation where the probability that a vertex gets or not infected only depends on its degree.
We turn now to the question of finding conditions which guarantee that (8) has a solution other than the obvious one at (1, . . . , 1) , which corresponds to the absence of disease. Before we embrace in the general analysis we consider the simple special case when n = 1 which already appears in the literature, for example chapter 16 in 2 .

Example 1
In the case when n = 1 the fixed point equation (8) reads Denoting the right hand side by F(q), we have F(1) = 1 and Such a condition is given by TG ′ (q) > 1 which can equally be written as R 0 > 1.
In this simple setting when n = 1 , we can further try to better understand the transition phenomena at T = T c such that T cG ′ (1) = 1 . For this we expand G around q c = 1 as a Taylor series and inserting into Eq. (10) we have where we have used G ′ (1) = T −1 c . This can be rewritten as www.nature.com/scientificreports/ from which we find Then, expanding P in a Taylor series around q c = 1 we find valid for T ≥ T c and which describes the phase transition as a power law with exponent 1.
Continuing to explore the case when n = 1 we shall now give two very simple examples which can be solved explicitly.
Example 2 (2 neighbors and n = 1 ) In this situation each individual contacts with only two other ones, we have G(x) = x 2 and G (x) = x . Then, the fixed point equation is q = 1 − T + Tq which has the unique solution q = 1 independently of T ∈ (0, 1) . The only other solution is q = 0 which occurs in the case when T = 1.
This is to be expected as if there is a probability that the interactions will not transmit the disease, then almost surely there will be someone which does not transmit it and so it does not get passed that individual. In a large population, almost everyone will be left uninfected. Hence, we see an interesting explicit phase transition occurring at T = T c = 1/2 . In terms of the probability P that a randomly chosen individual escapes infection we find that Notice that this is compatible with Eq. (11). Indeed, expanding P near T c = 1/2 we find that When n ≥ 1 we can equally prove existence of a critical point in (0, 1) n if certain n quantities are greater than one (in the n = 1 case there is a single quantity which can be readily identified with R 0 > 1 ). However, in this more general case the proof is slightly less elementary as this is a codimension n > 1 problem for which the intermediate value theorem no longer applies.
On possible approach would be to denote by F : [0, 1] n → [0, 1] n the function defined by the right hand side of (8) and find hypothesis so that there is Then, the Brower fixed point theorem would guarantee the existence of such a fixed point in (0, 1) n . However, we shall instead proceed in a slightly different manner.
Proof We shall look for solutions of (8) in (0, 1) n and it will prove convenient to write these as q i = 1 − T i ε i . Then, the fixed point equation (8) turns into the equations for ε = (ε 1 , . . . , ε n ) . Then, we look for fixed points of the function F given by the right hand side of equation above with ε = 0 . By Brower's fixed point theorem, for such a fixed point to exist, it is enough if (11) Inspired by this proof and the computation in example 1, also for n ≥ 1 we shall search for a phase transition (or bifurcation) from q c = (1, . . . , 1) due to a variation in the parameters T = (T 1 , . . . , T n ) . In order to set up the nomenclature, we shall consider a 1-parameter family t ∈ I ⊂ R � → T(t) of transmissibilities. For all t ∈ I we have that q c = (1, . . . , 1) is a solution to (8). We shall say that a bifurcation from q c occurs at T(0) if any neighborhood of (q c , T(0)) in R n × I contains solutions of (8) not equal to q c . This will be called a phase transition if such solutions lie in a continuous curve parameterized by t. The following result gives a necessary condition for the existence of a phase transition.
Proof Bifurcations of q from q c at T(0) are in one to one correspondence with bifurcations of ε from ε c = 0 at T(0). Defining the function G = (G 1 , . . . , G n ) whose entries are for i = 1, . . . , n . By the implicit function theorem, if for a given T we had dG 0 : R n → R n being an isomorphism, then no bifurcation could occur. This already gives us a necessary condition for a bifurcation to occur.
The (i, j) entry of dG 0 is given by and so dG 0 is non-invertible if and only if the RG-matrix has 1 as an eigenvalue. We have thus concluded that for a bifurcation to occur at a given T, the RG-matrix must have a unit eigenvalue.

Remark 10
We also mention in passing that in biology, such phase transitions and bifurcation phenomena are sometimes referred to as branching processes. We direct the reader to the Refs. 22,23 for more on such branching processes in biology.
Suppose now that the RG-matrix associated with T(t), which we shall denote by RG(t) has exactly one eigenvalue (t) such that (0) = 1 (this means that the algebraic multiplicity of (t) is one). If we further assume that ˙ (0) � = 0 , i.e (t) crosses 1 transversely at t = 0 . Then, the computation (12) shows that at t = 0 the map dG 0 has a 1-dimensional kernel and cokernel and that (∂ t | t=0 dG 0 )(ker(dG 0 )) / ∈ im(dG 0 ) . Then, the Crandall-Rabinowitz theorem (Theorem 1.7. in 24 ) shows that a phase transition must occur at t = 0 . We shall state this separately as follows.
and similarly for G 2 . Hence, the R 1 -matrix can be written as www.nature.com/scientificreports/ Example 5 (n = 2 with 2 neighbors each) Consider T 1 < T 2 and G 1 (x) = x 2 , G 2 (x) = x 2 . Then, we have G i (x 1 , x 2 ) = x i x 2 j for i = 1, 2 and j = i . Then, the RG-matrix is whose eigenvalues are Clearly, − < 0 and so a phase transition must occurs when + crosses 1, i.e. when Furthermore, in this case the equations for (q 1 , q 2 ) read In particular, using the first equation to write q 1 in terms of q 2 and inserting in the second we find that solutions are given by q 2 = 1 and solutions of the quartic equation Example 6 (n = 2 with 2 exponentially distributed graphs) Again, we consider In this situation we have Then, the R 1 -matrix is whose eigenvalues are 0 and which in this case coincides with R 0 and we therefore find that a phase transition occurs when R 0 crosses 1. In fact, it is tempting to regard T 1 N 1 and T 2 N 2 as the respective contributions to R 0 by the networks G 1 and G 2 . Indeed, R 0 (1) = 2T 1 N 1 and R 0 (2) = 2T 2 N 2 so that as shown in Remark 8. These interpretations of R 0 (1) and R 0 (2) may, however, not be appropriate to interpret some non-intuitive phenomena. For example, one may be lead to think that both q 1 and q 2 are non-increasing with respect to R 0 (1) and R 0 (2) . However, this need not be true as we shall illustrate in an example. In this case the equations for (q 1 , q 2 ) read and we shall now iterate the right hand side to approximate two solutions. Say, in the case when N 1 = 50, T 1 = 0.1 and N 2 = 2, T 2 = 0.6 , which corresponds to R 0 (1) = 5 and R 0 (2) = 1.2 we have (q 1 , q 2 ) ∼ (0.9, 0.4) . On the other hand, if instead N 1 = 5, T 1 = 0.2 , which corresponds to R 0 (1) = 1 , while the rest remains as before, we have (q 1 , q 2 ) ∼ (0.83, 0.49) and so q 1 did decrease but q 2 increased.
However, this situation is not totally counter-intuitive as overall, the value of P , the probability that a random vertex escapes infection, does increase in the second example where P ∼ 0.17 in comparison with a P ∼ 0.13 in the first example. www.nature.com/scientificreports/

Dynamic modeling in a local mean field approximation
In section "Prevalence of infection and percolation (the late stages of an outbreak)" we studied a mathematical framework, related to percolation models, and used the properties of the network in order to compute the probability that a given node will eventually be infected. However, this framework does not look at the specific way the infection propagates in the network through time. That will be the topic of the current section. Here, we investigate an extended SIR type system modeling the spread of an epidemic on a network with different types of contacts, meaning that the transmissibilities are not all the same and can be gauged to approximate different kinds of contacts. As before, we consider a set of n graphs G 1 , . . . , G n with the same vertices but different edges. These encode the interactions and each graph G i is weighted by a transmissibility per unit time β i encoding the probability of transmitting the disease through that interaction (per time unit). Our model is a simple alternative to 25 which more directly deals with contact duration.
The model. For each vertex v of G we shall denote by G i (v) all its neighbors through the graph G i , in other is the set of all vertices which are connected to v through a an edge of G i . Then, we respectively denote by s v , x v and r v the probabilities that v is either susceptible to the disease, infected, or removed. The dynamics of this network SIR model is then approximately governed by where γ > 0 is the rate of recovery.

Remark 11
Alternatively, we can let A i vw be the entries of the adjunction matrix of the graph G i and write the sum w∈G(v i ) s v x w as w A i vw s v x w .

Remark 12
The system (13) is an approximation because the average probability that v is susceptible and w infected is only approximately given by s v x w . In order to work with a non-approximate model one would have to write an infinite array of equations modeling the dynamics of the average probabilities of all such nonlinear quantities. See also 3 for such an analysis carried out in the situation where there is only one type of interaction.
Classifying the disease free equilibrium. It is clear from the equation ṙ v = γ x v that any equilibrium solution of the system (13) we must have x v = 0 for all v. Hence, any equilibrium solution is disease free. In fact, setting x v = 0 and s v + r v = 1 gives a 1-parameter family of equilibrium solutions of the system and any equilibrium solution must be one of these. An important question is then to understand if a non-constant solution converges to one of these equilibria and to which? The fact that any solution {(s v (t), x v (t), r v (t))| t ∈ [0, +∞)} v converges to an equilibrium is immediate from the fact that r v (t) is nondecreasing and bounded, hence the limit exists. Furthermore lim t→+∞ṙv = 0 which implies x v (∞) = 0 as we wanted to show. The main question then becomes: Question 1 What is the disease free equilibrium (s v (∞), 0, r v (∞)) to which a solution starting at (s v (0), x v (0), r v (0)) converges? Can we understand how this depends on the properties of G 1 , . . . , G n and β 1 , . . . , β n ?
From the system of equations (13) we find that for each vertex v of G , there are two conserved quantities and Let N be the total number of vertices. These conserved quantities reduces the system of 3N equations for 3N functions to a system involving only N functions. In fact introducing s v = 1 − r v − x v and the last equation of system (13) into the previous conserved quantity we find that Suppose v starts susceptible, then ṙ v (0) = 0 = r v (0) and so H(v) = 1 , from which we have (13) s v x ẇ www.nature.com/scientificreports/ for all time. Given that the solution converges to an equilibrium, we must have lim t→+∞ṙv = 0 and so where in this equation we have written r v (∞) as r v to simplify notation. It will prove convenient to have the following notion at hand.

Definition 3
Let v be a vertex of G and k i (v) its degree as a vertex of G i we shall denote by the expected number of infections v will cause if infected.

Remark 13
Notice that the quantity β i /γ represents the probability that an interaction of G i , between an infected and a susceptible individual, results in an infection. Hence, the quantity n i=1 can be regarded as the average number of infections the individual represented by v is expected to cause if it is infected.

Remark 14
For example, let us assume we have a sufficiently simple situation so that r w ≈ r v for all w ∈ G i (v) . Then, inserting this into (14) we find that In this situation, the right hand side equals 1 when r v = 0 and vanishes when r v = 1 . Hence, by the mean value theorem, a solution with r v ∈ (0, 1) exists if and only if the derivative of the right hand at r v = 0 , is positive. Such a derivative can be computed to be R(v) − 1 which is positive if and only if R(v) > 1.
In order to investigate the existence of solutions to Eq. (14) with r v = 0 it is convenient to rewrite this equation as Hence, our problem is now recast as the problem of looking for fixed points of the function F : [0, 1] N → [0, 1] N given by F(r 1 , . . . , r N ) = (F 1 (r 1 , . . . , r N ), . . . , F N (r 1 , . . . , r N )) where The first obvious fixed point is that occurring at the origin which corresponds to all individuals still being susceptible, i.e. the disease never spread. In the next result we give a criteria for the existence of another fixed point.

Theorem 1 Suppose that for all vertices v of
Then, the solution to the system (13) starting with r v (0) = 0 converges to a disease free equilibrium with r v = 0 for all v. This satisfies Eq. (15) and the bound

In particular, we find that the upper bound is increasing with R(v) (in agreement with basic intuition).
Proof The proof that the solution converges to a disease free equilibrium is given in the beginning of this subsection. This must satisfy (15) and we shall now prove that, under the conditions stated, r v ∈ (0, 1) , i.e. r v = 0 . We proceed as in the proof of Proposition 1 by applying Bower's fixed point theorem. Let ε > 0 to be fixed later, then the Taylor expansion of F v around the origin reads www.nature.com/scientificreports/ Furthermore, F v is non-decreasing with respect to each entry. Hence, if R(v) > 1 for all v we find that for sufficiently small ε > 0 the function F| [ε,1] N maps [ε, 1] N to itself and by the Brower fixed point theorem must have a fixed point in [ε, 1] N . We turn now to the proof of the upper bound for r v given in the statement. Recall that k i (v) denotes the degree of v in G i . From Eq. (14) we immediately find that the argument of the exponential in the right hand side satisfies Hence, we discover that r v = r v (∞) satisfies Inserting the Definition 3, of R(v), in this bound gives r v ≤ 1 − exp (−R(v)) , as claimed in the statement.

Remark 15
Inserting the equation ṙ w = γ x w for w ∈ G i (v) , in the first equation of (13) we find that this can be integrated using the initial condition r w (0) = 0 to obtain This is a very nice and beautiful formula, but it is not of much use if we know nothing about r w (t) . However, we do can take the limit as t → +∞ and assume we are converging to a disease free equilibrium (s v , x v , r v ) = (s v , 0, r v ) whose existence is assured, for example, under the assumptions of Proposition 1. In such a situation we have lim t→+∞ s v (t) = 1 − lim t→+∞ r v (t) . Hence, in this limit the previous equation turns into By setting s v (0) = 1 this is the same equation which we have previously derived. This situation is slightly more general than the one we have analyzed in Theorem 1 and a similar results hold yielding the bound

Dynamic modeling in a mean field approximation by degree similarity
The system of the previous section, though very general is extremely large and difficult to investigate. For this reason, it is convenient to find simpler systems which we can more easily analyze. This is the content of this section where we will consider a system for the average probability that vertices of a given degree are in specific states. This is an oversimplified assumption which nevertheless allows one to gain a lot of insight on the dynamics of an epidemic in a network. The approach we take here is mostly inspired from that of 2 , where the authors first learned it for the case n = 1 . There are also interesting individual based approaches which however require knowing the whole network structure and they also only have n = 1 , see for example 26 . See also 27 for some computational results using a model on a weighted network.
The model. For each k = (k 1 , . . . , k n ) we shall denote by s k , x k and r k the average probabilities that a vertex with degree k is susceptible, infected and removed respectively. Such a network SIR model is governed by the following system www.nature.com/scientificreports/ is the averaged probability of an individual being infected after following a random edge of the graph G i .

Remark 16
Again, in writing the system (17) we have used one large simplification. Namely, we have approximated the average value of x k ′ s k by the product of the average values of x k ′ and s k .
An equivalent (much reduced) system of equations. Notice that (17) can potentially be an extremely large system, as there are as many groups of 3-equations as degree combinations. It is remarkable that this system can be extremely reduced to one that only involves n equations for n functions. For this, it is convenient to introduce quantities w i measuring the average number of removed individuals following an edge of G i . These are given by for i = 1, . . . , n . Then, we have ẇ i = γ v i which inserting into the equation for ṡ k yields ṡ k = − 1 γ n i=1 β i k iẇi s k . Assuming that the epidemic outbreak starts with no-one removed from previous infections, we have w i (0) = 0 (which follows from r k (0) = 0 for all k ∈ N n ) and so which we can also rewrite as for u i (t) = exp − β i γ w i . Now, recall that s k + x k + r k = 1 and so where in the last equality we have used the generating function (6) whose definition we recall to be which depends on the initial conditions s (k 1 ,...,k n ) (0).

Remark 17
Consider the case where one is willing to make the simplifying assumption that s k (0) = s(0) for all k ∈ N n , i.e. the initial proportion of susceptible individuals is independent of their degree distribution (this may be a reasonable assumption for a new disease which begins to spread in an unknown part of the population). Then, H i (x 1 , . . . , x n ) = s(0)G i (x 1 , . . . , x n ) for all i = 1, . . . , n . Notice a few properties of the function H , namely it is nondecreasing with respect to each coordinate, and i.e. the average number of initially susceptible individuals in G i , meaning those which have edges in G i .
Using v i = 1 γẇ i and w i = − γ β i log u i and rearranging, we find that (17) can be written as a system for w = (w 1 , . . . , w n ) given by and further using and w i = − γ β i log u i this can be rewritten as a system for u = (u 1 , . . . , u n ) which is a substantially smaller than the initial one in (17).
Existence of an equilibrium. At an equilibrium point we either have the rights hand side of (19) vanishing, i.e.   www.nature.com/scientificreports/ or, written as a fixed point equation, as for i = 1, . . . , n . In particular, notice from Eq. (18) that this implies that v i = 0 for all i = 1, . . . , n and so this equation encodes a disease free equilibrium to which the solution of the system is expected to converge. As in section "Prevalence of infection and percolation (the late stages of an outbreak)", we shall start by analyzing the case when n = 1 in the following example. It will serve as a good exercise for the n > 1 case. See also 2 for this analysis in the case n = 1.
Example 7 (n = 1 ) In this situation, there is only one H i (x) which is given by and the Eq. (21) for an equilibrium is Hence, an equilibrium is determined by fixed points of the function g(w) given by the right hand side of (22). Notice that g(0) = 1 − E[s(0)] > 0 (even though it may be very small) while g(1) = 1 −H(e − β γ ) < 1 , and so a fixed point w eq always exist by the intermediate value theorem. Moreover, we find from that g(w) is increasing while from we find that its concavity always faces down. Hence, the equilibrium point w eq must be unique. We further deduce the equilibrium value of s k to be However, when E[s(0)] = 1 , then there is fixed point of g at w = 0 which corresponds to never having a disease spreading. For another fixed point w eq ∈ (0, 1) to exist we must have g ′ (0) > 1 which can be written as and can be identified with the initial basic reproduction number. This situation with E[s(0)] = 1 is relevant, for example, when considering an approximately infinite number of individuals with only a finite number of infected individuals. Also in this case, the insights given by the previous example can be extended to higher dimensions to prove the existence of an equilibrium point of the system starting at a given configuration of initially susceptible individuals.  www.nature.com/scientificreports/ and again at least one such equilibrium w eq = (w eq 1 , . . . , w eq n ) exists from Brower's fixed point theorem. Using this we find which again we can see to be exponentially decreasing with k i and T i (notice however that the w eq i themselves are also functions of the β i /γ so this statement is somewhat imprecise).
Given that at the equilibrium point all x k = 0 (recall that γ v i =ẇ i ) we have s eq k + r eq k = 1 from which the last equality and inequality in the statement follow.

Remark 18
Of course, when E[s(0)] = 1 there is an equilibrium point with w eq = 0 and s eq k = 1.
Convergence to the equilibrium. We consider the cases n = 1 and n = 2 for which we have the flow equation (20) which we rewrite as u i = f i (u 1 , . . . , u n ) , for In the case n = 1 we have In particular, lim u→0 + f (u) = 0 with lim u→0 + f ′ (u) = +∞ and so there is ε > 0 such that f (u) > 0 in (0, ε) . On the other hand while lim u→1 − f (u) = −β(1 − s(0)) < 0 . Hence, the solution to (20) with n = 1 stays bounded inside (0, 1) and therefore converges to an equilibrium point u eq ∈ (0, 1) with f (u eq ) = 0 . Hence, We now turn to the case when n = 2 . In this situation we have lim u i →0 + f i (u 1 , u 2 ) = 0 and and similarly for ∂f 2 ∂u 2 . In particular, we find that lim u i →0 + f i (u 1 , u 2 ) = +∞ . Hence, we have that there is ε > 0 such that f i (u 1 , u 2 ) > 0 for u i ∈ (0, ε) . Furthermore, Hence, the solutions stay inside the square [0, 1] 2 and by the Poincaré-Bendixon theorem, it must therefore converge to an equilibrium point (u eq 1 , u eq 2 ) with f i (u eq 1 , u eq 2 ) = 0 for i = 1, 2 . Notice that here we have implicitly used the fact that there are no non-constant periodic solutions. That can easily be proven by looking at the system (17) from which we find that the r k must be non-decreasing and can only be constant at a disease free equilibrium.

Short time behavior.
Recall from Eq. (19) that the system is governed by the set of ordinary differential equations for i = 1, . . . , n . Given that initially we have w i (0) = 0 we may Taylor expand right hand side above using   www.nature.com/scientificreports/ where R H ij is the (i, j) entry of the RH-matrix associated with the transmissibilities T k = β k /γ as in definition 2, and the . . . denote terms of order O(|w| 2 ) . In this way, the above equation may be written as Suppose that RH can be diagonalised and let with eigenbasis v 1 , . . . , v n and 1 ≤ · · · ≤ n the corresponding eigenvalues. Then, we write w(t) = n l=1 w l (t)v l we find that each w l must solve which we can integrate to obtain We therefore find that the initial exponential growth observed at the beginning of an outbreak is codified in the existence of an eigenvector of RH greater than 1, as alluded to in the introduction.

Major limitations
As with any model, those considered in this article have a scope and are therefore heavily limited. Obviously, there are limitations associated with the several assumptions and approximations made, but there are also several others. For instance, the fact that one does not need to consider the full form of the network, which can be interpreted as a strength of the model, may also be a severe limitation. Indeed, there are many different multigraphs G = G 1 ∪ · · · ∪ G n having the same degree distributions and the spread of an epidemic outbreak may have different features in such in-equivalent networks. This is not incorporated by our models which simply use the information on the degree distributions. However, as previously mentioned, this is limitation can also be seen positively. Indeed, while the exact form of the network is practically impossible to obtain in practice, estimating the degree distributions is a feasible endeavor. Nevertheless, one must bear such limitations into consideration anytime these models are used.