Introduction

Public health response to infectious disease epidemics faces two major challenges. First, to predict if an emerging pathogen will cause a large-scale epidemic. Second, to control epidemics and prevent recurring outbreaks of known pathogens. SARS-CoV-2 is evidence of both: a new pathogen which caught the world by surprise, wreaked havoc, and tested our control capabilities with recurrent waves1. Global crises like the COVID-19 pandemic may become more frequent, as climate change increases the threat of new viral species jumping to humans from animal reservoirs2. Among them, avian strains of influenza A (e.g. H5N1, H7N9) are closely monitored for their ability to develop into major pandemic strains3. With a universal vaccine still unavailable, preparing for the next influenza pandemic requires large-scale access to antiviral treatment4 and optimal drug administration5,6.

The fight against epidemics of known pathogens is equally hard, and exceptionally relevant in the context of sexually transmitted infections (STI). The progress towards elimination of HIV/AIDS is faltering7. Chlamydia, gonorrhea and syphilis cause more than 200 million new cases worldwide among adults each year8, and widespread antimicrobial resistance is compromising our ability to fight them9. Yet elimination remains a priority, to reduce not only their burden, but also their effect on HIV acquisition and transmission8. Finally, mpox virus has spread to non-endemic areas, overwhelmingly through sexual transmission and in communities already burdened by other STIs10.

The two epidemic contexts—pandemics and STI—have little in common. But the goals of preventing large-scale emerging epidemics and eliminating endemic diseases find a common theoretical framework and application tool in the epidemic threshold11,12. The epidemic threshold defines the critical value of disease transmissibility above which the infection can establish itself in a host population. Interventions that raise the epidemic threshold make the population more resilient to the pathogen. Opposite changes increase the epidemic risk.

The epidemic threshold has historically informed core public health activities, thanks to its potential to provide context awareness and gauge the effort needed for prevention or eradication13,14,15. In the current COVID-19 pandemic, scientists, public health officials, authorities, and even the general public have continuously evaluated control policies by their ability to bring epidemic waves below the epidemic threshold, i.e., reproductive number Rt < 1 (refs. 16,17,18,19).

Current theories, however, cannot handle the complexity of infectious disease dynamics in real populations, and resort to two oversimplified schemes. One consists in allowing for detailed disease progression but limited population structure, and stemmed from mathematical epidemiology20,21,22. This clashes, however, with the evidence of highly complex time-varying contact patterns measured or estimated in different contexts15,23,24,25. To overcome this, the physics and network science community has pushed the development towards increasing realism in population structure12,26,27,28,29,30,31,32. This has happened, however, at the expense of disease description33,34,35,36,37, which has opened new problems: Simplifying assumptions on disease natural history bias results38, and variations in intervention protocols cause radically different epidemic outcomes6. These simplifications limit our knowledge of disease dynamics, and cripple the ability of models to serve public health.

The theoretical formalism presented here will eschew both. It will give estimates of the epidemic threshold that are both robust, to provide rapid and generalizable understanding of complex dynamical processes driving disease spread, and accurate, to turn high-precision data39,40,41,42 into targeted and viable recommendations. Applied to syphilis transmission, it shows that the resolution of sexual contact data requires matching resolution in modeling disease progression. Namely, its prodromic stage affects the epidemic threshold even if non-infectious. Applied to influenza, our framework predicts whether antiviral-resistant variants will become dominant, in agreement with the observed emergence and fixation of the oseltamivir-resistant A(H1N1) variant in 2007.

Results and discussion

Epidemic graph diagrams

We introduce here Epidemic Graph Diagrams (EGDs) to represent the coupled dynamics of arbitrarily complex disease spread, and arbitrarily complex contact patterns among hosts. Following a tradition in physics43,44, our diagrams are more than a representation of the diffusion equations on the network substrate: they replace them. We define rules to build the diagrams, and operations to manipulate them, by exploiting their topological properties. Then, we show that diagrams lead to a general, analytical derivation of the epidemic threshold. Also, diagram operations simplify its numerical computation, making EGDs a practicable analytics for public health.

We consider the classic compartmental description of the natural history of an infectious disease, through a progression of successive disease states regulated by transition rates. We consider population structure through an explicit time-varying contact network with adjacency matrix A(t), where nodes represent hosts and the entry Aij(t) encodes transmission-relevant connectivity between host i and host j at time t12,15,23,24,45. Contacts can thus change in time.

We use the Markov chain formulation in the quenched mean-field approximation, whereby the network appears in its explicit contact structure represented by the adjacency matrix A(t), and the dynamical states of the nodes are independent of each other and follow a system of deterministic differential equations26,28,30. For a generic compartmental model these equations can be written as:

$${\dot{x}}_{b}^{i}=\mathop{\sum}\limits_{c}\mathop{\sum}\limits_{j}{x}_{c}^{j}\left[\left({\gamma }_{bc}-{\mu }_{c}{\delta }_{bc}\right){\delta }^{ij}+{\lambda }_{bc}{A}^{ij}(t)\right]+f(x),$$
(1)

where \({x}_{b}^{i}(t)\) is the probability that node i is in compartment b at time t, and b runs over all compartments, except the susceptible S. μc, γbc, λbc are the rates of the transitions between compartments depicted in Fig. 1a: μ correspond to spontaneous transitions to S (e.g. recovery with no immunity); γ correspond to spontaneous transitions between any two compartments except S (e.g. exposed hosts becoming infectious); λ correspond to transmission events causing a susceptible node to enter another compartment (e.g. a susceptible becoming infectious). Transmissions are generated by infectious hosts of any kind (e.g. asymptomatic or symptomatic infectious individuals) when they are in contact with susceptible nodes. The component f(x) in Eq. (1) contains all terms that are nonlinear in x (quadratic or higher). It may be complex, but its derivatives vanish in the disease-free state (\({\left.\partial f/\partial {x}_{b}^{i}\right|}_{x=0}=0\)), therefore f does not contribute to the epidemic threshold and can be dropped. Eq. (1) can then be conveniently rewritten in operator form:

$$\dot{{{{{{{{\bf{x}}}}}}}}}(t)={{{{{{{\bf{x}}}}}}}}(t)\left[{{{{{{{\boldsymbol{\gamma }}}}}}}}-{{{{{{{\boldsymbol{\mu }}}}}}}}+{{{{{{{\boldsymbol{\lambda }}}}}}}}{{{{{{{\bf{A}}}}}}}}(t)\right]={{{{{{{\bf{x}}}}}}}}(t){{{{{{{\bf{J}}}}}}}}(t).$$
(2)

In this formalism, x is a vector on \({{\mathbb{R}}}^{N}\otimes {{\mathbb{R}}}^{{N}_{C}}\), where N is the number of hosts, and NC is the number of compartments (excluding S). μ, γ, λ are operators on \({{\mathbb{R}}}^{{N}_{C}}\), and A is an operator on \({{\mathbb{R}}}^{N}\). The term J(t) = γ − μ + λA(t) is the Jacobian of the system, itself an operator on \({{\mathbb{R}}}^{N}\otimes {{\mathbb{R}}}^{{N}_{C}}\). The component λA(t) generates the interaction between the contact network structure and the structure of the compartmental model, so that nodes that are neighbors at a given time t affect each other’s probabilities to find themselves in given compartments.

Fig. 1: Epidemic graph diagrams.

a Network epidemiology ingredients: compartmental model (top), and time-evolving network (bottom). The generic compartmental model appears in its standard representation: squares represent the different compartments, joined by transitions of three types. The susceptible compartment S (healthy individuals who can contract the disease from the infectious) is made explicit. We use generic letters for compartments to show the wide applicability of the approach without constraining to specific disease progressions. The transitions stem from the diagram visualization at the top of the panel. Type 1 transitions are shown as continuous lines and correspond to spontaneous processes; type 2 are shown as dashed lines, for transmission events involving susceptibles; type 3 are shown as dotted lines, for transmission events not involving susceptibles. b Epidemic Graph Diagram corresponding to the compartmental model of (a). Spontaneous transitions (type 1) become single links. Transmission events (type 2) become double links. Links are weighted by the corresponding transition rates (see a) multiplied by an operator on \({{\mathbb{R}}}^{N}\). This operator is the identity matrix on single links (omitted), and the adjacency matrix on double links. All other transitions, e.g. transmissions infecting compartments other than S (type 3), can be neglected and do not appear in the EGD (see “Methods”). c Jacobian corresponding to the example in (a, b). d Rules of the EGD grammar. The EGD is built directly from the compartmental model (a) following steps 1 and 2. The Jacobian is then the weighted adjacency matrix of the EGD minus diagonal terms that enforce probability conservation (step 3), encoded in the diagonal entries γbb. These are not free parameters but are fixed by probability conservation, since transitions among compartments not involving S do not change the sum \({\sum }_{c}{x}_{c}^{i}\) (Eq. (1)). From this, we derive γbb = −∑c≠b γbc.

The dynamics of the epidemic close to the disease-free state is captured by the infection propagator P (also an operator on \({{\mathbb{R}}}^{N}\otimes {{\mathbb{R}}}^{{N}_{C}}\)), counting the possible transmission chains among hosts29,30,33. Following ref. 30, we compute P as a function of J using the theory of nonautonomous linear systems on Eq. (2):

$${{{{{{{\bf{P}}}}}}}}(t)={{{{{{{\mathcal{T}}}}}}}}\exp \left\{\int\nolimits_{{t}_{0}}^{t}{{{{{{{\rm{d}}}}}}}}s{{{{{{{\bf{J}}}}}}}}(s)\right\}={{{{{{{\mathcal{T}}}}}}}}\exp \left\{\int\nolimits_{{t}_{0}}^{t}{{{{{{{\rm{d}}}}}}}}s\left[{{{{{{{\boldsymbol{\gamma }}}}}}}}-{{{{{{{\boldsymbol{\mu }}}}}}}}+{{{{{{{\boldsymbol{\lambda }}}}}}}}{{{{{{{\bf{A}}}}}}}}(s)\right]\right\},$$
(3)

where t0 is the initial time of the time window of analysis and \({{{{{{{\mathcal{T}}}}}}}}\) is Dyson’s time-ordering operator (see Methods). The largest eigenvalue of P yields the epidemic threshold29,30,33.
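As a rough numerical illustration of Eq. (3), the sketch below assumes that the contact network is available as a sequence of snapshots, each held constant for a time step `dt`, so that the time-ordered exponential becomes an ordered product of ordinary matrix exponentials. The function names, the toy network, and the parameter values are illustrative assumptions, and the small Taylor-based `expm_taylor` helper is only a stand-in for a robust routine such as `scipy.linalg.expm`.

```python
import numpy as np

def expm_taylor(M, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series.

    Adequate for small demonstrations; a library routine is preferable in practice.
    """
    norm = np.linalg.norm(M, 1)
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = M / (2 ** squarings)
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ X / k
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result

def infection_propagator(jacobians, dt=1.0):
    """Ordered product of exp(dt * J_k) for a piecewise-constant J(t).

    Row-vector convention of Eq. (2): x(T) = x(t0) @ P, so the factors are
    multiplied in chronological order from left to right.
    """
    P = np.eye(jacobians[0].shape[0])
    for J in jacobians:  # chronological order
        P = P @ expm_taylor(dt * J)
    return P

def spectral_radius(M):
    """Largest eigenvalue modulus of M."""
    return np.max(np.abs(np.linalg.eigvals(M)))

# Toy SIS case (N_C = 1, gamma = 0): J_k = lam * A_k - mu * I.
mu, lam = 0.5, 0.2  # hypothetical recovery and transmission rates
A1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
A2 = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0]], dtype=float)
jacobians = [lam * A - mu * np.eye(3) for A in (A1, A2)]

rho = spectral_radius(infection_propagator(jacobians))
print("below threshold" if rho < 1 else "above threshold")
```

With these toy values the spectral radius stays below 1, so the disease-free state is stable. Raising `lam` eventually pushes it above 1.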

EGDs emerge as a graph-theoretical representation of the analytical treatment just presented. They are network representations that fully encode the infectious disease dynamics in real populations close to the critical spreading condition. They are composed of NC nodes, each representing a disease compartment (except S), connected by single directed links (spontaneous transitions) or double directed links (transmission events) (Fig. 1b). Nodes can also have self-loops (single, for transitions to S; double, for transmissions entering the same node). EGDs bypass Eq. (1), as they can be built directly from the classic compartmental model and the network adjacency matrix with simple rules (steps 1–2 of Fig. 1d). Most importantly, computing the weighted adjacency matrix of the EGD (plus probability conservation) yields exactly the operator J(t) (Fig. 1c,d) appearing in the expression of the infection propagator of Eq. (3) (see “Methods”).
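The grammar can be sketched at the compartment level as follows. This is a minimal illustration, not the authors' implementation: the function `egd_matrices`, its argument layout, and the rate names `eps`, `mu`, `omega`, `lam` (with made-up values) are assumptions. Single links fill a matrix `C`, whose diagonal is fixed by probability conservation and by losses to S, and double links fill a matrix `Lam`, which multiplies the adjacency matrix in the Jacobian.

```python
import numpy as np

def egd_matrices(compartments, spontaneous, to_susceptible, transmission):
    """Compartment-level weights of an EGD (row-vector convention, entry [src, dst]).

    spontaneous:    {(src, dst): rate} single links between non-S compartments
    to_susceptible: {src: rate}        single self-loops (spontaneous return to S)
    transmission:   {(src, dst): rate} double links (src infects S, which enters dst)
    """
    idx = {name: k for k, name in enumerate(compartments)}
    n = len(compartments)
    C = np.zeros((n, n))    # single links: identity operator on hosts
    Lam = np.zeros((n, n))  # double links: adjacency operator on hosts
    for (src, dst), rate in spontaneous.items():
        C[idx[src], idx[dst]] += rate
        C[idx[src], idx[src]] -= rate  # probability conservation on the diagonal
    for src, rate in to_susceptible.items():
        C[idx[src], idx[src]] -= rate  # probability lost to S
    for (src, dst), rate in transmission.items():
        Lam[idx[src], idx[dst]] += rate
    return C, Lam

# SEIRS-like example with hypothetical rates.
eps, mu, omega, lam = 0.04, 0.1, 0.01, 0.3
C, Lam = egd_matrices(
    compartments=["E", "I", "R"],
    spontaneous={("E", "I"): eps, ("I", "R"): mu},
    to_susceptible={"R": omega},
    transmission={("I", "E"): lam},
)
print(C)
print(Lam)
```

Rows of `C` that only exchange probability with non-S compartments sum to zero, and the row of a compartment that returns to S sums to minus the corresponding rate.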

Under the simple representation of a weighted directed graph, EGDs hide a higher-order complexity. Unlike common graphs, link weights are not scalar but operators on \({{\mathbb{R}}}^{N}\). Considering the relation between tensors and multilayer networks46, we can interpret EGDs as graphs of layers: each node in the EGD (compartment) is a replica layer of the contact network, and links in the EGD are coupling operators among layers.

Operationally, EGDs make it possible to simplify different disease models and interventions, compare them, and build equivalence classes. This is done through three diagram operations: CUT, ZIP, and SHRINK. They decompose diagrams into simpler parts, and compress complex disease progressions. Figure 2 describes in detail the diagram operations, which we present hereafter through specific epidemic applications. The corresponding proofs are reported in Methods. Finally, and more pragmatically, EGDs are a powerful tool for computing the epidemic threshold.

Fig. 2: Diagram operations.

a EGD of Fig. 1. The two strongly connected components are highlighted by blue rectangles. b Diagram operation CUT isolates the strongly connected components, which become disjoint subdiagrams. The epidemic threshold is then the smallest among the thresholds of each subdiagram. c Under the weak-commutation condition, each subdiagram can be further reduced through (i) diagram operation SHRINK (top) and (ii) diagram operation ZIP (bottom), leading to two EGDs of SIS models. Their transition rates are renormalized by the diagram operations. d, e General rules of the three operations. Numbers refer to the steps to be performed for each operation; some of these steps are illustrated in (a–c) with the same numbers.

Syphilis

As a first application, we consider syphilis spreading on a network of sexual contacts. Syphilis is a bacterial STI that is still widespread in resource-constrained countries and resurgent in high-income countries in specific risk groups8. After infection, the disease progresses through a latency period lasting on average 3–4 weeks but with large fluctuations (10–90 days)47. It is then followed by two infective stages, primary and secondary syphilis. If untreated, syphilis enters latency, potentially leading to severe complications in the tertiary stage. Most models greatly simplify this disease progression and consider early stages only, under the assumption of antibiotic treatment, through susceptible-infectious-susceptible48 or susceptible-infectious-recovered-susceptible approaches49. The latency period is neglected, because complexity is shifted from disease progression towards capturing the heterogeneity of human sexual behavior48.

Here we want to keep such complexity in Aij(t) while also preserving the progression of the early stages of syphilis infection. Introducing the latency period following infection is often the first step in building a multi-stage disease history in realistic epidemic contexts50. Following the grammar of Fig. 1, we build the epidemic graph diagram of a susceptible-exposed-infectious-recovered-susceptible (SEIRS) model, where E corresponds to individuals exposed to the disease but not yet infectious (Fig. 3). Diagram operation CUT allows the isolation of two strongly connected components, one containing only R, and one containing E and I (Fig. 3b). The first has no transmission terms. It is thus always below threshold, and can be discarded. The second leads to the simplified version of the Jacobian of the original diagram (Fig. 3c).
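The CUT step described above can be sketched as a strongly-connected-component search on the EGD. The snippet below is an illustration under assumptions: the helper name, the node ordering (E, I, R), and the 0/1 link encoding are choices made here, and a plain transitive-closure search is used because EGDs have few nodes. Components with no double link carry no transmission and are dropped.

```python
import numpy as np

def strongly_connected_components(adj):
    """Group nodes that can reach each other, via Warshall's transitive closure."""
    n = adj.shape[0]
    reach = (adj != 0) | np.eye(n, dtype=bool)
    for k in range(n):
        reach |= reach[:, [k]] & reach[[k], :]
    mutual = reach & reach.T
    components, seen = [], set()
    for i in range(n):
        if i not in seen:
            members = [j for j in range(n) if mutual[i, j]]
            components.append(members)
            seen.update(members)
    return components

# SEIRS EGD, nodes ordered as E=0, I=1, R=2; entry [src, dst] marks a link.
single = np.array([[0, 1, 0],   # E -> I
                   [0, 0, 1],   # I -> R
                   [0, 0, 1]])  # R self-loop (return to S)
double = np.array([[0, 0, 0],
                   [1, 0, 0],   # I infects S, which enters E
                   [0, 0, 0]])

components = strongly_connected_components(single + double)
# Keep only the subdiagrams that contain at least one transmission (double) link.
active = [c for c in components if double[np.ix_(c, c)].any()]
print(components, active)
```

For a model with several active subdiagrams, the overall epidemic threshold would be the smallest of the thresholds computed on each of them.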

Fig. 3: Epidemic threshold for syphilis.

a Classic representation of the susceptible-exposed-infectious-recovered-susceptible (SEIRS) compartmental model here used to model the spread of a general STI. Parameterization for syphilis is provided in (d). E represents the class of individuals exposed to the infection who are not yet infectious (I). R corresponds to temporary immunity. Rates are also shown. b Associated EGD. Strongly connected components are highlighted by blue rectangles. c Simplification of the EGD through diagram operations. The reduced diagram after CUT is shown with its associated Jacobian (top; the R component does not contribute to the threshold and is discarded). Under the weak-commutation condition, SHRINK further reduces the diagram to an EGD of an SIS model (bottom). d Relative difference in the prediction of the epidemic threshold in the full model (that is equivalent to the top diagram of panel c, after CUT) vs. the CUT+SHRINK diagram (bottom diagram of panel c) obtained under the weak-commutation condition (i.e. removing the role of latency). Results are obtained for an STI spreading on a sexual network from real data23, exploring different lengths of infectious period (i.e. time-to-treatment) and latency period. Parameterization for syphilis infection is highlighted. A negative relative threshold variation indicates the threshold is lower in the full model compared to the weak-commutation one, i.e. the more realistic model predicts a higher risk than the approximated one. Both negative and positive variations are observed in the region of parameters corresponding to syphilis infection. The gray region of the plot indicates that the system is always below threshold. e Relative threshold variation as a function of the latency period for the three infection durations (2 weeks, 1 month, 1 year) corresponding to the white dashed lines in (d).

The diagram can however be further simplified if the network of contacts satisfies the weak-commutation condition30 (see “Methods”). This applies to homogeneous mixing, static and annealed networks30, and some temporally evolving network models such as the activity-driven model51. Under this condition, diagram operation SHRINK allows the removal of those nodes (compartments) having only one outgoing link in the EGD, as long as the link is single (Fig. 2c, e). Applied to syphilis, SHRINK eliminates E and reduces the original model to an SIS (Fig. 3c), whose threshold can be easily solved with the scalar version of the infection propagator approach29,30,33. Under weak-commutation, the latency period therefore plays no role in the condition for endemic circulation, extending to temporal networks the result previously restricted to static and annealed networks52. For example, approaches neglecting syphilis latency within the homogeneous mixing approximation49 will not be biased if temporal correlations in sexual activity are negligible. If instead such correlations exist, errors are to be expected. To evaluate such errors, we consider syphilis circulating on a sexual network between Brazilian sex workers and clients23 and measure the relative variation of the epidemic threshold computed on the CUT and on the CUT+SHRINK diagrams of Fig. 3c. Within the variation of latency duration reported in syphilis-infected individuals47, we find errors in elimination predictions ranging from −5% to 40% (Fig. 3d, e), if the model neglects latency. The presence of one additional timescale in a non-infectious compartment induces an interplay with the timescales of the contacts and of the other disease stages, either increasing or decreasing the conditions for syphilis spread with respect to predictions based on the SIS model. In a wave analogy, latency acts in tuning the phase shift between disease and network, from in-phase (boosting) to counterphase (hampering)53.
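A short worked check may help to see why latency drops out in the simplest weak-commuting case. The sketch assumes a static, undirected network (constant, symmetric A), and uses ε for the rate E → I and μ for the rate of leaving I; these symbols are chosen here for illustration and need not match the notation of Fig. 3. On each eigenvector of A with eigenvalue a, a CUT Jacobian of the form discussed above reduces to a 2 × 2 matrix:

$$J_{a}=\begin{pmatrix}-\epsilon & \epsilon \\ \lambda a & -\mu \end{pmatrix},\qquad \mathrm{tr}\,J_{a}=-(\epsilon+\mu)<0,\qquad \det J_{a}=\epsilon\,(\mu-\lambda a).$$

Since the trace is always negative, both eigenvalues of J_a have negative real part if and only if det J_a > 0, that is, λa < μ. For a static network P(T) = exp(TJ), so ρ[P(T)] = 1 exactly when the largest such eigenvalue crosses zero. The disease-free state therefore loses stability at λ_c = μ/ρ(A), which is the SIS value and does not depend on ε.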

In the absence of treatment, or if treatment fails (e.g. for resistance emergence), later stages of syphilis infection need to be considered. The Supplementary Note reports the corresponding EGD and shows that even in that scenario the diagram can be simplified to the EGD of an SIS model (section S1).

The effect observed here for syphilis is relevant to other diseases, especially those for which natural history is poorly known, typically in an early phase of the outbreak. This is the case, for example, of the mpox outbreak: estimates from endemic areas were not directly applicable to the epidemic, which features different spreading routes, symptomatology, and risk factors for acquisition10. Waiting for reliable estimates of disease time scales (latency, generation time, detection) and stages (asymptomatic transmission), the analysis of epidemic scenarios requires a flexible and agile theoretical framework, such as the one developed here, to account for uncertainties.

Pandemic influenza

Progression from latent infection to active disease may be far more complex than what was illustrated for syphilis. Fully describing the natural history of influenza in human hosts, for example, requires a progression from exposed (E) to pre-symptomatic (P, infectious without showing symptoms yet), asymptomatic (A) or symptomatic (I) infectious, and recovered (R) (Fig. S3)6. Next to the inclusion of the latency period, the distinction among infectious individuals is critical for public health interventions. Only symptomatic infectious (I) can be detected in the population and administered antivirals. P and A cannot. Antivirals treat severe forms of seasonal influenza, but most importantly represent the first line of defense against an emerging pandemic, as they mitigate morbidity and mortality at the population level5,6. For this reason, neuraminidase inhibitors, the most common influenza antivirals, are stockpiled for pandemic preparedness4. However, large-scale antiviral administration may favor the emergence and spread of antiviral-resistant influenza strains, compromising individual treatment and pandemic control54. Combined administration of two antivirals (combination therapy) has been proposed to defuse this threat, as opposed to monotherapy6. To study its effects in altering the pandemic potential of the circulating pandemic strain, we build the EGD of the full pandemic influenza disease progression model, adding antiviral combination therapy and emergence of drug resistance. Symptomatic infectious individuals (I) are treated with a certain probability pT. Mutation can occur in treated individuals giving rise to mono-resistant (to either drug, 1 or 2) and multi-resistant variants, with fitness cost ϕ. In the context of current pandemic preparedness4, drugs 1 and 2 would correspond to oseltamivir (Tamiflu) and zanamivir (Relenza). The full model is described in the Supplementary Note (section S2). It includes 25 compartments, 19 of which describe infectious stages.

Despite the model complexity, four strongly connected components can be identified in the corresponding EGD (Fig. 4a, neglecting the trivial component composed only of R, as before). They summarise the infection dynamics of the wild-type, the two mono-resistant, and the multi-resistant strains. Moreover, the diagrams of the first three are isomorphic, underlining the equivalence of the associated dynamics. The CUT operation then provides the epidemic threshold as the lowest among the critical conditions of the different strains. Each condition can be computed from data on population structure, A(t), and with empirically-informed strain-specific parameter values (section S2).

Fig. 4: Epidemic threshold for pandemic influenza with antiviral combination therapy and emergence of resistance.

a EGD (epidemic graph diagram) of the pandemic influenza model and associated simplifications. The model includes four strains: wild-type, two mono-resistant, and one multi-resistant. The recovered (R) compartment has already been CUT out for the sake of visualization (see also Fig. 3b). The four strongly connected components (blue rectangles) correspond to the dynamics of each strain, independently. They can be isolated using CUT. EGD indicates the representation of an epidemic graph diagram. Under the weak-commutation condition, the ZIP operation reduces each of the four subdiagrams to an SIS-like EGD with renormalized strain-specific parameters. Parameter definitions and values appear in Supplementary Tables S1–S3 and S5. b Relative threshold variation under treatment (pT > 0) compared to no treatment (pT = 0) as a function of treatment probability pT and fitness cost, assumed to be the same for both antivirals (ϕ1 = ϕ2). Positive relative threshold variation indicates the epidemic threshold is higher when antiviral drugs are used for therapy, i.e. the risk for a pandemic is reduced. The black line separates the two dominance regimes, for the wild-type strain and for the multi-resistant variant. c Relative threshold variation as a function of pT along the values of fitness cost ϕ1 indicated by the white dashed lines in (b). Treatment increases the epidemic threshold. But after a critical pT the multi-resistant strain becomes dominant and further increasing treatment has no additional effect. d As (b) when fitness costs are specific to the drug, i.e. ϕ2 > ϕ1 (ref. 6). A phase in which a mono-resistant strain dominates appears, differently from the situation depicted in (b). In addition, when ϕ1 < 0 (resistance increases transmissibility), a region in parameter space emerges where threshold variation is negative (red region), i.e. the pandemic risk is increased by the use of antiviral drugs, due to the emergence of resistance. The red arrow indicates the parameter values estimated for the oseltamivir-resistant H1N1 strain, globally dominant in 2007–200854. e Boundaries of the three dominance phases of panel d when varying ϕ2. We report their analytical derivation in section S2.2.

But the influenza diagram can be further simplified in most contexts. The short time scale of face-to-face proximity interactions along which influenza transmission can occur24 makes the annealed approximation, i.e. the weak-commutation assumption30, the commonly adopted approximation. The diagram can then be fully compressed through the ZIP operation (Fig. 2c, e), because transmission events involve exclusively susceptible individuals entering the same compartment (here, exposed E). The difference between ZIP and SHRINK is that ZIP imposes global requirements on the diagram topology and compresses a full diagram, whereas SHRINK has only local requirements and merges two compartments at a time. The multi-resistant strain EGD and the three isomorphic EGDs (wild-type, and two mono-resistant strains), made of 4 and 5 compartments, respectively, are all ZIPped into an EGD of an SIS model with renormalized transmission and recovery rates (Fig. 4a). The physical intuition is that multi-stage disease progression may change the speed of diffusion, but not whether the epidemic will break out or not, provided there are no dynamical interactions between the disease and the underlying contact network. Theoretically, we are able to disentangle the dynamics of the four different viral strains, analyze each of them separately, and show that they are equivalent (after appropriate parameter renormalization). Practically, we reduce a 25-compartment influenza model into a simple SIS model with substantial numerical gain: computing the largest eigenvalue of four N-dimensional matrices, instead of a 23N-dimensional matrix.

The factorization of the network component in the weak-commutation condition (section S2) allows us to make predictions on the dominant influenza variant by comparing the strain-specific thresholds against the scenario with no treatment (pT = 0). Increasing treatment coverage helps control pandemic influenza, especially for high fitness costs of the resistant variants (Fig. 4b with ϕ1 = ϕ2). Above a certain value of pT, however, the multi-resistant strain dominates and the likelihood of its establishment is no longer affected by the treatment coverage. The saturation effect depends on the fitness cost of the resistant strains (Fig. 4c).

Allowing fitness costs to depend on the specific drug (ϕ1 < ϕ2) leads to the emergence of an additional phase where the mono-resistant strain dominates, for small enough treatment probabilities and fitness costs (Fig. 4d). Notably, there exists a range of ϕ1 where increasing the intervention coverage leads to all possible phases: either the wild-type dominates (small pT), or the mono-resistant strain (intermediate pT), or the multi-resistant variant (large pT). The dominance of a single mono-resistant strain is favoured by an increase in transmissibility following mutation (ϕ1 < 0), as expected. Though rare, such a condition was observed in the 2007–2008 influenza season when an oseltamivir-resistant H1N1 variant emerged and rapidly spread, becoming the dominant H1N1 strain globally54. Dominance can happen also under reduced transmissibility of both mono-resistant strains (ϕ1 > 0), for a smaller region of parameter values. Large clusters of oseltamivir-resistant H1N1 variants were isolated in Japan in the 2013–2014 influenza season that caused large community outbreaks but did not lead to large-scale dominance54. Increasing the difference between the two fitness costs magnifies the size of the mono-resistant phase (Fig. 4e).

The models presented so far are not age-structured. In the Supplementary Note (Section S3) we use COVID-19 to show that EGDs can accommodate age-structured populations, a crucial feature for diseases for which exposure, transmission, or morbidity are age-dependent.

The use of EGDs for pandemic influenza and syphilis shows the versatility of the theoretical framework in solving the critical epidemic conditions while handling the full complexity of contact data, disease natural history, and interventions. Limited work so far has used a network representation to describe the dependency between different disease stages55. The EGD formalism makes this dependency and the role of the human contact network transparent. EGDs are agnostic of the data they are fed. As such, they may provide inaccurate or imprecise estimates if the input data are biased or feature high uncertainty, which may be the case for data feeds coming in real time or representing future estimates. EGDs, however, requiring minimal computational power, can accommodate any sensitivity analysis by varying the input data and observing the response in the EGD output. EGDs share the same approximation of the infection propagator approach, i.e. the quenched mean-field assumption26,28. Its validity has been numerically tested for the critical condition56, and specifically within the infection propagator framework29. All other results leading to the EGDs and their simplifications are obtained from the properties of the Jacobian, under no additional approximation. EGDs thus provide the analytics to predict public health risks at high granularity and to customise interventions, responding to the challenges of today’s public health.

Methods

Infection propagator and epidemic threshold

We expand here the expression of the infection propagator of Eq. (3), which was derived using the theory of nonautonomous linear systems57. Dyson’s time-ordering operator58 is defined as follows: \({{{{{{{\mathcal{T}}}}}}}}{{{{{{{\bf{A}}}}}}}}({t}_{1}){{{{{{{\bf{A}}}}}}}}({t}_{2})=\theta ({t}_{1}-{t}_{2}){{{{{{{\bf{A}}}}}}}}({t}_{1}){{{{{{{\bf{A}}}}}}}}({t}_{2})+\theta ({t}_{2}-{t}_{1}){{{{{{{\bf{A}}}}}}}}({t}_{2}){{{{{{{\bf{A}}}}}}}}({t}_{1})\). θ is Heaviside’s step function. The time-ordered exponential of Eq. (3) is then a common representation of the following series:

$${{{{{{{\bf{P}}}}}}}}(t)=\mathop{\sum }\limits_{h=0}^{\infty }\frac{1}{h!}\int\nolimits_{{t}_{0}}^{t}{{{{{{{\rm{d}}}}}}}}{y}_{1}\cdots {{{{{{{\rm{d}}}}}}}}{y}_{h}{{{{{{{\mathcal{T}}}}}}}}\left[{{{{{{{\boldsymbol{\gamma }}}}}}}}-{{{{{{{\boldsymbol{\mu }}}}}}}}+{{{{{{{\boldsymbol{\lambda }}}}}}}}{{{{{{{\bf{A}}}}}}}}({y}_{1})\right]\cdots \left[{{{{{{{\boldsymbol{\gamma }}}}}}}}-{{{{{{{\boldsymbol{\mu }}}}}}}}+{{{{{{{\boldsymbol{\lambda }}}}}}}}{{{{{{{\bf{A}}}}}}}}({y}_{h})\right].$$
(4)

Time-ordering can be made explicit:

$${{{{{{{\bf{P}}}}}}}}(t)=\mathop{\sum }\limits_{h=0}^{\infty }\int\nolimits_{{t}_{0}}^{t}{{{{{{{\rm{d}}}}}}}}{y}_{1}\int\nolimits_{{t}_{0}}^{{y}_{1}}{{{{{{{\rm{d}}}}}}}}{y}_{2}\cdots \int\nolimits_{{t}_{0}}^{{y}_{h-1}}{{{{{{{\rm{d}}}}}}}}{y}_{h}\left[{{{{{{{\boldsymbol{\gamma }}}}}}}}-{{{{{{{\boldsymbol{\mu }}}}}}}}+{{{{{{{\boldsymbol{\lambda }}}}}}}}{{{{{{{\bf{A}}}}}}}}({y}_{h})\right]\cdots \left[{{{{{{{\boldsymbol{\gamma }}}}}}}}-{{{{{{{\boldsymbol{\mu }}}}}}}}+{{{{{{{\boldsymbol{\lambda }}}}}}}}{{{{{{{\bf{A}}}}}}}}({y}_{1})\right].$$
(5)

The infection propagator encodes the epidemic dynamics close to the epidemic threshold: \(P{(t)}_{bc}^{ij}\) is the probability that j is in compartment c at time t, given that i is in compartment b at time t = 0, under the quenched mean-field assumption. Equation (5) is also the most explicit formulation of the interaction between the temporal evolution of the epidemic and the temporal evolution of the contact network. We can now generalize what was found in ref. 29 for the SIS model, and compute the epidemic threshold. The epidemic threshold is the critical parameter surface separating the disease-free state from the endemic phase. In the SIS model it is a one-dimensional curve relating the transmission rate to the recovery rate. Here instead it is a (NC − 1)-dimensional surface defined as follows:

$$\rho \left[{{{{{{{\bf{P}}}}}}}}(T)\right]=1,$$
(6)

where T is the final observation time and ρ indicates the spectral radius, i.e., the largest eigenvalue in absolute value. In practice, it is still possible to obtain an equation for one single parameter if all transmission rates λbc are expressed as functions of the baseline transmission rate, whose critical value gives the epidemic threshold. This is the case of the pandemic influenza model, for example, where transmissibility of each infectious compartment (e.g. P, A, treated symptomatic infectious individuals, mono-resistant or multi-resistant strains, etc.) is defined as a rescaling of the transmissibility of the symptomatic infectious individual (e.g. transmissibility of the mono-resistant strain is equal to the transmissibility of the wild-type, rescaled for the fitness cost, see Supplementary Note section S2).
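As a minimal numerical sketch of Eq. (6), consider the simplest case: an SIS model (γ = 0, scalar μ and λ) on an undirected contact network given as piecewise-constant snapshots of equal duration. These are assumptions made here for illustration only, and the function names are hypothetical. Under them, the propagator reduces to a product of matrix exponentials, and the critical transmission rate can be located by bisection, since the spectral radius of P(T) grows monotonically with λ.

```python
import numpy as np

def expm_symmetric(M):
    """Matrix exponential of a real symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.exp(w)) @ V.T

def propagator(snapshots, lam, mu, dt=1.0):
    """P(T) for SIS with J(t) = -mu*I + lam*A(t), A(t) constant on each step."""
    n = snapshots[0].shape[0]
    P = np.eye(n)
    for A in snapshots:  # earliest snapshot first
        P = P @ expm_symmetric(dt * (lam * A - mu * np.eye(n)))
    return P

def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

def critical_lambda(snapshots, mu, lo=0.0, hi=10.0, tol=1e-10):
    """Bisection on lam for rho[P(T)] = 1; assumes the root lies in (lo, hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if spectral_radius(propagator(snapshots, mid, mu)) > 1.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Toy temporal network on 3 nodes: two alternating undirected snapshots.
A1 = np.array([[0., 1., 0.], [1., 0., 0.], [0., 0., 0.]])
A2 = np.array([[0., 0., 0.], [0., 0., 1.], [0., 1., 0.]])
lam_c = critical_lambda([A1, A2, A1, A2], mu=0.5)
```

For a static network (all snapshots equal) this sketch returns μ/ρ[A], the standard SIS threshold. For directed networks or multi-compartment models, a general-purpose matrix exponential would be needed in place of the symmetric shortcut used here.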

To numerically compute the spectrum of P, an operator on \({{\mathbb{R}}}^{N}\otimes {{\mathbb{R}}}^{{N}_{C}}\), we interpret the tensor products in J as Kronecker products. The resulting matrix has dimension NNC, and has a block structure, with each block of dimension N, as shown in Fig. 1c. Formally, we are exploiting the isomorphism \({{\mathbb{R}}}^{N}\otimes {{\mathbb{R}}}^{{N}_{C}}\simeq {{\mathbb{R}}}^{N{N}_{C}}\), and this is equivalent to the supra-adjacency representation of multilayer networks46. When γ = 0, and NC = 1 (scalar), we recover the infection propagator of the susceptible-infectious-susceptible model30.
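The Kronecker construction can be sketched as follows. The helper name, the two-compartment example and its rates are hypothetical. The convention that the row index denotes the source compartment is an assumption made for this sketch, and it does not affect the spectrum.

```python
import numpy as np

def supra_jacobian(gamma, mu, lam, A):
    """Flatten J = gamma - mu + lam*A into an (N_C*N) x (N_C*N) matrix.

    gamma, mu, lam are N_C x N_C compartment-level matrices; A is N x N.
    Compartment operators act as the identity on nodes; transmission couples to A.
    """
    N = A.shape[0]
    return np.kron(gamma - mu, np.eye(N)) + np.kron(lam, A)

# Hypothetical two-compartment example (values are illustrative only).
A = np.array([[0., 1.], [1., 0.]])
gamma = np.array([[-0.5, 0.5], [0.0, 0.0]])   # compartment 0 -> 1 at rate 0.5
mu = np.diag([0.0, 0.2])                      # recovery from compartment 1
lam = np.array([[0.0, 0.0], [0.3, 0.0]])      # compartment 1 infects into 0
J = supra_jacobian(gamma, mu, lam, A)         # 2x2 grid of N x N blocks

# N_C = 1 and gamma = 0 recovers the SIS Jacobian -mu*I + lam*A.
J_sis = supra_jacobian(np.zeros((1, 1)), np.array([[0.2]]), np.array([[0.3]]), A)
```

Each N × N block of `J` corresponds to one pair of compartments, which is the block structure described above.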

If the network obeys the weak-commutation condition30,59, both the infection propagator (Eq. (3)) and the threshold (Eq. (6)) simplify. The weak-commutation condition is defined in ref. 30 as

$$\int\nolimits_{{t}_{0}}^{t}{{{{{{{\rm{d}}}}}}}}x[{{{{{{{\bf{A}}}}}}}}(x),{{{{{{{\bf{A}}}}}}}}(t)]=0,\,\forall t,$$
(7)

where [A(x), A(t)] = A(x)A(t) − A(t)A(x) is the standard matrix commutator. As ref. 30 proves, this is true in two cases: (i) temporal correlations in the evolution of A(t) are absent (network annealing); (ii) the timescale of temporal correlations in the evolution of A(t) is much shorter than the timescale of the spread of the disease (timescale separation). In both cases all the integrals in Eq. (4) commute and the time ordering can be dropped. Ref. 30 also proves that it is then possible to replace A(t) with the average adjacency matrix \(\bar{{\bf{A}}}=\int\nolimits_{{t}_{0}}^{T}{\rm{d}}x\,{\bf{A}}(x)/(T-{t}_{0})\), and, in the same way, to replace the time-evolving Jacobian J(t) with its average \(\bar{{\bf{J}}}=\int\nolimits_{{t}_{0}}^{T}{\rm{d}}x\,{\bf{J}}(x)/(T-{t}_{0})\) over the time window of analysis. This makes it possible to sum the series in Eq. (4). The infection propagator is then the exponential of the average Jacobian:

$${{{{{{{\bf{P}}}}}}}}(t)={e}^{t\bar{{{{{{{{\bf{J}}}}}}}}}}={e}^{t\left({{{{{{{\boldsymbol{\gamma }}}}}}}}-{{{{{{{\boldsymbol{\mu }}}}}}}}+{{{{{{{\boldsymbol{\lambda }}}}}}}}\bar{{{{{{{{\bf{A}}}}}}}}}\right)}.$$
(8)

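Its spectrum is the exponential of the spectrum of \(t\bar{{\bf{J}}}\), so that ρ[P(T)] = 1 exactly when the eigenvalue of \(\bar{{\bf{J}}}\) with the largest real part vanishes. With ρ understood in this sense, the equation for the epidemic threshold becomes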

$$\rho \left[{{{{{{{\boldsymbol{\gamma }}}}}}}}-{{{{{{{\boldsymbol{\mu }}}}}}}}+{{{{{{{\boldsymbol{\lambda }}}}}}}}\bar{{{{{{{{\bf{A}}}}}}}}}\right]=0.$$
(9)

In case of an SIS model (γ = 0; μ, λ scalars), Eq. (9) becomes the well-known formula \(\lambda=\mu /\rho [\bar{{{{{{{{\bf{A}}}}}}}}}]\)26,28.
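A short sketch of this SIS limit, assuming snapshots of equal duration so that the time average is a plain mean. The function name and the toy network are hypothetical, and the weak-commutation condition is taken for granted rather than checked.

```python
import numpy as np

def averaged_threshold(snapshots, mu):
    """SIS threshold lam = mu / rho[A_bar], valid under weak commutation."""
    A_bar = np.mean(snapshots, axis=0)             # time-averaged adjacency matrix
    rho = np.max(np.abs(np.linalg.eigvals(A_bar)))
    return mu / rho, A_bar

# Two undirected snapshots on 3 nodes (illustrative only).
A1 = np.array([[0., 1., 0.], [1., 0., 0.], [0., 0., 0.]])
A2 = np.array([[0., 0., 0.], [0., 0., 1.], [0., 1., 0.]])
mu = 1.0
lam_c, A_bar = averaged_threshold([A1, A2], mu)

# Leading eigenvalue of the averaged Jacobian at the threshold; it should vanish.
leading = np.max(np.linalg.eigvals(-mu * np.eye(3) + lam_c * A_bar).real)
```

When the snapshots do not satisfy the weak-commutation condition, this value is only an approximation of the threshold obtained from the full propagator of Eq. (6).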

Lastly, we comment on the term f(x) in Eq. (1). As stated in the main text, this does not contribute to the epidemic threshold because its contribution to the Jacobian in the disease-free state vanishes. In the classical representation of a compartmental model shown in Fig. 1a, type 3 transitions (i.e. transmissions infecting other compartments than S) would be contained in f. They would correspond to quadratic terms of the form xbxc. Since f can be ignored, this means that type 3 transitions can be ignored too, and never appear in EGDs.

From the infection propagator to the EGD

We stated that the epidemic graph diagram is a graphical representation of J, which can be obtained from the EGD by computing its weighted adjacency matrix (plus conservation condition), as shown in Fig. 1d:

$${J}_{bc}(t)={W}_{bc}(t)-{\delta }_{bc}\mathop{\sum}\limits_{d}{\gamma }_{bd}\,.$$
(10)

To prove the equivalence between the EGD and the infection propagator approach to analytically compute the epidemic threshold, we need to demonstrate that the Jacobian of the above equation is exactly J(t) = γ − μ + λA(t), i.e. the expression of the Jacobian obtained from the dynamical equations in the main paper. We note that the parameters γbc appearing in the general compartmental model (see Fig. 1a) are never diagonal: b ≠ c. Therefore, the weighted adjacency matrix of the EGD that appears in Eq. (10) is

$${W}_{bc}(t)={\gamma }_{bc}(1-{\delta }_{bc})-{\mu }_{b}{\delta }_{bc}+{\lambda }_{bc}A(t),$$
(11)

where we made explicit the off-diagonal nature of γbc. The diagonal elements of operator γ in Eq. (3) are not parameters of the model; they are fixed by probability conservation, i.e. \({\gamma }_{bb}=-{\sum }_{d\ne b}{\gamma }_{bd}\). We therefore need to add this last term to the weighted adjacency matrix of the EGD to account for the diagonal entries of operator γ of Eq. (3), and this explains the nature of the second term in Eq. (10). This proves that the Jacobian built from the EGD is exactly the operator J(t) of the infection propagator.
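The construction of Eqs. (10) and (11) can be sketched numerically as follows. The function name, the two-compartment example and its rates are hypothetical, and the flattening through Kronecker products (row index = source compartment) is an assumption of this sketch.

```python
import numpy as np

def jacobian_from_egd(gamma_links, mu, lam_links, A):
    """Build J from EGD link weights following Eq. (10).

    gamma_links[b, c]: single link b -> c (off-diagonal, spontaneous transition)
    mu[b]:             single self-loop of b, entering with a minus sign
    lam_links[b, c]:   double link b -> c (transmission coupled to A)
    """
    gamma_links = np.asarray(gamma_links, dtype=float)
    lam_links = np.asarray(lam_links, dtype=float)
    I = np.eye(A.shape[0])
    # Weighted adjacency matrix W of Eq. (11), flattened with Kronecker products.
    W = np.kron(gamma_links - np.diag(mu), I) + np.kron(lam_links, A)
    # Conservation term: subtract each compartment's total outflow on the diagonal.
    outflow = np.diag(gamma_links.sum(axis=1))
    return W - np.kron(outflow, I)

# Hypothetical diagram with two infected compartments (illustrative rates).
A = np.array([[0., 1.], [1., 0.]])
gamma_links = [[0.0, 0.5], [0.0, 0.0]]    # compartment 0 -> 1 at rate 0.5
mu = [0.0, 0.2]                           # compartment 1 recovers at rate 0.2
lam_links = [[0.0, 0.0], [0.3, 0.0]]      # compartment 1 infects into 0
J = jacobian_from_egd(gamma_links, mu, lam_links, A)
```

With μ and λ set to zero, every row of the resulting matrix sums to zero, which is the probability-conservation property that fixes the diagonal of γ.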

Diagram operations

We prove here the diagram operations described in the paper, and in Fig. 2. They are CUT, ZIP, and SHRINK. CUT decomposes EGDs into smaller subdiagrams using network approaches and allows the computation of the threshold on each subdiagram separately. ZIP compresses complex disease progressions into SIS-like diagrams. SHRINK merges pairs of nodes. ZIP exploits and transforms the global topology of the diagram, while SHRINK acts locally on pairs of compartments.

Proof of CUT

An EGD is a directed graph, and it is possible to order its nodes so that its adjacency matrix is block-upper-triangular, with the diagonal blocks representing the strongly connected components. J inherits the same property, if interpreted as the supra-adjacency matrix46,60 of the multilayer structure represented by the EGD, i.e., the adjacency matrix of the flattened graph exploiting \({{\mathbb{R}}}^{N}\otimes {{\mathbb{R}}}^{{N}_{C}}\simeq {{\mathbb{R}}}^{N{N}_{C}}\). And so does P, because it is a convolution of J(t) at different times. It follows that it is possible to compute the spectrum of P as the union of the spectra of the blocks on the main diagonal, which are the subdiagrams obtained via CUT.
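A small numerical illustration of CUT, here applied to a static Jacobian for simplicity. The three-compartment diagram, its rates, and the function names are hypothetical. Strongly connected components are found with a plain reachability matrix, which is adequate for the handful of compartments of a typical diagram but is not meant as an efficient algorithm.

```python
import numpy as np

def strongly_connected_components(adj):
    """SCCs of a small directed graph from its boolean reachability matrix."""
    n = adj.shape[0]
    reach = np.eye(n, dtype=bool) | (adj != 0)
    for _ in range(n):  # more squarings than needed to reach the transitive closure
        reach = (reach.astype(int) @ reach.astype(int)) > 0
    mutual = reach & reach.T
    comps, seen = [], set()
    for i in range(n):
        if i not in seen:
            comp = [j for j in range(n) if mutual[i, j]]
            seen.update(comp)
            comps.append(comp)
    return comps

def cut_spectrum(comp_mat, lam, A):
    """Eigenvalues of each CUT subdiagram, concatenated, plus the components."""
    I = np.eye(A.shape[0])
    comps = strongly_connected_components((comp_mat != 0) | (lam != 0))
    eigs = [np.linalg.eigvals(np.kron(comp_mat[np.ix_(c, c)], I)
                              + np.kron(lam[np.ix_(c, c)], A)) for c in comps]
    return np.concatenate(eigs), comps

# Hypothetical diagram: compartments 0 and 1 feed each other, 1 also feeds 2.
A = np.array([[0., 1.], [1., 0.]])
comp_mat = np.array([[-0.5, 0.5, 0.0],     # gamma - mu, row = source compartment
                     [0.0, -0.5, 0.2],
                     [0.0, 0.0, -0.4]])
lam = np.array([[0.0, 0.0, 0.0],
                [0.6, 0.0, 0.0],           # compartment 1 infects into 0
                [0.0, 0.0, 0.0]])
parts, comps = cut_spectrum(comp_mat, lam, A)
full = np.linalg.eigvals(np.kron(comp_mat, np.eye(2)) + np.kron(lam, A))
```

In this example the eigenvalues in `parts` coincide with those in `full`, so the threshold can be read from the subdiagram {0, 1} alone, since compartment 2 carries no transmission.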

Proof of ZIP

We use the notation \({\mathcal{EGD}}\) for the mathematical representation of the diagram. Let us assume that the weak-commutation condition holds (\({\bf{A}}(t)\equiv \bar{{\bf{A}}}\)), and the epidemic graph diagram \({\mathcal{EGD}}\) is made of a node D and a subdiagram \({{\mathcal{EGD}}}_{X}\). Moreover, let us assume that \({{\mathcal{EGD}}}_{X}\) contains only single links (its Jacobian is γX − μX). Single links may exist from D to any node in \({{\mathcal{EGD}}}_{X}\): we call \({\gamma }_{r}^{(+)}\) the weight of the single link from D to the r-th node of \({{\mathcal{EGD}}}_{X}\). Single links may also exist from any node in \({{\mathcal{EGD}}}_{X}\) to D: we call \({\gamma }_{r}^{(-)}\) the weight of the single link from the r-th node of \({{\mathcal{EGD}}}_{X}\) to D. The index r runs over the Nc − 1 nodes of \({{\mathcal{EGD}}}_{X}\). D may have both single and double self loops, whose weights we call − μ0 and \({\lambda }_{0}\bar{{\bf{A}}}\), respectively. Double links may exist from any node in \({{\mathcal{EGD}}}_{X}\) to D (λr). The resulting J of the full EGD is

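$${\bf{J}}=\left(\begin{array}{cccc}-{\mu }_{0}-\mathop{\sum}\limits_{r}{\gamma }_{r}^{(+)}+{\lambda }_{0}\bar{{\bf{A}}}&{\gamma }_{1}^{(+)}&\cdots \,&{\gamma }_{{N}_{c}-1}^{(+)}\\ {\gamma }_{1}^{(-)}+{\lambda }_{1}\bar{{\bf{A}}}&&&\\ \vdots &&{{\boldsymbol{\gamma }}}_{X}-{{\boldsymbol{\mu }}}_{X}-{\bf{d}}&\\ {\gamma }_{{N}_{c}-1}^{(-)}+{\lambda }_{{N}_{c}-1}\bar{{\bf{A}}}&&&\end{array}\right),$$
(12)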

with \({{{{{{{\bf{d}}}}}}}}=\,{{\mbox{diag}}}\,({\gamma }_{1}^{(-)},\cdots \,,{\gamma }_{{N}_{c}-1}^{(-)})\). The threshold condition in timescale separation is \(\det {{{{{{{\bf{J}}}}}}}}=0\). Using block matrix determinant rules, this simplifies to a determinant in \({{\mathbb{R}}}^{N}\):

$$\det \left[-{\mu }_{eff}+{\lambda }_{eff}\bar{A}\right]=0,$$
(13)

with

$${\mu }_{eff}={\mu }_{0}+\mathop{\sum}\limits_{r}{\gamma }_{r}^{(+)}+\mathop{\sum}\limits_{rs}{\gamma }_{r}^{(+)}{\gamma }_{s}^{(-)}{\left[{\left({\gamma }_{X}-{\mu }_{X}-d\right)}^{-1}\right]}_{rs};$$
(14)
$${\lambda }_{eff}={\lambda }_{0}-\mathop{\sum}\limits_{rs}{\gamma }_{r}^{(+)}{\lambda }_{s}{\left[{\left({\gamma }_{X}-{\mu }_{X}-d\right)}^{-1}\right]}_{rs}.$$
(15)

γX − μX − d is always invertible if we assume that the model is below the epidemic threshold in the absence of transmission. If it were not the case, the epidemic would be a trivial spontaneous generation of infected individuals. Or, in epidemiological terms, there would be only primary disease introductions. This means that J(λ0 = 0, λr = 0) is negative definite. By virtue of Sylvester’s criterion, this in turn implies that γX − μX − d is also negative definite, because all the leading principal minors of the latter are also leading principal minors of J. And since γX − μX does not depend on transmission, γX − μX − d is always negative definite, and so invertible. We complete the proof of ZIP by noting that Eq. (13) is equivalent to an SIS model with renormalized recovery and transmission rates μeff, λeff.
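ZIP can be checked numerically with a short sketch. The rates, the two-node subdiagram and the function names below are hypothetical. The full Jacobian is assembled from the links described above, with D as compartment 0 and with the row index denoting the source compartment, which is an assumption of this sketch. Transmission from the subdiagram is scaled by a single baseline rate β, so that the SIS-like condition λeff ρ[Ā] = μeff can be solved for β directly.

```python
import numpy as np

def zip_rates(mu0, lam0, g_plus, g_minus, lam_r, X):
    """Effective SIS rates of Eqs. (14)-(15); X is gamma_X - mu_X."""
    Minv = np.linalg.inv(X - np.diag(g_minus))
    mu_eff = mu0 + g_plus.sum() + g_plus @ Minv @ g_minus
    lam_eff = lam0 - g_plus @ Minv @ lam_r
    return mu_eff, lam_eff

def full_jacobian(mu0, lam0, g_plus, g_minus, lam_r, X, A):
    """Unzipped Jacobian with D as compartment 0 (row index = source)."""
    n = len(g_plus) + 1
    comp, lam = np.zeros((n, n)), np.zeros((n, n))
    comp[0, 0] = -mu0 - g_plus.sum()       # self-loop of D plus outflow from D
    comp[0, 1:] = g_plus                   # single links D -> subdiagram
    comp[1:, 0] = g_minus                  # single links subdiagram -> D
    comp[1:, 1:] = X - np.diag(g_minus)    # subdiagram, minus its outflow to D
    lam[0, 0] = lam0                       # double self-loop of D
    lam[1:, 0] = lam_r                     # double links subdiagram -> D
    return np.kron(comp, np.eye(A.shape[0])) + np.kron(lam, A)

# Illustrative parameters (not taken from any fitted model).
A_bar = np.ones((3, 3)) - np.eye(3)
rho = np.max(np.abs(np.linalg.eigvals(A_bar)))
mu0, lam0 = 0.1, 0.0
g_plus, g_minus = np.array([0.5, 0.0]), np.array([0.0, 0.3])
X = np.array([[-0.6, 0.2], [0.0, -0.25]])
shape = np.array([1.0, 0.5])               # lam_r = beta * shape

# With lam0 = 0, lam_eff is linear in beta, so the threshold is explicit.
mu_eff, lam_eff_unit = zip_rates(mu0, lam0, g_plus, g_minus, shape, X)
beta_c = mu_eff / (lam_eff_unit * rho)

J = full_jacobian(mu0, lam0, g_plus, g_minus, beta_c * shape, X, A_bar)
leading = np.max(np.linalg.eigvals(J).real)  # should vanish at the threshold
```

The leading eigenvalue of the full Jacobian vanishes at the β obtained from the zipped, SIS-like condition, which is the equivalence stated by Eq. (13).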

Proof of SHRINK

The proof is conceptually similar to ZIP’s and, as for ZIP, requires that the weak-commutation condition holds so that the time-evolving Jacobian is replaced by its average \(\bar{J}\). Let us assume that the epidemic graph diagram is composed of compartment B and a subgraph \({{{{{{{{\mathcal{EGD}}}}}}}}}_{X}\), with Jacobian \({{{{{{{{\boldsymbol{\gamma }}}}}}}}}_{X}-{{{{{{{{\boldsymbol{\mu }}}}}}}}}_{X}+{{{{{{{{\boldsymbol{\lambda }}}}}}}}}_{X}\bar{{{{{{{{\bf{A}}}}}}}}}\). Moreover, let us assume that a single link γ0 goes from B to a node in \({{{{{{{{\mathcal{EGD}}}}}}}}}_{X}\), which we label as the first (r = 1). B may have no other outgoing link, and no self loops. Any node in \({{{{{{{{\mathcal{EGD}}}}}}}}}_{X}\) may have single and double links going to B. The Jacobian is

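$${\bf{J}}=\left(\begin{array}{cccc}-{\gamma }_{0}&{\gamma }_{0}&\cdots \,&0\\ {\gamma }_{1}+{\lambda }_{1}\bar{{\bf{A}}}&&&\\ \vdots &&{{\boldsymbol{\gamma }}}_{X}-{{\boldsymbol{\mu }}}_{X}+{{\boldsymbol{\lambda }}}_{X}\bar{{\bf{A}}}-{\bf{d}}&\\ {\gamma }_{{N}_{c}-1}+{\lambda }_{{N}_{c}-1}\bar{{\bf{A}}}&&&\end{array}\right),$$
(16)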

with \({{{{{{{\bf{d}}}}}}}}=\,{{\mbox{diag}}}\,({\gamma }_{1},\cdots \,,{\gamma }_{{N}_{c}-1})\). Using block matrix determinant rules, the condition \(\det {{{{{{{\bf{J}}}}}}}}=0\) is equivalent to the following Nc − 1 dimensional determinant

$$\det \left\{{{{{{{{{\boldsymbol{\gamma }}}}}}}}}_{X}-{{{{{{{{\boldsymbol{\mu }}}}}}}}}_{X}+{{{{{{{{\boldsymbol{\lambda }}}}}}}}}_{X}\bar{{{{{{{{\bf{A}}}}}}}}}-{{{{{{{\bf{d}}}}}}}}+\left(\begin{array}{cccc}{\gamma }_{1}+{\lambda }_{1}\bar{{{{{{{{\bf{A}}}}}}}}}&0&\cdots \,&0\\ \vdots &\vdots &\cdots \,&\vdots \\ {\gamma }_{{N}_{c}-1}+{\lambda }_{{N}_{c}-1}\bar{{{{{{{{\bf{A}}}}}}}}}&0&\cdots \,&0\\ \end{array}\right)\right\}=0.$$
(17)

This is equivalent to a diagram in which B disappears, its outgoing link γ0 disappears, and all incoming links of B get rerouted onto the former target of link γ0.
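The block-determinant step behind SHRINK can be verified on a toy example. The rates and function names are hypothetical, B is placed as compartment 0 of the full diagram, and the row index denotes the source compartment. These are all assumptions of this sketch. Since the block of B is −γ0 times the identity on the N nodes, the full determinant should equal (−γ0)^N times the determinant of the shrunk diagram.

```python
import numpy as np

def shrink(comp_X, lam_X, g_in, lam_in):
    """Compartment-level matrices of the shrunk diagram.

    comp_X, lam_X: gamma_X - mu_X and lambda_X of the subgraph (row = source).
    g_in[r], lam_in[r]: single and double links from node r of the subgraph to B.
    Links into B are rerouted onto node 0 of the subgraph, the target of gamma_0.
    """
    comp = comp_X - np.diag(g_in)
    comp[:, 0] += g_in
    lam = lam_X.copy()
    lam[:, 0] += lam_in
    return comp, lam

def full_jacobian(gamma0, comp_X, lam_X, g_in, lam_in, A):
    """Jacobian of the original diagram, with B as compartment 0."""
    n = comp_X.shape[0] + 1
    comp, lam = np.zeros((n, n)), np.zeros((n, n))
    comp[0, 0], comp[0, 1] = -gamma0, gamma0   # B only flows into node 0 of the subgraph
    comp[1:, 0] = g_in
    comp[1:, 1:] = comp_X - np.diag(g_in)
    lam[1:, 0] = lam_in
    lam[1:, 1:] = lam_X
    return np.kron(comp, np.eye(A.shape[0])) + np.kron(lam, A)

# Illustrative parameters (not taken from any fitted model).
A_bar = np.ones((3, 3)) - np.eye(3)
N = A_bar.shape[0]
gamma0 = 0.7
comp_X = np.array([[-0.5, 0.2], [0.0, -0.3]])
lam_X = np.zeros((2, 2))
g_in, lam_in = np.array([0.0, 0.1]), np.array([0.4, 0.2])

comp_s, lam_s = shrink(comp_X, lam_X, g_in, lam_in)
J_shrunk = np.kron(comp_s, np.eye(N)) + np.kron(lam_s, A_bar)
J_full = full_jacobian(gamma0, comp_X, lam_X, g_in, lam_in, A_bar)

det_full = np.linalg.det(J_full)
det_shrunk = np.linalg.det(J_shrunk)
```

Because the two determinants differ only by the nonzero factor (−γ0)^N, they vanish for the same parameter values, so the shrunk diagram has the same threshold as the original one.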

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.