Micro-scale foundation with error quantification for the approximation of dynamics on networks

Epidemics, voting behaviour and cascading failures in power grids are examples of natural, social and technological phenomena that can be modelled as dynamical processes on networks. The study of such important complex systems requires approximation, but the assumptions that underpin the standard mean-field approaches are routinely violated by dynamics on real-world networks, leading to uncontrolled errors and even controversial results. Consequently, determining the approximation precision has been recognised as a key challenge. We present a micro-scale foundation for mean-field approximation of a wide range of dynamics on networks that facilitates quantification of approximation error, elucidating its connection to network structure and model dynamics. We show that our coarse-graining approach minimises approximation error and we obtain an upper bound on this uncertainty. We illustrate our approach using epidemic dynamics on real-world networks. Mean-field approximations in complex networks, while useful, often vary in accuracy and can be particularly problematic in the case of dynamical correlations. Here, the authors report the use of approximate lumping to derive low-dimensional mean-field equations and estimate the overall approximation error for real-world networks.

T he study of natural, social and technological phenomena in complex systems invariably requires approximations that coarse-grain and simplify, so that insights can be obtained about the causal mechanisms at work. A case in point, and our focus, is the study of dynamical processes on complex networks 1 , such as models of epidemics 2,3 , opinion dynamics [4][5][6] , the diffusion of innovations 7-10 , the evolution of languages [11][12][13] and cultural polarisation 14,15 . The standard approach to analyse dynamics on networks is via mean-field approximations, which range in accuracy and complexity 2,[16][17][18][19] . While such methods have provided important insights, the assumptions that underpin mean-field approximations-the absence of clustering ('a friend of a friend is my friend'), modularity (community structure) and dynamical correlations ('I'm similar to my neighbours')-are routinely violated by dynamical processes on real-world complex networks and it is generally difficult to quantify how well a particular approximation will do a priori, given the network or dynamical process 20 . Mean-field approximation has also resulted in controversy concerning the critical epidemic threshold in scalefree networks 17,[21][22][23] . Because of these issues, the quantification of approximation error has been recognised as one of the key challenges for network epidemic modellers 24 .
In this article we address these critical issues by presenting a foundation for mean-field approximations of dynamics on networks, which builds from the micro-scale description of the dynamics and facilitates the quantification of approximation error. We use approximate lumping to derive low-dimensional mean-field equations for a broad class of Markov chain dynamics on networks which includes models of epidemics and opinion dynamics. The coarse-grained states are based on the number of each type of 'vertex-state', such as the number of susceptible and infected vertices in the susceptible-infected-susceptible (SIS) model of epidemics. In contrast to standard mean-field approximations, the transition rates between these coarse-grained states are derived directly from the exact evolution of the probability distribution over states-known as the master equation or forward Kolmogorov equation-and are shown to minimise approximation error, in the sense that they are closest to an exact lumping. This provides a theoretical underpinning that simplifies and standardises the process of deriving mean-field approximations for practitioners: the microscopic formulation of a model can be easily translated into a mean-field approximation using the formulae we have obtained. Furthermore, this approach enables us to derive a bound on the approximate lumping error and compare this to errors computed from stochastic simulation of epidemic dynamics on several benchmark real-world networks.

Results and discussion
We consider Markov chain dynamics on finite, connected networks with undirected, unweighted edges and no self-loops, where each vertex in the network can be in one of a finite number of "vertex-states". For example, in models of epidemics the vertex-states correspond to individuals' disease status, which could be susceptible to infection, infected, recovered, etc. In models of voting behaviour, the vertex-states correspond to the party that each person plans to vote for. If M is the number of vertex-states and N is the number of vertices, then there are M N possible states, i.e. configurations of vertex-states on the network. Thus the size of the full state-space for Markov chain dynamics on networks is extremely large, even for moderate N, and consequently, unless the network contains significant symmetry 25,26 , approximation is essential. Despite this, the state-space is finite so we denote the probability distribution at time t over state-space by XðtÞ ¼ ðX 1 ðtÞ; X 2 ðtÞ; ; X M N ðtÞÞ T , where X k (t) is the probability of being in the kth state. Variables related to the full state-space will be upper-case Latin letters and the indices k and l indicate that the index is over the full state-space. In continuous time t, the evolution of X(t) is described by the forward Kolmogorov or master equation 27 , where Q is the infinitesimal generator, an M N by M N matrix in which each off-diagonal component Q kl gives the transition rate from state S [k] to state S [l] , and the diagonal components ensure that rows sum to zero. Bold variables indicate matrices. Our approach can also be adapted to discrete time models.
In the "Methods" section we describe how the components of the infinitesimal generator relate to the microscopic dynamics, i.e. the transition rates of individual vertices between vertex-states. We assume that the positive entries of the infinitesimal generator are affine (i.e. constant plus linear) functions of the number of neighbouring vertices in each vertex-state. For example, in epidemic models, a susceptible vertex typically becomes infected at a rate proportional to the number of infected neighbours. We also focus on 'homogeneous' models where the micro-scale transition rates are identical for all vertices with the same number of neighbours in each vertex-state. These features define a class of network dynamics that we call 'homogeneous single-vertex transition models' (homogeneous SVTs) with 'affine vertex-state transition matrices' (affine VSTMs). Specifically, if a model has an affine VSTM then a vertex in vertex-state A, with n m neighbours in the mth vertex-state, transitions to vertex-state B with rate where the δ A;B m are arbitrary non-negative constants. This covers a broad range of dynamical processes on networks 28 , but in Supplementary Note 5 we also consider generalisations to heterogeneous and nonlinear network dynamics with quadratic VSTMs.
We coarse-grain such Markov chain network dynamics using a method called approximate lumping, in which states are grouped together (lumped) according to a pre-defined partition of statespace 29 . We consider approximate lumping partitions based on sets of states that have the same total number of vertices in each vertex-state, i.e. the number of susceptible and infected vertices in the SIS model. We refer to this type of approximate lumping as a population model approximation 30 . To make this precise, let s 2 Z M ≥ 0 be a lumped state, which is a vector of length M whose mth component, s m , denotes the number of vertices in the mth vertexstate. Lumped variables will be lower-case Latin letters and m will index vertex-states. It follows that there are r ¼ N þ M À 1 N possible lumped states, since a lumped state is a combination of N vertex-states drawn from M possibilities with repetition. Thus we number the lumped states in the lumped state-space s [1] , s [2] , …, s [r] and we use Π = {Π 1 , Π 2 , …, Π r } to denote the corresponding lumping partition. Let xðtÞ ¼ ðx 1 ðtÞ; ; x r ðtÞÞ T denote the time-dependent Markov chain probability distribution over Π, where x i (t) is the probability of being in the lumped state s [i] . We use indices i and j to indicate that the index is over the lumped state-space. The evolution of x(t) is then the solution to where q is the approximate lumping generator, which needs to be determined.
The idea here is to use the coarse-grained generator q = DQC, where C 2 f0; 1g M N r is the collector matrix 29 , whose kjth component is one if S [k] ∈ Π j and zero otherwise, and D 2 R r M N is the distributor matrix, whose ilth component is 1/|Π i | if S [l] ∈ Π i and zero otherwise. The effect of using q = DQC is to average the sum of rates out of states in one lumping partition cell and into another. This approach has the following advantages. Firstly it minimises error, in the sense that it is closest to an exact lumping where QC = Cq (details in the Methods section), which is made precise in the following theorem.
Secondly, the matrix q can be explicitly derived for affine network dynamics, leading to the following theorem.
Theorem 2.2. Let Ω be the state-space of a homogeneous SVT with affine VSTM on a network with mean degree z, and let q = DQC be the lumped infinitesimal generator corresponding to the population model approximation Π = {Π 1 , Π 2 , …, Π r } with lumped states s [1] , s [2] , …, s [r] . If s [i] and s [j] correspond to a single vertex changing from vertex-state A to B and s ½i 1 is the number of vertices in vertex-state A, then These are the main theoretical results of the paper. Outlines of the proofs are given in the "Methods" section and further details are provided in the Supplementary Methods.
For concreteness, we illustrate the approximate lumping approach in Fig. 1 using the SIS model of epidemic dynamics, which has M = 2 and is an example of "binary-state dynamics" 31 . The vertex-states of the SIS model are referred to as susceptible (S) and infected (I ). A susceptible vertex with n 1 infected neighbours becomes infected with rate βn 1 and an infected vertex recovers with rate γ, where β, γ > 0 are model parameters. In relation to our notation for affine VSTMs introduced in Eq. (1), we have δ S;I 1 ¼ β, δ I ;S 0 ¼ γ and all other δ A;B m are zero. Our approach partitions state-space into "levels", so that the ith level, Π i , contains all states that have i infected vertices, and this reduces the size of state-space from 2 N to N + 1. For SIS dynamics, we obtain a mean-field birth-death process with infection rates given by and recovery rates q i;iÀ1 ¼ γi: These rates will be unsurprising to those familiar with mean-field approximations of network dynamics, but note that here we have derived these directly from the full Markov chain description rather than via moment closures based on non-rigorous probabilistic arguments, as is typical 2 . For the SIS model and other binary-state dynamics, this approach gives rise to a birth-death process; for network dynamics with M > 2, it yields a Markov population model 30 .
In the lumped state-space, the error of our approximation is y(t) = C T X(t)−x(t) and so This is an inhomogeneous linear system of ODEs, thus applying the variation of constants formula yields where we have assumed that y(0) = 0, i.e. the lumped initial state C T X(0) is known. To simplify the error computation we assume that the initial distribution of the full Markov chain is stationary so that X(t) = X * . Quasi-stationary distributions can also be handled in an analogous way and are discussed in Supplementary Note 4. In the "Methods" section, we derive a bound on the stationary absolute mean error for binary-state dynamics. However, this involves terms that depend on the full Markov chain, so we must resort to approximations to make further progress. We focus on the SISa model 32 , which is similar to the SIS model but has an additional 'ambient' infection rate α, so a susceptible vertex with n 1 infected neighbours becomes infected with rate α + βn 1 . Recovery is the same as in the SIS model. Unlike the SIS model, where the state with all susceptible vertices is absorbing, the SISa model has a stationary distribution. In the "Methods" section we obtain a bound on the stationary absolute mean error of the SISa model that depends on a þ i , which is a constant related to the state that has the largest or smallest number of edges between susceptible and infected vertices in the ith level. Unfortunately, computing a þ i is computationally difficult (an algorithm that did so would provide a solution to the Max-Cut problem, which is NP-complete 33 ). Thus we settle for an estimate, e a þ i > 0, obtained from a tractable greedy algorithm, described in detail in the "Methods" section, that sequentially picks susceptible vertices to become infected which introduce the largest or smallest number of edges between susceptible and infected vertices. Our numerically tractable bound depends on an assumption about e a þ i x Ã i and the full system, which is made precise in the "Methods" section. In Supplementary Note 3 we show that while this assumption does not always hold, we typically obtain an informative bound regardless. We also propose an approximation a Ã i x Ã i based on averaging the minimum and maximum number of edges between susceptible and infected vertices at each level, although this approximation does not have a rigorous foundation.
Application to real-world networks. To illustrate the application of our results on a topical example, we use the SIR model of epidemics on a real-world contact network derived from GPS data. There are three vertex-states in the SIR model, namely susceptible, infected and recovered, which we denote by S, I and R respectively. A susceptible vertex with n 1 infected neighbours becomes infected at a rate βn 1 , and an infected vertex recovers at a rate γ. There are 3 N states in the full Markov chain and (N + 2) (N + 1)/2 lumped states, corresponding to distinct numbers of vertices in each of the vertex-states. The lumped transition rate q ij from the ith lumped state with s ½i S susceptible vertices and s ½i I infected vertices, to the jth lumped state in which a susceptible vertex has become infected is (Note that here it is convenient to use the vertex-states I and S rather than an integer to index the lumped state s [i] ). Similarly, if an infected vertex recovers then the lumped transition rate is q ij ¼ γs ½i I : There are N + 1 lumped absorbing states in which there are no infected vertices and the number of recovered vertices ranges from zero to N.
We used a real-world contact network derived from data collected as part of the BBC documentary 'Contagion! The BBC Four Pandemic' 34,35 . This study collected GPS traces of people who downloaded the 'BBC Pandemic' smart phone application. Data made publicly available from this study consists of timestamped anonymised pairwise distances within 50 m between 469 participants around the town of Haselmere, UK. We aggregated these data to create a static network between participants that came within 1 m of each other. We used the largest connected component of this network, which consists of N = 369 people and has mean degree z = 5.53. We refer to this as the 'Haselmere 1m' network. We used parameters γ = 1 and β = γR 0 (N−1)/(zN), where R 0 = 3, since this would give a reproduction number of R 0 in the corresponding compartmental model equations. Initially five vertices were selected uniformly at random to be infected.
In Fig. 2a, b we compare stochastic simulations (red) of the SIR model on the Haselmere 1m network with the corresponding approximate lumping (blue). Figure 2a illustrates the mean number of infected vertices over time (thick solid lines) and the corresponding 90-percentile of the simulated and approximate lumping distributions (shading). We also include, for comparison, results from homogeneous, heterogeneous and individual-based mean-field approximations (dashed, dot, and dash-dot lines respectively-see Supplementary Note 1 and Kiss et al. 2 for details), illustrating that the accuracy of our approach is comparable. However, our approach also produces a full probability distribution over the lumped states, which we use to compute the percentiles in Fig. 2a. This distribution could also be used for Bayesian parameter estimation and even data assimilation. Furthermore, with our approach we are able to compute absorption statistics and in Fig. 2b we compare the absorption probability into each absorbing state (i.e. the total number of infected individuals) of stochastic simulations (grey) and our approximate lumping (blue).
Low dimensional mean-field approximations can perform poorly on networks with heterogeneous structure (e.g. when hubs, clustering or communities are present), and Fig. 2a, b illustrate this. By way of contrast, we also present results for an Erdős-Rényi graph where the accuracy of mean-field approximations is better. Specifically, we chose a network uniformly at random from those with N = 369 vertices (the same size as the Haselmere 1m network) and mean degree z = 20 (i.e. selecting 3690 random edges-note this is the less common type of Erdős Rényi graph), and in Fig. 2c, d we illustrate results corresponding to those in Fig. 2a, b, respectively. In this case, the accuracy is significantly improved and our approach even appears marginally better than the other comparable mean-field theories illustrated. We obtain similar results if we average over many graphs sampled at random.
In Fig. 3 we compare our error bound with the error produced via stochastic simulations of the SISa model on four benchmark real-world networks, including the Haselmere 1m network 34,35 , a protein interaction network [36][37][38] , an autonomous-systems Internet network 39 and a US power grid network 40 . For each network in Fig. 3, we compute stochastic simulations of SISa dynamics on the network with ambient infection rate α = 0.01, infection transmission rate β = 2(γ−α)(N−1)/(zN) and recovery rate γ = 1, which would give a stationary infected fraction of 0.5 in the corresponding SISa compartmental model. Half of the vertices are chosen uniformly at random to be initially infected and the number of infected vertices is computed after the process is approximately stationary. For each network, we compute the mean fraction of infected vertices from multiple realisations of the stochastic simulations. We also numerically compute solutions of the lumped system to find the lumped probability distribution x(t) with initial condition corresponding to the average number of infected vertices of the stationary stochastic simulations. The stochastic simulation error (solid black lines in Fig. 3) is the absolute magnitude of the difference between the mean fraction of infected vertices in the stochastic simulations and approximate lumping. We compare this with our bound on the approximate lumping error (red dashed lines in Fig. 3) by numerically integrating Eq. (5) using e a þ i x Ã i . The long-term behaviour of the bound is comparable, i.e. the over-estimate is a similar amount, for different sizes of network and error. The results for these examples are representative of other real-world networks. To illustrate this, in Fig. 4 we compare the errors computed from stochastic simulations (horizontal axis) with the corresponding errors computed using our approximation and bound (vertical axis) for 18 real-world networks, including the four used in Fig. 3. These networks constitute a standard benchmark test-set, including networks with heterogeneous topology on which mean-field approximations vary in accuracy 20 . The circular and triangular markers correspond to the approximation and bound, respectively. The SISa parameter values used are the same as in Fig. 3, i.e. α = 0.01, β = 2(γ−α)(N−1)/(zN) and γ = 1. The legend indicates which network has been used and these are ordered from the smallest simulation error at the top (furthest left in the figure) to the largest at the bottom (furthest right in the figure). References for each network, as well as information about size and mean degree, are included in Supplementary Note 3. Figure 4 shows that for a range of benchmark real-world networks our approximation gives a good estimate of the magnitude of the mean error and our bound is informative, i.e. these are correlated with the error (Pearson correlation coefficient: 0.62, p-value < 0.01 [without karate: 0.86, p-value ≪ 0.01]) and in all cases give a value <1.

Conclusion
In summary, we have presented a mathematical foundation for mean-field approximations of a wide class of dynamical processes on networks that facilitates the quantification of approximation error.
We have used approximate lumping to derive low-dimensional systems of equations directly from the exact master equation description, whose approximation error is minimal, in the sense that it is closest to an exact lumping, and can be quantified.
Our approximation results in a 'density dependent' system from which even lower dimensional ODE approximations can be rigorously derived in the large N limit [41][42][43] . Note that the lumped transition rates which we have derived only characterise network structure in terms of the mean degree, so do not account for variations in topology that may affect the dynamics. However, there is scope to extend our approach to more accurate degreebased mean-field 17 and high-accuracy approximate master equations 18,31 through more fined-grained lumpings by considering finer partitions of vertices and states 30 . There may also be alternative methods to bound the error 44 , potentially making use of theory developed for operator semi-groups 43 . While we extend our approach to quadratic VSTMs in Supplementary Note 5, further generalisations to arbitrary nonlinear VSTMs, e.g. via their power series expansions, may be possible. For non-smooth VSTMs, such as threshold models, consideration of the averaging process of the infinitesimal generator may also facilitate the derivation of approximations. The approach developed in this paper could also be applied to other complex systems, e.g. a natural generalisation is to multilayer network structures 45,46 via the supra-adjacency matrix representation. However, the details of the specific application are likely to be crucial and will inevitably influence the structure of the Markov chain state-space and hence how much our approach needs to be adapted to deal with these considerations. The COVID-19 pandemic has brought epidemic modelling into the spotlight and variants of compartmental models have influenced policy: for example, the UK's Scientific Advisory Group for Emergencies 47 at the time of writing list stochastic transmission models 48-51 as modelling inputs. Such models incorporate realistic features such as age structure and geography. However, the underlying contact network is difficult to obtain and we should consider the consequence of not accounting for this in our models. For example, Fig. 2a shows that mean-field approximations (which includes compartmental models) are a poor representation of the true dynamics. Thus varying infection rates to fit such models to data could distort their interpretation and hence the consequences of policy interventions.

Methods
Mathematical formulation. Let G = (V, E) denote a network with vertex set V and edge set E ⊂ V × V, where the number of vertices is N = |V|. We consider dynamical processes on finite connected simple networks (i.e. undirected, unweighted and with no self-loops) described by continuous-time Markov chains where each vertex can be in one of a finite number M of vertex-states and the set of possible vertex-states is W ¼ fW 1 ; W 2 ; ; W M g. We use caligraphic variables to indicate vertex-states. The state-space of the Markov chain is the set of all permutations of N vertex-states chosen from W with repetition. This is equivalent to Ω ¼ W V , i.e. the set of all functions from V to W, and so if the network is in state S ∈ Ω then the vertex-state of vertex v ∈ V is S(v). Since the number of states in Ω is M N , we can enumerate the states in state-space so that Ω ¼ fS ½1 ; S ½2 ; ; S ½M N g. We assume that the dynamics are governed by homogeneous SVT models, which includes models of spin systems, epidemics, opinion dynamics, diffusion of innovation and a variety of other social dynamics 28,52 . In a homogeneous SVT model, a vertex changes vertex-state at a rate that is a function of only the number of its neighbours in each vertex-state and the rate function is the same for all vertices. Furthermore, transitions only occur between pairs of states that differ in at most one vertex-state. We call such pairs of states transition pairs and use the notation S ½k $ v S ½l to indicate that the states S [k] and S [l] form a transition pair with Approximate lumping. To coarse-grain the network dynamics, we consider lumping of Markov chains 55 . An exact lumping Π = {Π 1 , Π 2 , …, Π r } is a partition of state-space that preserves the Markov property, a necessary and sufficient condition for which is that the sum of transition rates out of a state S [k] ∈ Π i into the cell Π j is the same for all states in the cell Π i . In matrix notation, this is equivalent to the existence of an r × r matrix q such that where C 2 f0; 1g M N r is the collector matrix 29 whose kjth component is We call Eq. (7) the lumpability condition. Note that q can be given explicitly by introducing the distributor matrix 29 D 2 R r M N , whose ilth component is Specifically, q = DQC satisfies the lumpability condition when Q commutes with CD 28 . A lumping that does not satisfy the lumpability condition, and hence does not preserve the Markov property, is an approximate lumping 29 . Recall that we consider approximate lumping partitions based on sets of states that have the same number of vertices in each vertex-state and use the generator q = DQC even when the lumpability condition is violated. Motivated by the condition for an exact lumping (7), for a given matrix norm || ⋅ || we define the approximate lumping discrepancy as ||QC−Cq||. Note that QC−Cq is a matrix of size M N × r, which in the case of an exact lumping has all zero entries, thus the approximate lumping discrepancy measures how far (in terms of the specific norm used) the approximate lumping is from being an exact lumping. For this reason, we choose q to minimise the approximate lumping discrepancy.
We now give an outline of the proof of Theorem 2.1, i.e. that q = DQC minimises the approximate lumping discrepancy using the Frobenius norm. With the Frobenious norm || ⋅ || F we have ½ðQCÞ kj À q ij 2 : Consequently k QC À Cqk 2 F can be minimised by choosing q ij to be the average of the sum of rates out of states in the ith level and into the jth level, i.e.
where N s ½i À Á is short for the multinomial Note that the q that minimises the approximate lumping discrepancy depends on the particular norm used; the Frobenius norm is advantageous because it results in an intuitive averaging process that is also analytically tractable.
Then for SVT models, the only possible nonzero rates are between pairs of lumped states that satisfy s ½j ¼ s ½i þ ν B À ν A , with A; B 2 W and A≠B, i.e. a vertex switches from vertex-state A to B. It follows that the lumped states can also be ordered so that q is a quasi-birth-death process and hence q is tridiagonal by blocks.
We now give an outline of the proof of Theorem 2.2 by illustrating how we derive the elements of q from the full Markov chain description. Consider the case where q ij corresponds to a vertex changing from vertex-state A to B, so s ½j ¼ s ½i þ ν B À ν A . In Eq. (8), for each state S [k] ∈ Π i we sum the rates into Π j to get (QC) kj . As assumed, these non-zero rates are associated with vertices in vertexstate A changing to B. Thus we can go through each vertex in S [k] that is in vertexstate A, count the number of its neighbours that are in each of the vertex-states to compute the transition rate (1), and sum these up. Equation (8) then averages these over all states in Π i . Our key insight is that rather than summing over states as Eq. (8) suggests, we can achieve the same total by summing over vertices and the possible states of neighbours.
For a vertex v with degree d v , the number of states in Π i where vertex v is in vertex-state A and has n = (n 1 , n 2 , …, n M ) neighbours in each of the vertex-states is where we have used our generalised multinomial notation, indicated by the presence of vectors in the denominators, e.g. d v n À Á ¼ d v n 1 ;n 2 ; ;n m . The transition rate of a vertex from vertex-state A to B is given by Eq. (1). To compute q ij we sum these rates over all N vertices and all possible values of n, and divide by the number of states to get where the sum over n|d v denotes a sum over all possible values of n such that n 1 + n 2 + ⋯ + n m = d v .
We deal with the δ A;B where we have assumed, without loss of generality, that the first index of the lumped state, s ½i 1 , corresponds to the vertex-state A. For the δ A;B m n m terms, again using the generalised Vandermonde identity, we have Substituting Eqs. (10) and (11)  Examples of binary-state dynamics include the SIS and voter models 28 and in Supplementary Note 2 we provide a classification of the different types of binarystate dynamics. Consequently, we suppose that the set of vertex states is W ¼ fS; I g and refer to vertex-state S as 'susceptible' and vertex-state I as 'infected'; an infection corresponds to a susceptible vertex becoming infected and a recovery corresponds to an infected vertex becoming susceptible. We can partition the statespace of binary-state dynamics into levels so that the ith level, Π i contains all states that have i infected vertices, for i = 0, 1, …, N, i.e. Π = {Π 0 , Π 1 , …, Π N }. It follows that the approximate lumping generator q is tridiagonal and QC−Cq is tridiagonal by blocks of column vectors of varying size. For 0 ≤ i < N, the column vectors of QC −Cq just above the diagonal correspond to infections and we denote these by Thus A Π i captures the difference between the sum of infection rates out of states in level i into level i + 1, and the mean q i,i+1 . Note that we use the subscript Π i to illustrate that the variable is a vector over the states in Π i . Similarly, for 0 < i ≤ N, the column vectors of QC−Cq just below the diagonal correspond to recoveries and we denote these by so B Π i captures the differences between the recovery rates out of level i into level i −1, and the mean. We then have where the zero entries indicate appropriately sized vectors of zeroes.
To simplify the error computation we assume that the initial distribution of the full Markov chain is stationary so that X(t) = X * , whose kth component is X Ã k . We also use X ÃT to denote the vector of stationary probabilities of states in Π i . Hence we find that The σ i contain information about the full system and therefore cannot be directly computed for typical systems of interest, i.e. when the size of the full state-space is beyond what can be stored in computer memory. We now consider the equilibrium solutions of Eqs. (2) and (4) in turn. For binary-state dynamics, our lumped approximation is a birth-death process, where a birth corresponds to an infection and a death corresponds to a recovery. Thus we can write where the rates λ i and μ i are finite and positive. The analytical expression for the stationary distribution x Ã ¼ ðx Ã 0 ; x Ã 1 ; ; x Ã N Þ T of such a birth-death process can be found in standard texts, e.g. Kijima 27 , but we reproduce it here in order to introduce notation that we will use when we derive the equilibrium of the error ODEs (4). The stationary distribution x* solves the recursion relation which has solution where ϕ 0 = 1, for i > 0 Similar to the lumped dynamics, the equilibrium of the error ODEs (4), y Ã ¼ ðy Ã 0 ; y Ã 1 ; ; y Ã N Þ T , satisfies the system of equations It follows that the solution solves the recursion Since both X * and x * are probability distributions, their elements sum to one and thus the sum of y Ã i is zero. Consequently for i > 0 we find where ψ 0 = 0, for i > 0 By substituting Eq. (13) into the definition of the mean error y Ã , given by Eq. (6), we find where ðj À x Ã Þϕ j and x Ã ¼ ∑ i¼0 ix Ã i is the stationary mean number of infected vertices. Thus we have split the calculation of y Ã into terms σ i , which depend on the full Markov chain (and hence must be approximated), and terms ρ i , which depend on the lumped system (and hence can be computed). Moreover, using the definition of x Ã and Φ, it is straightforward to prove that ρ i > 0 for all i, which suggests an intuitive bound on the absolute value of the stationary mean error given by Example: error approximation for the SISa model. We now consider results for the SISa model 32 , where the VSTM has infection rate f S;I ðn 1 ; n 2 Þ ¼ α þ βn 1 , recovery rate f I ;S ðn 1 ; n 2 Þ ¼ γ and f S;S ¼ f I;I ¼ 0. We derive bounds on the |σ i | terms for the SISa model, which with Eq. (15) allow us to bound j y Ã j. We also consider approximations of the σ i terms, which with Eq. (14) allow us to approximate y Ã . Using Eq. (8), for the SISa model we find for 1 fS ½k ðvÞ¼Sg ðvÞn ½k 1 ðvÞ: Note that n ½k SI is the number edges that connect susceptible vertices with infected vertices (hereon referred to as SI edges) in the state S [k] . Our formula for QC ð Þ k;iþ1 above for the SISa model follows from the fact that there are N−i susceptible vertices, and summing how many infected neighbours each has is equivalent to counting the number of SI edges. It follows that so the entry in A Π i corresponding to the state S [k] is proportional to the difference between the number of SI edges in state S [k] and the average of the number of SI edges in states in the ith level. A similar calculation shows that (QC) k,i−1 = γi and hence B Π i ¼ 0 for all i, i.e. the total recovery rate of a state in the SISa model is the same for all states in the same level. Thus for the SISa model Determining a þ i and the sum of probabilities in the ith level would allow us to bound the absolute value of the mean error, but this may be intractable in practice because it requires knowledge of the full Markov chain. Thus to obtain a bound on the stationary absolute mean error of the SISa model, we use an approximation for a þ i , denoted by e a þ i , and then assume that e a þ i x Ã i ≥ jA T i X Ã Π i j. In Supplementary Note 3 we show that while this assumption does not always hold, we typically obtain an informative bound regardless.
We now describe how we obtain e a þ i . Note that a þ i arises from the state in level i with either the largest or smallest number of SI edges. We refer to these states as the max and min SI states respectively. Finding the max SI states is equivalent to the Max-Cut problem, which is NP complete 33 . Finding the min SI states is also difficult because one needs to identify maximal cliques, which is also NP complete 56 . Because of this, we settle instead for estimates based on a greedy algorithm that starts from the state with all susceptible vertices and sequentially chooses a susceptible vertex to become infected that introduces the largest or smallest number of SI edges.
The algorithm is as follows. For binary-state dynamics in which vertices are either susceptible or infected, we iterate from level 0 to b N 2 c, picking a new vertex at each level to switch from susceptible to infected. There is only one state in level 0, in which all vertices are susceptible, so this is the state identified by the algorithm at the 0th level. Suppose that at the ith level the state S [k] is identified by the algorithm, then for each susceptible vertex v in S [k] , we compute the number of infected neighbours n ½k 1 ðvÞ and the number of susceptible neighbours n ½k 2 ðvÞ. We then pick the vertex with the largest difference n ½k 1 ðvÞ À n ½k 2 ðvÞ (which may be negative) to be infected, and this is the state that the algorithm identifies for the i + 1th level. If there are multiple such vertices then we pick the one with the lowest index. This last step ensures our algorithm is deterministic, although to destroy possible correlations between vertex degrees and their labels, it may be necessary initially to randomise the vertex labelling. In binary-state dynamics there is a symmetry about b N 2 c, by switching susceptible vertices to infected and infected to susceptible, which preserves the number of SI edges. We apply this symmetry to the states selected so far to determine the states in levels above b N 2 c. Clearly one could perform a more extensive search, but our goal is to have an algorithm that scales well with the number of vertices. A nearly identical process can be used to identify a state in each level with a low number of SI edges by selecting the vertex with the smallest difference n ½k 1 ðvÞ À n ½k 2 ðvÞ to become infected. For level i, we use e n þ i and e n À i to denote the maximum and minimum number of SI edges found by this algorithm, respectively. We also attempt to approximate σ i with a Ã i x Ã i , where This gives a measure of the skew of the distribution of the number of SI edges in each state in the same level.

Data availability
The networks and derived data are available in the Research Data Leeds Repository 57 .
Code availability