Dynamics of cascades on burstiness-controlled temporal networks

Burstiness, the tendency of interaction events to be heterogeneously distributed in time, is critical to information diffusion in physical and social systems. However, an analytical framework capturing the effect of burstiness on generic dynamics is lacking. Here we develop a master equation formalism to study cascades on temporal networks with burstiness modelled by renewal processes. Supported by numerical and data-driven simulations, we describe the interplay between heterogeneous temporal interactions and models of threshold-driven and epidemic spreading. We find that increasing interevent time variance can both accelerate and decelerate spreading for threshold models, but can only decelerate epidemic spreading. When accounting for the skewness of different interevent time distributions, spreading times collapse onto a universal curve. Our framework uncovers a deep yet subtle connection between generic diffusion mechanisms and underlying temporal network structures that impacts a broad class of networked phenomena, from spin interactions to epidemic contagion and language dynamics.

In this supplementary note we provide additional information on the master equation solution outlined in the main text. While the configuration space and its transitions were described in the main text, here we discuss a number of binary-state models to which our framework applies. We then derive the asymptotic size of configuration space, confirming this result numerically over a wide range of system parameters, provide lattice diagrams of configuration space along with their interpretation, and derive the edge transition rates µ_j and ν_j, along with the distribution of edge states E_j.

Models of binary-state dynamics
In this section we outline how a number of well-known binary-state models may be straightforwardly extended to a temporal setting, and thus solved using our master equation framework. These models are summarised in Supplementary Table I. Although we have not experimented on recovery models, where infected nodes re-enter the uninfected state, they are an effortless extension of our framework. Importantly, the structure of configuration space is unchanged when solving models that allow for node recovery: if infection involves transitions from class S_{k,m} to I_{k,m} at rate F_{k,m}, recovery involves transitions in the opposite direction at rate R_{k,m}.
Susceptible-Infected-Susceptible model. A straightforward extension of the SI epidemic model is to allow infected nodes to recover to the uninfected, or susceptible, state, thus defining the SIS model. Recovery takes place at rate δ, independently of the local edge dynamics k and m. As in the SI model, infection occurs at rate λ · m, where each interaction event on an infected edge increments the transmission rate by λ, meaning bursts on infected edges increase the risk of infection.
Voter model. Here, a node in the uninfected state perceives influence totalling m · w from infected neighbours, and k · w from all neighbours combined. An uninfected node becomes infected with probability m · w/k · w. This fraction must lie between zero and one, since even in the presence of arbitrary temporal fluctuations, m · w is clearly bounded by k · w. As such, the extension to temporal networks is natural. A node in the infected state recovers with probability 1 − m · w/k · w, meaning the probabilities sum to one. A burst of activity on an infected edge increases the likelihood of infection, while a burst of activity on an uninfected edge lowers it. This is in contrast with the above models, where k does not appear in the infection rate.
Language model. In this model, the two states represent the primary language choice of an individual, and the probability of transitioning between states is proportional to the fraction of influence in the neighbourhood, m · w/k · w, raised to the power α and multiplied by the status parameter, s or 1 − s, of the respective language. We set s = 0.5 for the symmetric case of equal-status languages. Setting α = 1 then recovers the voter model. In a temporal setting, bursts of interactions between a pair of nodes increase the weighting of that speaker, and thus the probability of retaining or switching languages, depending on the language of the interlocutor.
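For concreteness, the rate functions in Supplementary Table I can be written down directly in code. The following is a minimal sketch (function names and parameter defaults are our own):

```python
def F_voter(m, k, w=1.0):
    """Voter-model infection rate: fraction of neighbourhood influence."""
    return (m * w) / (k * w)

def R_voter(m, k, w=1.0):
    """Voter-model recovery rate, the complement of F_voter."""
    return 1.0 - (m * w) / (k * w)

def F_language(m, k, s=0.5, alpha=1.0, w=1.0):
    """Language-model infection rate with status s and exponent alpha."""
    return s * ((m * w) / (k * w)) ** alpha

def R_language(m, k, s=0.5, alpha=1.0, w=1.0):
    """Language-model recovery rate with status 1 - s."""
    return (1.0 - s) * (1.0 - (m * w) / (k * w)) ** alpha
```

By construction the voter rates sum to one, and at α = 1 the language model rates are proportional to the voter rates, with proportionality constants s and 1 − s.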

Lattice diagrams
The state space of configurations C over which our master equation framework applies is shown in Supplementary Figs. (1) and (2), for illustrative values of k and n. The space forms a regular lattice structure, with points in this space representing classes (k, m), and links between them the allowed transitions in a binary-state model of node dynamics. Points are labelled by their m vector, with the corresponding k vector on the bottom left of the lattice. The total m = Σ_j m_j is shown at the top left of the lattice.
A given class is directly connected to those that can be reached via neighbour and edge transitions. In the case of neighbour transitions, k is preserved, and m differs by ±e_j. This corresponds to the right-diagonal transitions in the lattice diagram (black edges), which are directed in non-recovery dynamics. The number of infected neighbours m is incremented by one after these transitions. In the case of edge transitions, neighbouring (k, m) classes differ by ∆_j^± = −e_j + e_{j±1}, meaning they lose an edge of type j, and gain an edge of type j + 1 or j − 1. These transitions are the red left-diagonal edges in Supplementary Figs. (1) and (2), and are bidirectional. The total number of infected neighbours m is preserved during these transitions.

Supplementary Table I. Additional binary-state models. A selection of models, labelled on the left, to which our framework can be straightforwardly applied. Parameters introduced by each model are briefly described in the supplementary note. Our framework allows for recovery dynamics, which occur at rate R_{k,m}, with infection at rate F_{k,m}.

model      F_{k,m}                    R_{k,m}
SIS        λ · m                      δ
voter      m · w/k · w                1 − m · w/k · w
language   s(m · w/k · w)^α           (1 − s)(1 − m · w/k · w)^α
Supplementary Fig. 2 illustrates that the number of points in configuration space does not change when solving over temporal networks, relative to edge-heterogeneous static networks. For instance, a temporal model where edge state is bounded by n = 1 has a configuration space C no different from that of a two-layer multiplex, if the underlying degree distribution is the same. Crucially, it is the number of transitions that differs, expressed in the W_ego + W_neigh + W_edge formulation of the master equation.

Size of configuration space
The most important quantity in determining the feasibility of a master equation solution is the number of points in its configuration space C. It is this quantity that determines the size of the resultant system of differential equations, with each point associated with a rate equation for s_{k,m}. In this section, we estimate analytically the asymptotic growth rate of |C|, which we then validate using extensive numerical construction.
First, we note that configuration space is finite if and only if the degree distribution has some cutoff, which we call k here, and if the number of allowed edge states n is finite. If these conditions are satisfied, we can approximate the size of C analytically, as shown in Supplementary Fig. 3. We shall be interested in the case where n is fixed, and k is arbitrarily large. Our approach is to first calculate the growth rate of C k , the number of configurations with total degree k. This itself entails |C k |, the number of configurations with degree vector k. Once these quantities are known, it is straightforward to estimate the total size of a system whose degree distribution runs from 0 to a cutoff k.
The size of C_k can be written exactly as

	|C_k| = Σ_{|k| = k} Π_{j=1}^{n} (k_j + 1),    (1)

where the sum runs over all degree vectors k with total degree k, and the summand represents |C_k|, the number of partial degree vectors m corresponding to a given k. To see this, note that C_k itself is composed of all configurations where 0 ≤ m_j ≤ k_j is satisfied, for all j, for a given k. Then, the k_j neighbours of a node that are in state j admit k_j + 1 inactive-active configurations, given by m_j = 0, . . . , k_j.
Supplementary Figure 2. Comparison of static and temporal configuration space. Allowed transitions in static networks, (a), and temporal networks, (b), for a k = 2 degree node with an n = 2 level edge-state set. In contrast to static networks, temporal networks allow what we call edge transitions, left-diagonals in (b). The static case may occur in a bimodal weighted network, or a two-layer multiplex network. See caption in Supplementary Fig. 1 for interpretation. Note that the number of configurations (k, m), or the number of rate equations in the master equation, is identical in each case.
Since each k j value is independent, the total number of configurations in C k is given by the product over the n edge types.
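For small k and n this sum is cheap to enumerate directly; a brute-force sketch (names our own) that can be used to verify the growth rate discussed below:

```python
from itertools import product

def size_Ck(k, n):
    """Brute-force |C_k|: sum over all degree vectors (k_1, ..., k_n)
    with total degree k of prod_j (k_j + 1), the number of admissible
    partial degree vectors m for that k."""
    total = 0
    for kv in product(range(k + 1), repeat=n):
        if sum(kv) == k:
            term = 1
            for kj in kv:
                term *= kj + 1   # k_j + 1 choices m_j = 0, ..., k_j
            total += term
    return total
```

Iterating this over k = 0, . . . , k̂ and summing gives the size of the full configuration space for a degree cutoff k̂.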
To obtain the growth rate of Supplementary Eq. (1), we estimate two quantities: the number of terms in the sum over k, and the expected value of its summand, |C_k|. Since the sum is over all degree vectors k whose elements add to k, it can be enumerated by counting the number of ways of placing k objects into n containers. This is equivalent to choosing k from k + n − 1, and such a binomial grows as Θ(k^{n−1}). Second, to determine the expected size of |C_k| for a given k, note that for any degree vector k, the number of corresponding m is bounded by (k + 1)^n, since we clearly have k_j ≤ k for all j, with equality holding only when n = 1. Although technically this provides only an upper bound, we propose that the mean summand in fact grows as Θ(k^n).
Given that the number of terms in Supplementary Eq. (1) grows like Θ(k^{n−1}), and that the mean value of its summand arguably grows like Θ(k^n), we estimate that the number of configurations with total degree k grows like

	|C_k| = Θ(k^{2n−1}).

Although we have not included such a figure, this result can be validated in the manner of Supplementary Fig. 9. That is, we construct the set C_k numerically, iterating through all k with total degree k, and all m compatible with that k, before plotting its actual size alongside its estimated asymptotic value. In this way, we can confirm that |C_k| indeed grows like k^{2n−1}. Incidentally, this represents the complete configuration space for a k-regular network.

Finally, we estimate the total number of points in C, assuming that the degree distribution runs from zero to the cutoff k. To do this, we observe that the number of configurations at each total degree increases monotonically, and does so gradually when k is large. As such, there will be k + 1 degrees to sum over, allowing us to write

	|C| = Θ(k^{2n})

for the total number of configurations (k, m) in C. Although far from rigorous, our derivation appears to be validated by Supplementary Fig. 3, where we observe close agreement between the asymptotic term and brute-force enumeration of the numerically constructed C. Note that as we display Supplementary Fig. 3 on a log-log scale, polynomials appear as straight lines whose slope is given by their leading order, 2n, and whose vertical intercept is given by the multiplicative constants associated with the Θ notation. These coefficients are obtained by hand, choosing the intercept such that the dashed line fits the data (solid line).

A consequence of these results is that a precise master equation implementation of real systems is quite impractical. Real systems often include hubs, or nodes whose degrees reach into the tens of thousands. In the presence of edge heterogeneity, we have seen that such systems are prohibitively large.
For example, a four-level multiplex with a maximum total degree of k = 10^4 is out of reach computationally, as it contains on the order of 10^27 terms, as per Supplementary Fig. 3. Nevertheless, a great deal of intuition can be built by studying the behaviour of more tractable systems, imposing cutoffs on the maximum degree, and formulating the system in terms of low edge heterogeneity n, as we have done in this work.
Calculating µj and νj for renewal processes In this section, we calculate the rates of positive and negative edge transitions µ j and ν j in the stochastic temporal network model discussed in the previous section. Supplementary Figure 3. Asymptotic size of configuration space. Solid curves are |C|, the actual size of configuration space C, found by direct numerical enumeration. Dashed curves give the asymptotic size, Θ(k 2n ), show for n = 1, 2, 3 and 4. In this figure k represents the cutoff degree, with C assumed to be composed of nodes ranging from total degree 0 to k. The slopes of the dashed and solid lines are in excellent agreement.
Here, µ_j dt gives the probability that at time t, for a renewal process having already produced j events in the preceding time window of duration η, a (j + 1)-th event is observed between times t and t + dt. Conversely, ν_j dt gives the probability of an event exiting the η window. This is illustrated in Supplementary Fig. 4. At the outset, describing such a process with constant rates µ_j and ν_j seems inappropriate, as it is memoryless only in so-called event space. In this representation, a renewal process is nothing other than a sequence of trials, {τ_1, τ_2, . . .}, or the random sampling of values τ from a distribution ψ(τ). In this sense, the process is memoryless. However, in the resulting time series for continuous t, the process is non-Markovian, as the time of the next event was decided at the time of the previous event, and the probability of an event occurring at any time between these points is zero. This is in contrast to a Poisson process, where the rate of change in edge state due to the occurrence of an event is constant in time. Thus, on a microscopic level, where we observe a stochastic process on a single edge, a rate description is nonsensical. We note, however, that in our master equation formalism, classes (k, m) really represent ensembles of nodes, and although constant rates cannot be identified on a microscopic level, useful quantities do exist on a network-wide macroscopic level. That is, the fraction of j-type edges that change state over an interval dt turns out to be strongly heterogeneous in j, which is clearly not the case for a Poisson process. The heterogeneity of the distribution ψ is reflected in the heterogeneity of µ, ν and E.
The rate of positive edge transition µ_j dt is calculated by finding the probability of a (j + 1)-th event occurring over a given interval dt, conditioned on j events having already been produced in the preceding time interval of duration η. This is illustrated in Supplementary Fig. 4(a), where we set t = η for convenience. In Supplementary Fig. 4, the interevent times τ_1, . . . , τ_j are drawn from the distribution ψ, and the times τ′ and τ″ from its complementary cumulative distribution Ψ, also known as the residual time distribution. It is defined as

	Ψ(τ) = ∫_τ^∞ ψ(τ′) dτ′,    (4)

and gives the probability that the time between events is of duration at least τ. We introduce the domain T of times spanned by the configurations allowed in Supplementary Fig. 4, or the times t_1 < t_2 < · · · < t_j in an interval of duration η, such that t_j − t_1 < η and t_{j+1} − t_1 < η.
Note that consecutive event times t_j and t_{j+1} cannot coincide, with interevent times drawn from the distributions ψ and Ψ, which are defined over positive τ. We write

	T = { (t_1, . . . , t_j) ∈ R_+^j : t_1 < t_2 < · · · < t_j < η },    (5)

where R_+^j is the j-dimensional space of positive real numbers. The probability of observing the configuration in Supplementary Fig. 4(a) is Ψ(τ′)ψ(τ_1) . . . ψ(τ_j), which is the same as Ψ(t_1)ψ(t_2 − t_1) . . . ψ(η − t_j), given that we have set the time t to η for simplicity. Similarly, the configuration in Supplementary Fig. 4(b) is observed with probability Ψ(τ′)ψ(τ_1) . . . ψ(τ_{j−1})Ψ(τ″), which is the same as Ψ(t_1)ψ(t_2 − t_1) . . . ψ(t_j − t_{j−1})Ψ(η − t_j). The weighted sum of all such configurations yields the probability of observing a j-type edge undergoing a transition to state j + 1 over an interval dt, and the probability of randomly selecting an edge in state j, as shown in Supplementary Fig. 4(a) and (b), respectively. The only difference is the final term, which is drawn either from ψ or Ψ. With respect to the domain T, these sums can be written

	∫_T Ψ(t_1)ψ(t_2 − t_1) . . . ψ(η − t_j) dt = (Ψ ∗ ψ^{∗j})(η)    (6)

and

	∫_T Ψ(t_1)ψ(t_2 − t_1) . . . Ψ(η − t_j) dt = (Ψ ∗ ψ^{∗(j−1)} ∗ Ψ)(η),    (7)

respectively. Here, ψ^{∗j} is the j-th convolution power of ψ, and is discussed at length in following sections. To obtain the rate µ_j at which edges in state j transition to state j + 1, Supplementary Eq. (6) must be normalised by Supplementary Eq. (7), the probability E_j that a randomly selected edge is in state j. Schematically, this corresponds to normalising the transition in Supplementary Fig. 4(a) by those in Supplementary Fig. 4(b). Note that ν_j dt, the probability that an edge in state j forgets an event over an interval dt, is the same as µ_{j−1} under time reversal, up to the normalising constant. As such, the rates µ_j and ν_j, along with the distribution E_j, can be written compactly as

	µ_j = (Ψ ∗ ψ^{∗j})(η) / (Ψ ∗ ψ^{∗(j−1)} ∗ Ψ)(η)    (8)

and

	ν_j = (ψ^{∗(j−1)} ∗ Ψ)(η) / (Ψ ∗ ψ^{∗(j−1)} ∗ Ψ)(η),    (9)

with

	E_j ∝ (Ψ ∗ ψ^{∗(j−1)} ∗ Ψ)(η).    (10)

These quantities can be calculated either numerically or analytically, depending on the tractability of the chosen distribution ψ.
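The convolution expressions for µ_j and ν_j can be evaluated numerically for any interevent distribution ψ. The following is a minimal pure-Python sketch (rectangle-rule integration; function names are our own), taking (Ψ ∗ ψ^{∗j})(η) and (ψ^{∗(j−1)} ∗ Ψ)(η) as the respective numerators and normalising both by (Ψ ∗ ψ^{∗(j−1)} ∗ Ψ)(η):

```python
import math

def grid_fn(f, T, dt):
    """Sample a function on the uniform grid t = 0, dt, ..., T - dt."""
    return [f(i * dt) for i in range(int(round(T / dt)))]

def conv(f, g, dt):
    """Discrete approximation of the convolution (f * g)(t) on [0, T)."""
    return [dt * sum(f[i] * g[t - i] for i in range(t + 1))
            for t in range(len(f))]

def edge_rates(psi, eta, j, T, dt=0.01):
    """Numerical mu_j and nu_j for interevent density psi and memory
    window eta, via discretised convolution powers."""
    p = grid_fn(psi, T, dt)
    # residual (survival) function Psi(t) = 1 - int_0^t psi
    Psi, acc = [], 0.0
    for v in p:
        Psi.append(1.0 - acc)
        acc += v * dt
    # convolution power psi^{*(j-1)}, starting from the identity delta_0
    power = [0.0] * len(p)
    power[0] = 1.0 / dt              # discrete Dirac delta
    for _ in range(j - 1):
        power = conv(power, p, dt)
    denom = conv(Psi, conv(power, Psi, dt), dt)   # Psi * psi^{*(j-1)} * Psi
    num_mu = conv(Psi, conv(power, p, dt), dt)    # Psi * psi^{*j}
    num_nu = conv(power, Psi, dt)                 # psi^{*(j-1)} * Psi
    i = int(round(eta / dt))
    return num_mu[i] / denom[i], num_nu[i] / denom[i]
```

As a check, an exponential ψ with unit mean reproduces the memoryless rate µ_j = 1/⟨τ⟩ for all j, consistent with the stationarity condition µ_j E_j = ν_{j+1} E_{j+1}.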
In general, if ψ(τ ) is locally integrable, then the Laplace transform of ψ and Ψ exists and allows us to calculate the convolution as a product in the frequency domain, which will be useful especially if j is large. Since we don't impose any cutoffs on ψ and Ψ in the text, j indeed can grow arbitrarily large, under bursty dynamics. If we denote the Laplace transform of ψ(τ ) by then the transform of Supplementary Eq. (4) can be written asΨ Finally, we can argue that by induction from j = 0, and using the fact that the distribution E is constant at stationarity of the renewal process, that µ j E j = ν j+1 E j+1 , meaning that along with E j , the rates µ j and ν j are stationary. The observed experimental rates µ j , ν j and E j match exactly the predicted values, for increasingly large networks.

Illustration using the exponential distribution
In this section we explicitly calculate the edge transition rates µ_j and ν_j for an exponential interevent time distribution ψ. This is an exercise to illustrate the Laplace inversion procedure; in general we calculate these quantities numerically, as described in Supplementary Note 5. The exponential distribution is an important benchmark in this work, and we consider two alternative generalisations, namely the gamma and Weibull distributions. Both reduce to the exponential distribution when σ_τ = ⟨τ⟩ = 1, and recover the rates given here.

Supplementary Figure 4. Enumeration of edge state configurations. Configurations of j events occurring over an interval of length η, where η amounts to the memory of observed events. In plot (a) we enumerate all configurations of edges in state j at time t, with a (j + 1)-th event occurring over the interval t < t_{j+1} < t + dt. In plot (b) we enumerate all the configurations of edges in state j. We use these to calculate the rate of positive edge transition µ_j, and the edge state distribution E_j, respectively. Interevent times τ_1, . . . , τ_j are drawn from ψ, whereas τ′ and τ″ are drawn from Ψ, defined in the text.

Consider then the exponential distribution with mean ⟨τ⟩, defined by

	ψ(τ) = ⟨τ⟩^{−1} e^{−τ/⟨τ⟩},

having transforms

	ψ̂(s) = 1/(1 + s⟨τ⟩)  and  Ψ̂(s) = ⟨τ⟩/(1 + s⟨τ⟩),

respectively. Substituting these transforms into Supplementary Eq. (10), and applying the convolution theorem, allows us to calculate the expected size of the set of edges in state j, which is also the normalising constant in the rates µ_j and ν_j. The transform of the relevant convolution simplifies to

	L{Ψ ∗ ψ^{∗(j−1)} ∗ Ψ} = Ψ̂(s)² ψ̂(s)^{j−1} = ⟨τ⟩²/(1 + s⟨τ⟩)^{j+1}.

We use the fact that L^{−1}{1/s^{j+1}} = η^j/j! for integer j, a known Laplace transform relating to the gamma function. We use also the translation property L^{−1}{ψ̂(s + a)} = e^{−aτ}ψ(τ), which results directly from the definition in Supplementary Eq. (11). The inverse Laplace transform of the above expression can then be written explicitly as

	(Ψ ∗ ψ^{∗(j−1)} ∗ Ψ)(η) = ⟨τ⟩ (η/⟨τ⟩)^j e^{−η/⟨τ⟩}/j!,

meaning E is simply the Poisson distribution with mean η/⟨τ⟩, up to a factor of ⟨τ⟩. This is expected, due to our construction of the memory window, and the fact that an exponential interevent time distribution recovers a Poisson process. Similarly, the Laplace transforms of the numerators in Supplementary Eqs. (8) and (9) can be used to calculate

	(Ψ ∗ ψ^{∗j})(η) = (η/⟨τ⟩)^j e^{−η/⟨τ⟩}/j!

and

	(ψ^{∗(j−1)} ∗ Ψ)(η) = (η/⟨τ⟩)^{j−1} e^{−η/⟨τ⟩}/(j − 1)!,

yielding

	µ_j = 1/⟨τ⟩  and  ν_j = j/η

after normalising by Ψ ∗ ψ^{∗(j−1)} ∗ Ψ. In this special case of exponentially distributed τ, we are able to derive these rates using much simpler arguments, namely the definition of the Poisson process and the Poisson distribution. Crucially, µ_j has no j dependence here, which clearly expresses the memoryless property of the Poisson process. These rates may be verified by simulating an ensemble of independent, stationary renewal processes, and observing the flux in the system over an interval ∆t. The flow through the set E_j over that interval, scaled by the size of that set and the size of the measurement window, gives the rates µ_j and ν_j. Alternatively, by simulating a single renewal process for a sufficiently long time, and measuring its change in state over each interval ∆t, one obtains rates µ_j and ν_j identical to those calculated in the ensemble.

Convolution powers, an aside
In order to calculate the distribution of edge states E_j, as well as the mean-field edge transition rates µ_j and ν_j, we need an efficient method for computing convolution powers. In general, a convolution is defined for two real-valued functions f and g over the domain f, g : (−∞, ∞) → R. However, in the case where f and g take non-negative values, as is the case in our study, the convolution reduces to

	(f ∗ g)(t) = ∫_0^t f(t′) g(t − t′) dt′.    (23)

We use this to express the edge-state properties, Supplementary Eqs. (6) and (7), as convolutions over the non-negative reals. It is worth noting the convention that if ψ^{∗j} is the j-th convolution power, or

	ψ^{∗j} = ψ ∗ ψ^{∗(j−1)},

then the zeroth-order convolution, ψ^{∗0} = δ_0, is simply the Dirac delta function, the identity of convolution. This is used in the j = 1 case of Supplementary Eqs. (8), (9) and (10).

Figure 5. Random walk interpretation of edge state. Positive and negative edge transition rates, µ_j and ν_j, act as a signature of the non-Markovianity in the renewal process model. On a macroscopic level, this model is indistinguishable from a random walk as illustrated above, where the transition rates are given by µ_j and ν_j by construction. In contrast, this equivalence is broken when taking into account node dynamics on the level of classes (k, m), as shown in Supplementary Fig. 10. See also Supplementary Fig. 6 for illustrative values of the rates µ_j and ν_j for varying σ_τ.

Supplementary Note 2. Random walk equivalence
In this Supplementary Note, we discuss the expected steady-state network topology that emerges from our stochastic temporal model. In particular, we describe the existence of a purely Markovian system with macroscopic dynamics identical to our non-Markovian renewal process model. We consider the dependence of the edge transition rates µ_j and ν_j on our choice of interevent time distribution ψ, and in particular, its parameterisation in terms of the standard deviation σ_τ.
We assume an arbitrarily large network consisting of independent, stationary renewal processes, such that the time to the next event at any time t follows the residual distribution Ψ(τ). We illustrate in Supplementary Fig. 6 the edge transition rates µ_j and ν_j that emerge from a gamma interevent time distribution ψ(τ), for increasing values of the standard deviation σ_τ. Since µ_j and ν_j are heterogeneous, they can be interpreted as providing a signature of the non-Markovianity inherent to the renewal process microscopically. For instance, when σ_τ = ⟨τ⟩ = 1, the gamma distribution reduces to an exponential distribution, corresponding to a Poisson process. That this process is memoryless is reflected in the edge transition rate µ_j = 1/⟨τ⟩ being homogeneous in j. The greater the departure from the Poisson process, the greater the heterogeneity in j, illustrated in Supplementary Fig. 6(a). A central result in this work is to note that for an uncorrelated ensemble of edges, the renewal process model is indistinguishable from a continuous-time Markov chain, namely, a one-dimensional biased random walk, as illustrated in Supplementary Fig. 5. In other words, the probability of a randomly selected edge undergoing a transition over an interval dt in the renewal process model is trivially identical to that in a Markov chain where, by construction, transitions occur at rates µ_j and ν_j. Indeed, since all temporal network information is stored in µ_j, ν_j and E_j in the master equation, any class of system producing a given set of µ_j, ν_j and E_j values has the same predicted dynamics. Since in the special case of a Markov chain the transition rates are exact at all scales (both locally on the scale of a single edge, and globally on the scale of an uncorrelated ensemble), the master equation solution is exact here. Since Supplementary Eqs.
(8) and (9) represent a mean-field approximation on the scale of classes (k, m) in the renewal process model, deviations in the master equation solution emerge here. These errors provide a measure of the extent to which non-Markovian dynamics can be captured by the heterogeneity of a simple Markov chain.
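The stationarity condition µ_j E_j = ν_{j+1} E_{j+1} also gives a quick way to recover the edge-state distribution of the equivalent random walk from its rates. A sketch (assuming the rates are supplied as functions of j; names our own):

```python
def stationary_distribution(mu, nu, n_max):
    """Stationary distribution of a one-dimensional biased random walk
    with up-rates mu(j) and down-rates nu(j), obtained from the detailed
    balance relation mu_j E_j = nu_{j+1} E_{j+1}, truncated at n_max."""
    E = [1.0]
    for j in range(n_max):
        E.append(E[-1] * mu(j) / nu(j + 1))
    Z = sum(E)
    return [e / Z for e in E]
```

For the exponential benchmark, with µ_j = 1/⟨τ⟩ and ν_j = j/η, the recursion reproduces the Poisson distribution with mean η/⟨τ⟩, as expected.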

Supplementary Note 3. Edge-state distribution
In this Supplementary Note we mention some basic properties of the edge state distribution. First, for all choices of ψ(τ) and η in this work, a useful conserved quantity is the expected edge state, or the expected number of events per edge across the entire network. If ⟨τ⟩ is the mean interevent time and η is the observer memory, the expected edge state is ⟨E⟩ = η/⟨τ⟩, which is useful for monitoring the accuracy of the implementation. Further, note that if E is the set of underlying edges in the network, the interevent times of the superposition of |E| renewal processes converge to an exponential distribution with mean ⟨τ⟩/|E|, for large |E|. As such, we expect |E|dt/⟨τ⟩ events per time window dt in simulation.
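The conserved quantity ⟨E⟩ = η/⟨τ⟩ is straightforward to monitor in simulation. A sketch for the exponential (Poisson) benchmark, where the process is stationary from t = 0 and no burn-in is required (names and parameter values our own):

```python
import random

def mean_edge_state(n_edges=2000, mean_tau=1.0, eta=2.0, T=20.0, seed=1):
    """Average number of events per edge falling inside the memory
    window (T - eta, T], for independent Poisson processes with mean
    interevent time mean_tau. Expected value: eta / mean_tau."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_edges):
        t = rng.expovariate(1.0 / mean_tau)
        while t <= T:
            if t > T - eta:
                total += 1          # event lies inside the memory window
            t += rng.expovariate(1.0 / mean_tau)
    return total / n_edges
```

For non-exponential ψ, the same check applies once the first event of each edge is drawn from the residual distribution, as described in Supplementary Note 4.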

Temporal network percolation transition
The probability that a randomly selected edge is in state zero is given by ξ_E. One can use ξ_E to determine whether the active edges in the network, the set of all edges in state one or higher, are expected to form a giant component at any given time. If a giant component exists, we say that percolation has taken place. If percolation does not occur, nodes form finite active clusters whose relative size goes to zero in the limit of large networks. This is true even if the underlying network itself consists of a giant component. The percolation transition helps to explain sharp transitions in diffusion dynamics, separating regimes of fast and slow information diffusion. The percolation condition for a configuration model network with degree distribution q(k) is given by

	Σ_k k(k − 2) q(k) > 0,

where

	q(k) = Σ_{k′ ≥ k} p(k′) (k′ choose k) (1 − ξ_E)^k ξ_E^{k′−k}

is the degree distribution obtained by randomly removing a fraction ξ_E of edges from a network with degree distribution p(k), which is effectively the network induced by removing edges in state zero in our temporal model. The percolation transition is responsible for a corresponding transition in diffusion dynamics, echoed in Supplementary Fig. 7.

Heat map comparison of spreading models. Spreading time t_f for the relative and absolute threshold models, (a) and (b) respectively, and the SI model, (c). The interevent time distribution is Weibull, and the underlying network is lognormal with mean ⟨k⟩ = 7 and standard deviation σ_k = 0.5. The temporal network phase transition is in evidence in the top right of each plot.
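The percolation condition under random edge removal can be checked numerically. A sketch assuming the Molloy-Reed form of the criterion and binomial thinning of the degree distribution (names our own):

```python
from math import comb

def thinned_distribution(p, xi):
    """Degree distribution after removing each edge independently with
    probability xi (binomial thinning). p is a dict k -> probability."""
    kmax = max(p)
    q = [0.0] * (kmax + 1)
    for kp, weight in p.items():
        for k in range(kp + 1):
            q[k] += weight * comb(kp, k) * (1 - xi) ** k * xi ** (kp - k)
    return q

def percolates(p, xi):
    """Molloy-Reed criterion sum_k k (k - 2) q(k) > 0 on the thinned
    network: True if a giant component of active edges is expected."""
    q = thinned_distribution(p, xi)
    return sum(k * (k - 2) * qk for k, qk in enumerate(q)) > 0
```

For example, for a 4-regular underlying network the criterion predicts percolation for ξ_E below 2/3, and no giant component above it.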

Maximum edge state
In our analytic solution, we introduce n to denote the maximum edge state j. In experiment, no such restriction is imposed when sampling the interevent time distributions ψ and Ψ, in contrast to related work [1], where it is common to introduce upper and lower bounds on τ. As a consequence, bursts in activity can lead to arbitrarily large edge states j. However, as we see in the structure of configuration space, if n is the maximum edge state allowed in the system, the number of equations grows like Θ(k^{2n}). Clearly, in the interest of a numerical implementation of the master equation solution, n cannot be arbitrarily large.
It is noteworthy that despite edge state being unrestricted in simulation, the resultant diffusion dynamics can be accurately solved using relatively small values of n. Consider that large bursts are most common when the interevent time standard deviation σ_τ is large. In this regime, the fraction of edges in state j = 0 is significant. As a consequence, nodes observing bursts of activity on some edges frequently observe no activity on others. Indeed, for large enough σ_τ, it is rare for a node to observe more than one active neighbour, with that active neighbour generally being in a very large state j. For the relative threshold (RT) model, such a node is infected with high probability if the neighbouring node is active, since the threshold φ is guaranteed to be overcome here. Similarly, for the absolute threshold (AT) model, any infected neighbour in state j = M_φ or higher is likely to trigger adoption. Finally, for the susceptible-infected (SI) model, consider that the duration of a spike in activity is of the order of η, i.e., the duration of observer memory. Since the infection rate increases proportionally to the size of the burst, a large burst leads to infection soon after its observation, long before its events have left the time window. Since the local neighbourhood of the newly infected node is likely sparse in this setting, a much smaller burst would lead to an identical diffusion outcome. Effects such as these mean that surprisingly low values of n are sufficient to accurately model the diffusion process.

Supplementary Note 4. Monte Carlo simulation
In this Supplementary Note we discuss the Monte Carlo simulation methods used in this work. Since node dynamics do not feed back into edge dynamics, we assume a steady-state renewal process is in place at t = 0. This involves initialising each edge with one τ drawn from the residual distribution Ψ. We start the simulation at t = −η, so that at time t = 0 the memory window is fully populated and the system is at steady state.

Parallel event sequences
Our model of node dynamics is Markovian, since an uninfected node v becomes infected at a constant rate F_v. In the case of threshold models, the transmission rate is a step function, whose upper value is conventionally set to one. A straightforward Monte Carlo simulation in this case is to advance in time with fixed intervals ∆t = 1/N, where N is the network size, randomly selecting a node v to trigger with probability F_v at each step. This approach is unsuitable when transmission rates are unbounded, since the time step ∆t cannot be rescaled a priori to preserve the random node selection approach. This is the case for the SI rule in our model. For a node in class (k, m), we define the transmission rate to be proportional to edge state, F_{k,m} = max(p, m · λ), where p is a small rate of spontaneous infection. Since m and λ are unbounded in our simulations, as we impose no restriction on ψ and Ψ, bursts of activity due to our renewal process model can result in arbitrarily high F.
For this reason, we prefer an event-based Gillespie algorithm, which is equivalent, but advances in time by jumping to the next event, rather than by uniform increments ∆t. In the remainder of this section we discuss the implementation of sequences of these events. For a graph G(V, E), we define two types of events: node events, defined over the node set V, and edge events, defined over the edge set E. A node event is implemented as the tuple {t_v, v}, and an edge event as {t_e, e_uv, s}. Node events amount to the infection of node v at a time t_v, and edge events to the positive or negative change in the state of edge e_uv, depending on the sign of the indicator s = ±1, according to our stochastic temporal network model. Monte Carlo simulation is implemented as two time-ordered, dynamic sequences of events, Ξ_v and Ξ_e, for node and edge event types, respectively. While Ξ_e is independent of node dynamics, both node and edge events feed back and cause a potential reordering of Ξ_v, as explained below. The node event sequence is initially of size N = |V|, since we have one event for each node in the network, with the leading event being removed when all preceding edge activity has been carried out, i.e., when t_v < t_e for the leading terms in each sequence. The action carried out is best illustrated by considering a static network, Algorithm 1. Here, the apply head instruction means to trigger the node in question, remove it from Ξ_v, and update the transmission rates of its neighbours, and in turn, their positions in Ξ_v. In our model, the edge event sequence has approximately constant size, apart from some fluctuations due to the fact that η/⟨τ⟩ is only the expected edge state; the absolute number of events in η-memory can go up and down. As networks increase in size, the size of the edge event sequence Ξ_e converges to (2 + η/⟨τ⟩)|E|.
Assuming that a non-zero level of noise is present, p > 0, the sequence Ξ_v will eventually be emptied. Algorithm 2 is the temporal extension of Algorithm 1. In each, the time of infection of every node v is determined at t = 0. This is done by drawing from an exponential distribution with rate F_v, i.e., taking the natural logarithm of a uniform random variable on (0, 1), divided by −F_v. A node event is permanently erased from the sequence if t_v < t_e, as mentioned. At this time, neighbours u of v have their transmission rates F_u recalculated, as in Algorithm 1, as they now have an additional infected neighbour. Additionally, the infection time of a node is recalculated if a change in its local neighbourhood takes place, such as activity on an adjacent edge, or the infection of a neighbouring node. Since event sequences are time ordered, a node event is first erased from its position in the sequence, and then reinserted when such a recalculation takes place. No such reordering takes place for the edge event sequence; once an event is inserted here, it is only removed when its time t_e is at the front of the queue. While t_e < t_v, where the events in question are the leading events of each queue, there are two possible actions for the apply head instruction for Ξ_e in Algorithm 2. First, if s = +1, the leading edge event is erased and replaced with two new events on the same edge e_uv: one occurring at time t_e + τ, where τ is drawn from the interevent time distribution ψ, and one at t_e + τ + η, with s = −1, corresponding to the decrementing of that same edge η time steps later. If an s = −1 edge event is leading the sequence, and we still have t_e < t_v, then the event is removed from the sequence without replacement.
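The edge-event half of this bookkeeping can be sketched with a standard min-heap (our naming; node events and the infection dynamics are omitted, and ψ is exponential purely for illustration):

```cpp
#include <queue>
#include <vector>
#include <random>
#include <algorithm>

// Sketch of the edge event sequence Xi_e as a min-heap of events {t, edge, s}.
// A leading s = +1 event increments the edge state and schedules the next
// interevent time tau ~ psi, together with an s = -1 event eta time units
// after it, which decrements the state when that event leaves memory.
struct EdgeEvent {
    double t;
    int edge;
    int s;                                           // +1 increment, -1 decrement
    bool operator>(const EdgeEvent& o) const { return t > o.t; }
};

// Run the edge process until t_max; returns edge states at that time.
std::vector<int> run_edge_sequence(int n_edges, double eta, double t_max,
                                   unsigned seed = 1) {
    std::priority_queue<EdgeEvent, std::vector<EdgeEvent>,
                        std::greater<EdgeEvent>> xi_e;
    std::mt19937 rng(seed);
    std::exponential_distribution<double> psi(1.0);  // stand-in interevent law
    std::vector<int> state(n_edges, 0);

    for (int e = 0; e < n_edges; ++e) {              // one initial event per edge,
        double t0 = psi(rng);                        // plus its own decrement
        xi_e.push({t0, e, +1});
        xi_e.push({t0 + eta, e, -1});
    }
    while (!xi_e.empty() && xi_e.top().t < t_max) {
        EdgeEvent ev = xi_e.top();
        xi_e.pop();
        if (ev.s == +1) {
            state[ev.edge] += 1;                     // edge activity recalled
            double tau = psi(rng);
            xi_e.push({ev.t + tau, ev.edge, +1});
            xi_e.push({ev.t + tau + eta, ev.edge, -1});
        } else {
            state[ev.edge] -= 1;                     // activity forgotten
        }
    }
    return state;
}
```

Since every decrement is scheduled exactly η after its matching increment, edge states remain non-negative throughout.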
In the case of static networks, Gillespie algorithms allow massive speedup relative to the random selection approach. This is not the case for the node dynamics in our temporal network model, where a Gillespie event sequence affords very little speedup. Here, the edge activity sequence is a bottleneck, since it still requires |E|dt/τ̄ edge updates per dt on average. As such, it is not for the benefit of speed that we use a Gillespie-type event sequence for node updates, but to account for arbitrarily large spikes in node transmission rates, particularly under the SI rule for temporal networks.

Inverse transform sampling of ψ and Ψ
In this section we discuss the numerical pipeline used to simulate renewal processes, a procedure that amounts to accurately sampling from the interevent time distributions ψ(τ ) and Ψ(τ ). Due to the large values of the standard deviation σ τ to be examined in this work, currently available software could not be used to sample values of τ . While the excellent <random> library for C++ allows rapid sampling from the lognormal, Weibull and gamma distributions, it appears to become inefficient for large σ τ , especially for the gamma distribution. In any case, directly sampling from Ψ, which often involves special functions, is beyond the scope of this library. Because of this, we build our own sampling routine.
We favour an inverse transform sampling technique, where random values are sampled on an interval (0, 1), and evaluating the inverse of a cdf at this point provides a sample of the underlying pdf. That is, for a random variable x ∈ (0, 1), evaluating Ψ⁻¹(x) provides a sample τ value from ψ. Further, random sampling of the residual distribution requires finding the cdf of Ψ, and being able to approximate its inverse.

[Supplementary Figure 8. Sampling probability distributions. Numerical construction of Ψ, a probability distribution that can be accurately evaluated at any point τ, at some cost, for all distributions used in this work. This grid results from iterating outwards from τ = τ̄ = 1, for increasing τ until Ψ < ε (blue grid), and for decreasing τ until Ψ > 1 − ε (red grid). Inverse transform sampling is then performed on the interval (ε, 1 − ε) to provide samples of ψ, using a bisection method on the grid to find the corresponding τ. A third-order spline interpolation on a logarithmic scale provides intermediate values of τ. Additionally, the grid can be cumulatively summed, and inverse transform sampling carried out on the spline of the resulting grid, to provide samples of Ψ.]
First, the problem of sampling from ψ is that while Ψ is known, its inverse generally is not. Second, we cannot easily sample from Ψ, since its cdf is generally unknown to begin with. In fact, we often have enough difficulty simply evaluating Ψ, as is the case when ψ is the gamma distribution, where Ψ has no closed form. As a result, we can only determine Ψ⁻¹ approximately. We do this by first generating a grid of Ψ values, as shown in Supplementary Fig. 8. Due to the large σ_τ values of interest to us, it is difficult to estimate a priori the desired upper and lower limits of the grid, which vary significantly depending on the choice of ψ. Said differently, we want ε to be as small as possible, in order to allow for the sampling of extreme values of τ, which are critical to the dynamics of our system. Further, taking the small ε limit ensures that τ̄ and σ_τ are respected, which is crucial since the latter is a control parameter in our experiments. As shown in Supplementary Fig. 8, we iterate outward from τ = τ̄ = 1 uniformly on a logarithmic scale until a desired interval (ε, 1 − ε) is obtained. An arbitrary precision library is required to accurately determine Ψ, and we prefer GNU MPFR, a multiple-precision binary floating-point library with correct rounding. See Supplementary Note 8 for details regarding the distributions themselves.
With such a grid accurately computed, we carry out third-order spline interpolation on a logarithmic scale to allow rapid evaluation at arbitrary points τ within the grid. As a first application, the spline can be used to rapidly solve Ψ(τ) = x using a bisection technique. This provides samples τ of ψ. The grid can then be cumulatively summed to provide the cdf of Ψ over the same domain. Performing approximate inverse transform sampling on the resulting grid returns samples τ of Ψ.
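The inversion step can be illustrated with a simplified stand-in (our function names): here we bisect on the survival function directly, whereas the production code bisects on the spline-interpolated MPFR grid, since Ψ is expensive to evaluate:

```cpp
#include <cmath>
#include <functional>

// Sketch of inverse transform sampling of psi via its survival function
// Psi(tau) = 1 - cdf(tau): given x in (0, 1), solve Psi(tau) = x by bisection
// on a bracket found by stepping outward from tau = 1, mirroring the
// outward grid construction on a logarithmic scale.
double invert_survival(const std::function<double(double)>& Psi, double x,
                       double tol = 1e-12) {
    // Psi is monotonically decreasing; find lo, hi with Psi(hi) <= x <= Psi(lo).
    double lo = 1.0, hi = 1.0;
    while (Psi(lo) < x) lo /= 2.0;   // step down until Psi(lo) >= x
    while (Psi(hi) > x) hi *= 2.0;   // step up until Psi(hi) <= x
    for (int i = 0; i < 200 && hi - lo > tol * lo; ++i) {
        double mid = 0.5 * (lo + hi);
        if (Psi(mid) > x) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

For an exponential survival function Ψ(τ) = e^{−τ}, inverting at x = 1/2 recovers τ = ln 2, which provides a quick correctness check.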
It is worth mentioning that rejection sampling techniques appear to be out of the question here. This is due to the extreme bounding values of τ necessary for the standard deviations σ_τ studied in our choice of heavy-tailed distribution. Even for clever choices of enveloping functions for ψ and Ψ, it is likely that the acceptance rates will be prohibitively low. Finding such functions remains an interesting challenge, but since our bisection approach converges exponentially quickly, it is hard to imagine that a choice of envelope exists that makes rejection sampling faster.
Finally, we comment on experiments involving very large standard deviations σ_τ. Here, not even large networks running for a long time provide an unbiased sample of ψ. Following the central limit theorem, the standard deviation of the sampled mean interevent time equals σ_τ/√n_s, where n_s is the number of samples in question, which for the sake of argument is on the order of the number of edges in the network. High quality experimental results can be obtained by simulating large networks, or by averaging over realisations. Note further that large σ_τ experiments in our model happen to coincide with very long simulation times. As a result, even for large σ_τ there is little noise in our results, due to the substantial runtime. In addition, ξ_E plays a very important role, and amounts to the fraction of samples of Ψ that are larger than η. A consequence of this is that for large σ_τ, most edges do not participate at all, having drawn residual times at t = 0 that are longer than the duration of the experiment, which is determined by ρ = 1 − e^{−pt}.

Supplementary Note 5. Laplace transform inversion
In the following sections we describe the numerical pipeline for obtaining the rates µ j and ν j , as well as the edge-state distribution E j . Although these values could be calculated by hand for the case of the exponential distribution, and maybe even the gamma distribution as its Laplace transform is known, we would like to be able to do this numerically for arbitrary ψ(τ ), including those for which the Laplace transform of ψ and Ψ are not known.
These rates are expressed in terms of convolutions. Since we are interested in j-th order convolutions, for arbitrary positive integers j, we prefer to perform products in frequency space, as permitted by the Laplace transform. A j-th order convolution for large j increases exponentially in complexity, and can only be performed directly for very small j. Further, E_j can be broad when σ_τ is large. The Laplace transform is defined as

f̂(s) = ∫₀^∞ e^{−st} f(t) dt,  (27)

where f is a real-valued function of time t, and f̂ a complex-valued function of the complex variable s = σ + iω.
For the Gaver-Stehfest algorithm in the following section, s is always real, so we can set ω = 0 and write

f̂(σ) = ∫₀^∞ e^{−σt} f(t) dt.  (28)

This is necessary when the Laplace transform of f is not known, and must be evaluated numerically, as is the case for the lognormal and Weibull distributions, used throughout this work. There exists a useful framework that unifies these different algorithms [2], expressing each approach in terms of a weighted sum of f̂ values. We shall see that the computation of the forward Laplace transform is the bottleneck, as opposed to finding the weights. As such, the preferred method depends on how many numerical evaluations of f̂ are required. This is 2M in the Gaver-Stehfest algorithm, 2M + 1 in the Euler algorithm and M in the Talbot algorithm, with M controlling the desired accuracy (see [2] for details). However, only a real integral need be evaluated in the Gaver-Stehfest algorithm, which we prefer for this reason.

Gaver-Stehfest algorithm
We assume a Laplace transform f̂ is known, and that we wish to recover the original unknown function f. In our case, f corresponds to convolutions of ψ and Ψ. For any t > 0 and positive integer M, so-called Salzer summation yields the Gaver-Stehfest inversion formula

f(t, M) = (ln 2 / t) Σ_{k=1}^{2M} ζ_k f̂(k ln 2 / t),  (29)

where the weights ζ_k are given by

ζ_k = (−1)^{M+k} Σ_{j=⌊(k+1)/2⌋}^{min(k,M)} (j^{M+1} / M!) C(M, j) C(2j, j) C(j, k − j).  (30)

Conveniently, the weights ζ_k are independent of the transform being inverted. This means that after a precision M is chosen, the weights can be stored in a vector ζ of dimension 2M, to be reused for a number of transforms f̂. We discuss the numerical implementation further in the following sections, but mention here that we use the GNU Multiple Precision arithmetic library GMP for ζ_k, in particular its integer summand, and the related MPFR library for floating point arithmetic when manipulating f̂. Regarding the weights ζ_k, a useful property for benchmarking is the fact that they sum to zero for all M ≥ 1; due to the (−1)^{M+k} factor in the definition of ζ_k, the weights oscillate around zero. The Gaver-Stehfest algorithm requires a system precision of approximately 2.2M tracked bits, which we input to the constructors of variables in MPFR.
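A double-precision sketch of the inversion formula is given below (our naming; the production code computes the weights with GMP/MPFR, but for the small M used here, double precision suffices):

```cpp
#include <cmath>
#include <vector>
#include <algorithm>
#include <functional>

// Gaver-Stehfest inversion: f(t) ~ (ln2/t) * sum_{k=1}^{2M} zeta_k * fhat(k ln2/t),
// with zeta_k the Salzer summation weights. Binomials are computed in
// floating point, which is adequate for small M only.
double gaver_stehfest(const std::function<double(double)>& fhat,
                      double t, int M) {
    auto binom = [](int n, int k) {
        double r = 1.0;
        for (int i = 1; i <= k; ++i) r *= double(n - k + i) / double(i);
        return r;
    };
    double mfact = 1.0;
    for (int i = 2; i <= M; ++i) mfact *= i;

    const double ln2t = std::log(2.0) / t;
    double sum = 0.0;
    for (int k = 1; k <= 2 * M; ++k) {
        double zeta = 0.0;
        for (int j = (k + 1) / 2; j <= std::min(k, M); ++j)
            zeta += std::pow(double(j), M + 1) / mfact
                    * binom(M, j) * binom(2 * j, j) * binom(j, k - j);
        if ((M + k) % 2 != 0) zeta = -zeta;   // the (-1)^(M+k) factor
        sum += zeta * fhat(k * ln2t);
    }
    return ln2t * sum;
}
```

Inverting the known pair f̂(s) = 1/(s + 1), f(t) = e^{−t} provides a simple benchmark of the implementation.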

Numerical pipeline
We now describe specifically how the inversion procedure in the previous section can be used to determine the values E j , µ j and ν j described in preceding sections. For simplicity, we refer to these quantities in this section using vectors E, µ and ν, whose dimension is determined by the size of the edge state space n in our master equation, which need not be determined a priori. Since E 0 and µ 0 are special cases of E j and µ j , we write E = (E 1 , E 2 , . . . , E n ) T for the edge state distribution, and µ = (µ 1 , µ 2 , . . . , µ n ) T and ν = (ν 1 , ν 2 , . . . , ν n ) T for positive and negative edge transitions. The function f (t, M ) in the Gaver-Stehfest algorithm corresponds to E j (η, M ), µ j (η, M ) and ν j (η, M ), where M as before is a parameter of the Gaver-Stehfest algorithm that tunes the accuracy of the approximation.
Calculating the distribution E, and the rate vectors µ and ν, amounts to three separate matrix-vector products. We require the vector ζ of dimension 2M, as per the Gaver-Stehfest algorithm, whose k-th element is given by Supplementary Eq. (30), and the Laplace transforms of ψ and Ψ evaluated at s = k ln(2)/t, denoted ψ̂_k and Ψ̂_k, as per Supplementary Eq. (12). We define n as the maximum edge state, which can be as large as one likes here; it is limited only by M, which must be increased for increasing n. These quantities are then used to populate the n × 2M dimensional matrices Ê, μ̂ and ν̂, whose jk-th elements are given in terms of ψ̂_k and Ψ̂_k, with 1 ≤ j ≤ n and 1 ≤ k ≤ 2M. Determining the distribution of edge states E, and the edge transition rates µ and ν, then amounts to the matrix-vector product of Ê, μ̂ and ν̂, respectively, with ζ, such that E = Êζ, µ = μ̂ζ, and ν = ν̂ζ. These quantities are calculated once at the start of the Runge-Kutta solution of the corresponding system, and do not change otherwise. Note that results have to be scaled by ln(2)/t, as per the definition of f(t, M) in the Gaver-Stehfest algorithm, and µ_j and ν_j normalised by E_j, to adhere to their definitions. Examples of µ and ν values are provided in Supplementary Fig. 6 for the case of a gamma distribution ψ. Implementation of the Gaver-Stehfest algorithm in C++ created confusion for some time, not because of the calculation of the weights ζ_k, but in their product with f̂. The authors of [2] state that while arbitrary precision is clearly required for the weights ζ_k, it is not required for handling f̂. However, we find that double precision is not sufficient in general for f̂, which can be confirmed by using double values in C++, or equivalently, setting a precision of 53 bits in an arbitrary precision library. We use a combination of the GMP library for the binomial coefficients, factorial, and exponential in Supplementary Eq. (30). Division is performed after conversion to MPFR.
Similarly, f̂ in Supplementary Eq. (29) is determined using MPFR, whether the Laplace transform is known in closed form, or is calculated from its integral definition.
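The pipeline can be sketched as follows, with the weight vector ζ precomputed once and reused across the family of transforms; the matrix entries used here (powers of an exponential ψ̂) are purely illustrative stand-ins, since the actual entries follow Supplementary Eq. (12):

```cpp
#include <cmath>
#include <vector>

// Sketch of the matrix-vector pipeline: with the weights zeta precomputed,
// inverting a family of transforms fhat_j at a common time t amounts to the
// product of an n x 2M matrix of transform values with zeta, followed by
// the overall ln(2)/t scaling.
std::vector<double> invert_family(const std::vector<double>& zeta,  // size 2M
                                  int n, double t) {
    const int twoM = static_cast<int>(zeta.size());
    const double ln2t = std::log(2.0) / t;
    std::vector<double> out(n, 0.0);
    for (int j = 1; j <= n; ++j) {
        for (int k = 1; k <= twoM; ++k) {
            double s = k * ln2t;
            double psihat = 1.0 / (1.0 + s);   // exponential psi, rate 1 (illustrative)
            out[j - 1] += std::pow(psihat, j) * zeta[k - 1];
        }
        out[j - 1] *= ln2t;                    // scale by ln(2)/t
    }
    return out;
}
```

With the M = 3 Stehfest weights {1, −49, 366, −858, 810, −270}, row j approximates the Erlang density t^{j−1}e^{−t}/(j−1)!, since ψ̂^j is its transform when ψ is a unit-rate exponential.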

Supplementary Note 6. Static network topology
In this supplementary note we further consider the effect of network structure on the temporal dynamics outlined in our framework. Our master equation assumes a configuration-model network, whereby network structure is maximally random apart from the degree distribution, which is fixed. We have shown using the relative threshold model that degree distribution, beyond its mean, has little impact on diffusion dynamics in the presence of uncorrelated temporal fluctuations. This is evidenced by the degeneracy of ρ f values obtained when varying the degree standard deviation in a lognormal degree distribution, while keeping the mean degree fixed. In the following we identify features of network topology that are likely to produce such degeneracy, and those under which the degeneracy may be broken.
We examine the effect of clustering on the dynamics of information cascades [Supplementary Fig. 9(a)]. We do this using an extension to the configuration model due to Newman, where in addition to specifying the number of edges, we specify the number of triangles to which a node is adjacent. In this figure, we interpolate between an 8-regular random network where no triangles are present, and one where all edges form part of a triangle. To do this, we assign a total of 8 stubs to a node, before arbitrarily sorting them into pairs. With probability α we assign the triangle type to a given pair of stubs, constraining them to be wired to another two pairs of triangular stubs, belonging to two other randomly selected nodes. With probability 1 − α, each stub in the pair is assigned the single type. In this way, when α = 1, clustering is maximal, and when α = 0, clustering vanishes in the large network limit. For the following, and all experiments in Supplementary Fig. 9, we vary the standard deviation σ_τ of a Weibull interevent time distribution, while fixing its mean to τ̄ = 1. Memory is set to η = 1, and node dynamics follow the relative threshold rule with φ = 0.15, in the presence of background noise at rate p = 2 × 10⁻⁴.
Like degree distribution, clustering appears to have no impact on information diffusion beyond the Poisson regime, σ_τ = 1, where interaction times are random. While burstiness has a decelerative effect beyond this point, it is indifferent to the extent of clustering. This is in contrast to when the network is effectively static, such as when σ_τ = 10⁻⁴, where clustering slows diffusion by a factor of more than 2.

[Supplementary Figure 9 (caption excerpt): (c) Same degree distribution as (b), but maximally random otherwise. Varying the degree distribution has no effect on the spreading process; only the mean degree z matters. Local structure is washed away by temporal fluctuations, but global structure, like average path length, remains critical, as evidenced by varying β in the Watts-Strogatz model. Nodes follow the relative threshold model with φ = 0.15. Interaction memory is η = 1. Data is from numerical experiment only, on networks of size 10⁵, with 10² averages in each case.]

The degeneracy in the bursty regime can be understood by accounting for the sparsifying effects of burstiness. That is, at any given time, a triangle is expected to be broken due to one or more edges not recalling an interaction event. As such, active edges produce a subgraph that is locally tree-like with high probability. One way in which this effect can be avoided is if edge activity is correlated, namely, when activity on one edge of a triangle increases the likelihood of activity on the remaining edges. Since such causal correlations are likely present in reality, this suggests a promising future research direction. Since both degree distribution and clustering have proven to be degenerate with respect to uncorrelated interaction events, we depart from the configuration model altogether. By way of contrast, we consider the Watts-Strogatz model, which has the advantage of simultaneously varying local structure, namely clustering and node degree, as well as global structure, in the average path length. By global structure, we mean quantities that are only meaningful when taking all nodes in the network collectively. Following typical implementations, we vary the probability β of randomly rewiring edges in an 8-regular ring lattice, thus producing shortcuts to distant nodes in the network. In this way, we interpolate between an entirely regular structure when β = 0, and a structure close, but not identical, to a maximally random graph when β = 1. Experimental conditions are identical to those described in Supplementary Fig. 9(a).
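The Watts-Strogatz construction just described can be sketched as follows (our function names; duplicate edges are ignored here for brevity):

```cpp
#include <vector>
#include <random>
#include <utility>

// Sketch of the Watts-Strogatz model: start from a k-regular ring lattice
// (each node linked to its k/2 nearest neighbours on either side) and, with
// probability beta, rewire the far endpoint of each edge to a uniformly
// random node. Self-loops are rejected.
std::vector<std::pair<int,int>> watts_strogatz(int n, int k, double beta,
                                               unsigned seed = 7) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::uniform_int_distribution<int> node(0, n - 1);
    std::vector<std::pair<int,int>> edges;
    for (int u = 0; u < n; ++u) {
        for (int d = 1; d <= k / 2; ++d) {
            int v = (u + d) % n;
            if (unif(rng) < beta) {
                do { v = node(rng); } while (v == u);  // rewire far endpoint
            }
            edges.emplace_back(u, v);
        }
    }
    return edges;
}
```

At β = 0 the ring lattice is recovered exactly, while at β = 1 every edge has a random far endpoint, producing a structure close, but not identical, to a maximally random graph.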
We observe a substantial dependence on the rewiring probability β in the Watts-Strogatz model, as evidenced by the non-degeneracy of the curves in Supplementary Fig. 9(b). Small values of β, such as 0.02 and 0.04, are sufficient to substantially reduce the spreading time, at all but the very largest values of σ_τ. Since small β leaves clustering and degree distribution relatively unchanged, we attribute this accelerative effect to the average path length, which is significantly reduced at small β. In any case, we already expect changes in clustering and degree distribution to have negligible effect on cascade dynamics in the presence of temporal fluctuations, as evidenced by previous experiments. To appreciate the magnitude of the small-world effect, consider that when σ_τ = 1, for example, burstiness must be increased to σ_τ = 30 to achieve the same effect as reducing β from 1 to 0. In other words, the decelerative effect of removing small-world shortcuts is as large as that achieved by increasing burstiness by an order of magnitude. We can understand the non-degeneracy here by noting that, unlike clustering, average path length is a feature that cannot be undone by temporal fluctuations, except at the largest values of σ_τ, when ρ_f approaches 1.
We repeat the experiment performed in Supplementary Fig. 9(b), only now on the corresponding configuration model network, plotting the results in Supplementary Fig. 9(c). The network is initially constructed as before, for identical values of β. Then, we sever all edges and rewire them at random, thus preserving the degree distribution while removing collective structure such as average path length. In this way, degree distribution is shown once again to be degenerate, confirming that the collective structure of the Watts-Strogatz model was responsible for breaking the degeneracy in Supplementary  Fig. 9(b).
The above experimental results lead us to conjecture that network features that are essentially collective are most likely to continue to play a role in the presence of temporal fluctuations. This is in contrast to features determined on the level of nodes and their immediate neighbours, such as node degree and clustering. Properties such as average path length, community structure and eigenvector centrality are all examples of manifestly collective structures. In the absence of collective structure, our results in this section highlight the need to explicitly introduce temporal correlations, if properties such as node degree and clustering are to play a role in information diffusion.

Supplementary Note 7. Mean field approximation
In this Supplementary Note we examine the effectiveness of the mean field assumption of edge state transitions. The edge transition rates calculated in this work assume an infinitely large ensemble of stationary, uncorrelated renewal processes. We expect a carefully implemented stochastic temporal network to agree precisely with theory, in the limit of large networks, which is what we confirm. In particular, by viewing the network as a single ensemble of edges, flux measurements indeed show theoretical results for E j , µ j and ν j to be exact on a network wide scale. However, our analysis involves partitioning our network into 2 × |C| classes, namely, an infected and uninfected variant for each of the |C| classes (k, m) allowed by the system. Clearly, the densities of nodes over these classes, and the transitions between them are highly dynamic, emptying and filling over the course of a spreading process. Further, transition rates are heterogeneous, particularly with respect to the transmission rate F k,m which varies from class to class. The mean field approximation is to assume that edge transition rates for individual classes are the same as for a completely uncorrelated ensemble.
The reason that we expect differences in edge transition rates over these scales is as follows. The emergent rates µ_j and ν_j result from certain assumptions regarding edge statistics at a microscopic level. This is formulated in terms of the history distribution, or distribution of tuples (τ_1, τ_2, ..., τ_j) within each η window, on each edge across a given set. This is illustrated in Supplementary Fig. 4. Expressed in these terms, the mean field approach is to assume that the history distribution within a particular class (k, m) is completely uncorrelated, and is equivalent to any randomly sampled subset of edges, or indeed the network edge set as a whole. To see how this assumption may break down, consider the flow of uninfected nodes through C as a survival process, whereby a node enters an uninfected class (k, m), only exiting and reentering an adjacent uninfected class if it does not become infected in the meantime. To further simplify this picture, consider (k, m) to have F_{k,m} = 1, with neighbouring uninfected classes having F_{k,m} = 0, as may occur with a threshold model of infection. In such a survival process, it is nodes with histories (τ_1, ..., τ_j) containing short intervals that are favoured to survive in (k, m), exiting to adjacent uninfected classes. History distributions leading to long waiting times in (k, m) are more likely to lead to infection, with survival times following an exponential distribution with rate F_{k,m}. As such, the history distribution of nodes exiting the class via infection is different to that of nodes that survive, and exit via edge or neighbour transitions. Mechanisms like this, and the related effect due to neighbour transitions, gradually transform the history distribution of individual classes, resulting in deviations in the emergent edge transition rates µ and ν. Although the assumption is broken on the scale of individual classes, it is of course preserved when taking all classes together.
We confirm this effect in an experiment whose results are shown in Supplementary Fig. 10. Here, we simulate a spreading process, and carefully record the densities of nodes in each class (k, m) at all times t, as well as the flows between classes over measurement windows of length ∆t = 0.02. To allow ∆t to be as small as possible, and approximate the dt in our analytics, we constrain the system to the smallest non-trivial configuration space C. To this end, the degree distribution is 3-regular random, and ψ is given by a power law with α = 2.5, with lower and upper cutoffs of τ = 1.01 and 50, respectively. By choosing η = 1 < 1.01, we ensure a two-level edge state space, where edges are in state j = 0 or 1. The resulting configuration space has size |C_3| = 20, is connected, and resembles a smaller version of Supplementary Fig. 1. By setting the network size as large as possible, here N = 10⁸, the node set is diluted as little as possible over C. We plot the flux measurements of the uninfected class with degree vectors k = (2, 1)ᵀ and m = (1, 0)ᵀ. Node dynamics follow a relative threshold rule with φ = 0.4, and background noise causes infection at a rate p = 2 × 10⁻⁴. As such, the transmission rate for the class in question is F_{k,m} = p.
Ego transition measurements are shown in Supplementary Fig. 10(a). Since the class initially has density s_{k,m} = 0, with the network initialised to ρ = 0 at t = 0, initial measurements show only a handful of ego transitions per ∆t up to around t = 10, visible here thanks to the logscale. As expected, the rate of ego transitions closely agrees with the imposed value of p = 2 × 10⁻⁴, verified by comparing with the expected flux F_{k,m} s_{k,m} ∆t, given by the solid black curve. This transition is guaranteed to agree with theory if the Gillespie algorithm is correctly implemented, and serves as a useful benchmark in flux measurement experiments. Further, neighbour transition rates are verified in Supplementary Fig. 10(b) and (c), for edges of type j = 0 and 1, respectively. Time-dependent rates β_j are calculated using the set of |C_3| = 20 empirical densities s_{k,m}. Scaling s_{k,m} for the class in question by β_0(k_0 − m_0) and β_1(k_1 − m_1) shows remarkable agreement with the measured fluctuations.
Flux measurements of positive edge transitions for uninfected and infected neighbours are shown in Supplementary Fig. 10(d) and (e), respectively. While agreement is excellent in (d), a clear deviation emerges in (e).
Although the disagreement appears minor on a logarithmic scale, the theoretical µ 0 is off by roughly 20%. The rate of negative edge transition in (f) is in good agreement with theory. A study of the complete state space shows that deviations as in (e) are common, but not systematic, with mean field estimates µ and ν sometimes overestimating, and sometimes underestimating the measured fluxes. The master equation solution of ρ for this experiment, not shown here, miscalculated the overall spreading speed by about 15%. It is likely that the small configuration space here contributed to the error. In much larger systems, this effect is likely diluted, especially since µ and ν do not systematically over or underestimate the class-level edge transition rates. That is, a cancellation effect might emerge.
We confirm that the mean field assumption breaks down due to heterogeneities in transmission rates by studying a purely noise-driven variant of the above experiments. That is, F_{k,m} = p for all classes (k, m). The same flux measurements, not shown here, are in perfect agreement with theory in such a setting. Further, the relative set sizes for constant m rows of configuration space are exactly what one would expect given a degree distribution p(k), and a probability distribution E_j that a randomly selected edge is in state j.

[Supplementary Figure 11. Higher order moments of probability distributions. Comparing skewness and entropy of probability distributions as a function of standard deviation. Mean is fixed to µ = 1. (a) The third raw moment E[X³], and skewness γ in (b), for the lognormal, Weibull and gamma distributions. The third raw moment is the only term that differs in the calculation of skewness. (c) Entropy follows the same trend as (a) and (b). We are interested in the connection between these quantities and ξ_E in (d), the fraction of edges in state j = 0, here with η = 1.]

Supplementary Note 8. Skewness and entropy
In this Supplementary Note we examine the skewness and differential entropy of the interevent time distributions used in this work, namely the lognormal, Weibull and gamma distributions. Our goal is to understand how these distributions differ after carefully controlling for their mean and standard deviation, µ and σ respectively, and to this end propose skewness and entropy as measures that can be understood as ranking various distributions ψ by the value of their effective sparsity ξ E .
We control for both the mean and standard deviation when comparing distributions in this work. For a random variable X, this is equivalent to controlling the first and second raw moments, E[X] = µ and E[X²] = µ² + σ². We wish to examine the differences that remain between our chosen distributions after imposing these constraints. To this end, it is logical to examine the third raw moment E[X³]. This is usually done by calculating the skewness γ, typically defined as the third standardised moment, or E[(X − µ)³] normalised by σ³. The normalisation renders the moment scale invariant, meaning the information encoded in the skewness relates only to the "shape" of the distribution in question, and not its variance. Since we are interested in comparing skewness for constant mean and standard deviation, we use the expression

γ = (E[X³] − 3µσ² − µ³) / σ³.

Clearly, skewness γ differs from one distribution to another only in the third raw moment E[X³], since we keep µ and σ constant. Plotting skewness γ as a function of σ for constant µ = 1 leads to Supplementary Fig. 11, for the lognormal, Weibull and gamma distributions. The large differences in skewness here correspond to observations regarding effective sparsity ξ_E in earlier parts of this work. As such, the skewness of a distribution provides a good rule of thumb for comparing ξ_E for different distributions.
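With the mean fixed to µ = 1, the standard closed-form skewness expressions reduce to simple functions of σ (our derivation; the function names are ours): for the lognormal, γ = (σ² + 3)σ, and for the gamma distribution, γ = 2σ, so the lognormal is more skewed for every σ > 0, consistent with Supplementary Fig. 11(b):

```cpp
#include <cmath>

// Skewness of a lognormal with mean mu = 1 and standard deviation sigma.
// With mu = 1 the underlying normal variance satisfies exp(sigma_tilde^2)
// = 1 + sigma^2, so gamma = (exp(v) + 2) * sqrt(exp(v) - 1) = (sigma^2+3)*sigma.
double skew_lognormal_unit_mean(double sigma) {
    double ev = 1.0 + sigma * sigma;
    return (ev + 2.0) * std::sqrt(ev - 1.0);
}

// Skewness of a gamma distribution with mean mu = 1: the shape parameter is
// alpha = 1/sigma^2, and gamma = 2/sqrt(alpha) = 2*sigma.
double skew_gamma_unit_mean(double sigma) {
    double alpha = 1.0 / (sigma * sigma);
    return 2.0 / std::sqrt(alpha);
}
```

At σ = 1 the gamma distribution reduces to the exponential, whose skewness is 2, while the lognormal skewness is already 4.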
In addition to skewness, we calculate the differential, or information, entropy for each distribution used in this work. Differential entropy is defined as

h = −∫₀^∞ f(x) ln f(x) dx

for distributions defined for x ∈ (0, ∞), as is the case for the lognormal, Weibull and gamma distributions. The motivation for considering the differential entropy is the observation that it was the lognormal distribution, the maximum entropy distribution for which the mean and variance of ln(X) are specified, that produced the fastest diffusion times in our framework. As shown in Supplementary Fig. 11, the relative values of skewness and entropy agree qualitatively with expectation. Simply put, the relative values of skewness γ, shown in panel (b), and differential entropy h, shown in panel (c), agree qualitatively with the relative values of effective sparsity ξ_E, shown in panel (d). This supports the rule of thumb that the greater the skewness and entropy, the lower the effective sparsity.

Temporal network percolation transition
One of the most striking consequences of interevent time skewness is in the collapse of the giant connected component formed by active edges. In Supplementary Fig. 12 we plot the location of the temporal percolation transition, that is, the point at which active edges can be expected to form a giant connected component. There is a sensitive dependence on the choice of interevent time distribution ψ here, as can be seen by comparing the lognormal, Weibull and gamma distributions [Supplementary Fig. 12]. For all interevent time distributions, an exponential memory kernel delays the collapse of the giant connected component to higher σ_τ values. This delay appears to be more substantial under the lognormal and Weibull distributions, (b) and (c) respectively, than the gamma distribution, (d). Although substantial, the dependence on the memory kernel is far exceeded by the dependence on the interevent time distribution. We conclude that, regardless of the memory kernel, the giant component is fragile under a gamma distribution, robust under a lognormal distribution, and intermediate under a Weibull distribution, relatively speaking.

Supplementary Note 9. Probability distributions
In this Supplementary Note we detail the probability distributions used in this work, in particular those used for the interevent time distribution ψ and its tail distribution Ψ. For simplicity, the notation used in this Supplementary Note is entirely self-contained: we associate pdfs f(x) with ψ(τ), and their cdfs F(x) with 1 − Ψ(τ).
Lognormal distribution. The lognormal distribution is defined with respect to an underlying normal distribution with mean µ̂ and variance σ̂². These are related to the lognormal mean and variance µ and σ² by the relations given in Supplementary Table II, which can be straightforwardly inverted. Given µ and σ, it has the largest skewness γ of all distributions studied here, as well as the largest differential entropy h. This is not surprising, given that the lognormal can be derived using maximum entropy principles, as discussed in the previous section and illustrated in Supplementary Fig. 11.

Gamma distribution. The gamma distribution is related to the gamma function, Γ(s), described below. The expressions for its mean and variance in Supplementary Table II can be easily inverted, resulting in expressions for α and β that provide the desired moments. Note that we set µ = 1 for the temporal network experiments in this work, meaning that when the shape parameter α = β = 1, coinciding with σ = 1, we recover the exponential distribution. For values α < 1, meaning σ > 1, the qualitative shape of the exponential is maintained, with an increasingly heavy tail. In contrast, when α > 1, meaning σ < 1, the shape changes and, like the lognormal, f goes to zero in the small-x limit. For large α, meaning small σ, the gamma distribution tends to the Dirac delta function. In Supplementary Fig. 11 we observe that the gamma distribution is the least skewed of the distributions considered here, for comparable µ and σ. Its entropy is expressed in terms of the digamma function (also conventionally written ψ, not to be confused with the interevent time distribution), defined in the following section under the Weibull distribution.
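The moment inversions referred to above are elementary. Assuming the standard parametrisations (underlying-normal parameters µ̂, σ̂ for the lognormal; shape α and rate β for the gamma), a short self-contained sketch, checked by sampling at the µ = 1 normalisation used in the experiments:

```python
import math
import random

def lognormal_params(mu, sigma):
    """Underlying-normal (mu_hat, sigma_hat) giving lognormal mean mu, std sigma."""
    s2 = math.log(1.0 + (sigma / mu) ** 2)
    return math.log(mu) - s2 / 2.0, math.sqrt(s2)

def gamma_params(mu, sigma):
    """Shape alpha and rate beta giving gamma mean mu, std sigma."""
    return (mu / sigma) ** 2, mu / sigma ** 2

# sanity check by sampling, with mu = 1 as in the temporal network experiments
mu_hat, s_hat = lognormal_params(1.0, 2.0)
rng = random.Random(0)
xs = [math.exp(rng.gauss(mu_hat, s_hat)) for _ in range(200_000)]
print(sum(xs) / len(xs))  # close to the target mean of 1
```

Note that gamma_params(1.0, 1.0) returns α = β = 1, recovering the exponential distribution as stated above.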
Variants of the gamma function. We describe here the approximations necessary to numerically construct the cdf of the gamma distribution. Unfortunately, no closed-form expression for this cdf is available. In this section, we discuss a number of series expansions that are necessary to numerically evaluate γ(α, βx), the lower incomplete gamma function. The gamma function Γ(s) is defined, and related to its incomplete variants, as

Γ(s) = ∫_0^∞ t^{s−1} e^{−t} dt = γ(s, x) + Γ(s, x).

In other words, the upper and lower incomplete gamma functions are defined by partitioning the integral according to (0, ∞) = (0, x] ∪ [x, ∞), giving

γ(s, x) = ∫_0^x t^{s−1} e^{−t} dt and Γ(s, x) = ∫_x^∞ t^{s−1} e^{−t} dt,

respectively. Sampling from the lower incomplete gamma function will be crucial when initialising our temporal network system at steady state. The choice of series approximation for the lower incomplete gamma function depends on the shape parameter α, and in turn the standard deviation σ. For small α, meaning large σ, we use

γ(s, x) = e^{−x} x^s Γ(s) Σ_{n=0}^∞ x^n / Γ(s + n + 1).
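The series above is straightforward to implement with a term-ratio recurrence, since Γ(s+n)/Γ(s+n+1) = 1/(s+n). The following sketch uses our own stopping rule and tolerance (not necessarily those used in the paper), and can be checked against the closed forms γ(1, x) = 1 − e^{−x} and γ(1/2, x) = √π erf(√x):

```python
import math

def lower_incomplete_gamma(s, x, tol=1e-14, max_terms=500):
    """Series gamma(s, x) = e^{-x} x^s Gamma(s) * sum_n x^n / Gamma(s+n+1),
    suitable for small shape s (large sigma); slow for large s."""
    term = math.gamma(s) / math.gamma(s + 1.0)  # n = 0 term, equal to 1/s
    total = term
    n = 0
    while n < max_terms:
        n += 1
        term *= x / (s + n)  # Gamma(s+n)/Gamma(s+n+1) = 1/(s+n)
        total += term
        if term < tol * total:
            break
    return x**s * math.exp(-x) * total

# sanity check: for s = 1 the gamma distribution is exponential,
# so gamma(1, x) = 1 - exp(-x)
print(lower_incomplete_gamma(1.0, 2.0), 1.0 - math.exp(-2.0))
```

Dividing the result by Γ(s) gives the regularised form, i.e. the cdf of a gamma random variable with shape s evaluated at x (in the rate parametrisation, at βx).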
This well-known series, although efficient for large values of the standard deviation, becomes prohibitively slow for very large values of α, meaning very small values of σ. At this scale, specifically when α > 1 and σ < 1, we