Intrinsic randomness in epidemic modelling beyond statistical uncertainty

Uncertainty can be classified as either aleatoric (intrinsic randomness) or epistemic (imperfect knowledge of parameters). The majority of frameworks assessing infectious disease risk consider only epistemic uncertainty. We only ever observe a single epidemic, and therefore cannot empirically determine aleatoric uncertainty. Here, we characterise both epistemic and aleatoric uncertainty using a time-varying general branching process. Our framework explicitly decomposes aleatoric variance into mechanistic components, quantifying the contribution to uncertainty produced by each factor in the epidemic process, and how these contributions vary over time. The aleatoric variance of an outbreak is itself a renewal equation, where past variance affects future variance. We find that superspreading is not necessary for substantial uncertainty, and profound variation in outbreak size can occur even without overdispersion in the offspring distribution (i.e. the distribution of the number of secondary infections an infected person produces). Aleatoric forecasting uncertainty grows dynamically and rapidly, and so forecasting using only epistemic uncertainty significantly underestimates the range of possible outcomes. Failure to account for aleatoric uncertainty will therefore mislead policymakers about the substantially higher true extent of potential risk. We demonstrate our method, and the extent to which potential risk is underestimated, using two historical examples.


Introduction
Infectious diseases remain a major cause of human mortality. Understanding their dynamics is essential for forecasting cases, hospitalisations, and deaths, and for estimating the impact of interventions. The sequence of infection events defines a particular epidemic trajectory (the outbreak) from which we infer aggregate, population-level quantities. The mathematical link between individual events and aggregate population behaviour is key to inference and forecasting. The two most common analytical frameworks for modelling aggregate data are susceptible-infected-recovered (SIR) models [27] and renewal equation models [22,40]. Under certain specific assumptions, these frameworks are deterministic and equivalent to each other [11]. Several general stochastic analytical frameworks exist [2,40], but to ensure analytical tractability they make strong simplifying assumptions (e.g. Markov or Gaussian) regarding the probabilities of individual events that lead to emergent aggregate behaviour.
We can classify uncertainty as either aleatoric (due to randomness) or epistemic (imprecise knowledge of parameters) [29]. The study of uncertainty in infectious disease modelling has a rich history in a range of disciplines, with many different facets [9,38,44]. These frameworks commonly propose two general mechanisms to drive the infectious process. The first is the infectiousness, which is a probability distribution for how likely an infected individual is to infect someone else. The second is the infectious period, i.e. how long a person remains infectious. The infectious period can also be used to represent isolation, where a person might still be infectious but no longer infects others and therefore is considered to have shortened their infectious period. Consider fitting a renewal equation to observed incidence data [40], where infectiousness is known but the rate of infection events ρ(·) must be fitted. The secondary infections produced by an infected individual will occur randomly over their infectious period g, depending on their infectiousness ν. The population mean rate of infection events is given by ρ(t), and we assume that this mean does not differ between individuals (although each individual has a different random draw of their number of secondary infections). In Bayesian settings, inference yields multiple posterior estimates for ρ, and therefore multiple incidence values. This is epistemic uncertainty: any given value of ρ corresponds to a single realisation of incidence. However, each posterior estimate of ρ is in fact only the mean of an underlying offspring distribution (i.e. the distribution of the number of secondary infections an infected person produces). If an epidemic governed by identical parameters were to happen again, but with different random draws of infection events, each realisation would be different, thus giving aleatoric uncertainty.
When performing inference, infectious disease models tend to consider epistemic uncertainty only, due to the difficulty of performing inference with aleatoric uncertainty (e.g. in individual-based models) or for analytical tractability. There are exceptions, such as the susceptible-infected-recovered model, which has stochastic variants that are capable of determining aleatoric uncertainty [2] and have been used in extensive applications (e.g. [42]). However, we will show that this model can underestimate uncertainty under certain conditions. An empirical alternative is to characterise aleatoric uncertainty through the final epidemic size of multiple historical outbreaks [12,49], but these are confounded by temporal, cultural, epidemiological, and biological context, and therefore parameters vary between outbreaks. Here, following previous approaches [2], we analyse aleatoric uncertainty by studying an epidemiologically-motivated stochastic process, serving as a proxy for repeated realisations of an epidemic. Within our framework, we find that using epistemic uncertainty alone vastly underestimates risk, and accounting for aleatoric uncertainty shows potential risk to be much higher. We demonstrate our method using two historical examples: first, the 2003 severe acute respiratory syndrome (SARS) outbreak in Hong Kong, and second, the early 2020 UK COVID-19 epidemic.

Results
An analytical framework for aleatoric uncertainty A time-varying general branching process proceeds as follows: first, an individual is infected, and their infectious period is distributed with probability density function g (with corresponding cumulative distribution function G). Second, while infectious, individuals randomly infect others (via a counting process with independent increments), driven by their infectiousness ν and a rate of infection events ρ. That is, an individual infected at time l will, at some later time t while still infectious, generate secondary infections at a rate ρ(t)ν(t − l). ρ(t) is a population-level parameter closely related to the time-varying reproduction number R(t) (see Methods and [40] for further details), while ν(t − l) captures the individual's current infectiousness (note that t − l is the time since infection). We allow multiple infection events to occur simultaneously, and assume individuals behave independently once infected, thus allowing mathematical tractability [24]. Briefly, we model an individual's secondary infections using a stochastic counting process, which gives rise to secondary infections (i.e. offspring) that are either Poisson or Negative Binomial distributed in their number, and Poisson distributed in their timing (see Supplementary Notes 3.3 and 3.4). We study the aggregate of these events (prevalence or incidence) through closed-form probability generating functions and probability mass functions. Our approach models epidemic evolution through intuitive individual-level characteristics while retaining analytical tractability.
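The mechanism just described can be made concrete with a naive Poisson-thinning simulation. This is only an illustrative sketch, with an assumed constant ρ, Gamma(3, 1) infectiousness and a Gamma(5, 1) infectious period; it is not the paper's analytical method:

```python
import numpy as np

rng = np.random.default_rng(1)

def nu(a):
    """Gamma(3, 1) infectiousness density at age of infection a."""
    return a**2 * np.exp(-a) / 2.0

def simulate(rho=1.2, t_end=20.0, lam_max=2.0):
    """Thinning simulation: an individual infected at time l, with infectious
    period L ~ Gamma(5, 1), generates secondary infections as an inhomogeneous
    Poisson process with rate rho * nu(t - l) while t < l + L."""
    events, queue = [0.0], [0.0]                 # seed with one index case at t = 0
    while queue:
        l = queue.pop()
        L = rng.gamma(5.0, 1.0)                  # this individual's infectious period
        t = l
        while True:
            t += rng.exponential(1.0 / lam_max)  # candidate event time (thinning)
            if t > min(l + L, t_end):
                break
            if rng.uniform() < rho * nu(t - l) / lam_max:
                events.append(t)                 # accepted secondary infection
                queue.append(t)                  # new individual joins the process
        if len(events) > 5000:                   # guard against explosive runs
            break
    return np.sort(np.array(events))

events = simulate()
print(len(events), "infections by time 20")
```

Each accepted event both increments the epidemic and seeds a new independently-behaving individual, mirroring the branching structure; prevalence at any time can be recovered by counting individuals whose infectious period spans that time.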
Importantly, the mean of our process follows a renewal equation [1,40,41]. Our formulation unifies mechanistic and individual-based modelling within a single analytical framework based on branching processes. Figure 1 shows a schematic of this process. Formal derivation is in Supplementary Note 3.

Randomness occurs at the individual level, and there is a distribution of possible realisations of the epidemic given identical parameters. Simulating our general branching process would be cumbersome using the standard approach of Poisson thinning [39], and inference from simulation is more challenging still. Using probability generating functions, we analytically derive important quantities from the distribution of the number of infections, including the (central) moments and marginal probabilities given ρ, g and ν (with or without epistemic uncertainty). We additionally use the probability generating function to prove general, closed-form, analytical results such as the decomposition of variance into mechanistic components, and the conditions under which overdispersion exists (i.e. where variance is greater than the mean). Finally, we derive a general probability mass function (likelihood function) for incidence.
If infection event k = 0, . . . , n occurred at time τ_k and produced y_k infections, let x_kj denote the end time of the infectious period of the j-th infection at event k. Note that τ_0 = l is the time of the first infection event and y_0 = 1. The likelihood L_InfPeriod of each infected person's infectious period is then a product of g over all infections. The likelihood of there being y_k infections at time τ_k involves p_{y_k}(τ_k, τ_i), the (infinitesimal) rate at which an individual infected at τ_i causes y_k infections at time τ_k, provided it is still infectious. Finally, the probability that no other infections occurred between the infection events at times (τ_k), k = 0, . . . , n, takes an exponential form in the infection event rate r and the current time t; the exponential term arises from the Poisson assumption. Our full likelihood L_Full is the product of these three components. Full derivations of these quantities are provided in Supplementary Note 3. If discrete time is assumed, Equation 4 simplifies to a likelihood commonly used for inference [13]. Rather than relying on Markov chain Monte Carlo simulation alone, it is often simpler to solve the probability generating function with complex integration. The probability generating function, equations for the variance, and derivations of the probability mass function are found in Supplementary Notes 3-6, and a summary of the main analytical results is found in the Methods.

The dynamics of uncertainty
We derive the mean and variance of our branching process. Unlike common Gaussian stochastic processes, the general variance in disease prevalence is described through a renewal equation. Therefore, future uncertainty depends on past uncertainty, and so the uncertainty around subsequent epidemic waves has memory. Additionally, uncertainty is driven by a complex interplay of time-varying factors, and is not simply proportional to the mean. For example, a large first wave of infection can increase the variance of the second wave. As such, the general variance equation 9 disentangles and quantifies the causes of uncertainty, which remain obscured in brute-force simulation experiments [2].
Consider a toy simulated epidemic with ρ(t) = 1.4 + sin(0.15t), where the offspring distribution is Poisson in both timing and number of secondary infections, where the infectiousness ν has probability density function Gamma(3, 1), and, similarly, the infectious period g ∼ Gamma(5, 1). Here the parameters of the Gamma distribution are the shape and scale respectively. The resulting variance is counterintuitive. We prove analytically that overdispersion emerges despite a non-overdispersed Poisson offspring distribution. The second wave has a lower mean but a higher variance than the first wave (Figure 2), because uncertainty is propagated. If the variance were Poisson, i.e. equal to the mean, the second wave would instead have a smaller variance due to fewer infections. Initially, uncertainty from individuals is largest, but as the epidemic progresses, compounding uncertainty propagated from the past dominates (Figure 2, bottom right). Note that even in this example with zero epistemic uncertainty (we know the parameters perfectly), aleatoric uncertainty is large.
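A discrete-time analogue of this toy epidemic illustrates the point: with a purely Poisson offspring distribution, repeated realisations still show variance well above the mean. The daily discretisation of ν and the initial seed of 5 cases are choices of this sketch, not the paper's continuous-time setup:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 80
days = np.arange(1, 16)
nu = days**2 * np.exp(-days) / 2.0     # Gamma(3,1) density at integer ages
nu /= nu.sum()                         # discretised infectiousness weights
rho = 1.4 + np.sin(0.15 * np.arange(T))

def one_run():
    """One realisation of a discrete renewal process with Poisson draws."""
    I = np.zeros(T); I[0] = 5
    for t in range(1, T):
        s = min(t, len(nu))
        lam = rho[t] * np.dot(I[t - s:t][::-1], nu[:s])
        I[t] = rng.poisson(lam)        # Poisson offspring: no overdispersion
    return I

runs = np.array([one_run() for _ in range(500)])
m, v = runs.mean(axis=0), runs.var(axis=0)
print("fraction of days (t > 10) with variance > mean:",
      float(np.mean(v[10:] > m[10:])))
```

Even though each daily draw is Poisson (variance equal to its conditional mean), the variance across realisations quickly exceeds the mean because randomness early in the epidemic is amplified by later transmission.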
Compounding uncertainty from past events is the dominant contributor to overall uncertainty.
In Equation 9, the first two terms account for uncertainty in the infectious periods of all infected individuals. The third term denotes the uncertainty from the offspring distribution. By construction, the timing of infections is an inhomogeneous Poisson process, where at each infection time the number of infections is random. The third term (Equation 9b) contains the second moment of the offspring distribution, which is the variability around its mean (i.e. ρ(t)). The second moment quantifies the extent of possible superspreading. In contrast to other studies [33,50], we find that individual-level overdispersion in the offspring distribution is less important than the explosive growth of the epidemic itself. Under a null Poisson model, with no overdispersion (see the Poisson case in Figure 2), substantial aleatoric uncertainty arises from a Poisson offspring distribution combined with variance propagation. We rigorously prove via the Cauchy-Schwarz inequality that, under a mild condition on the possible spread of the epidemic, the variance of the number of infections at a given time is always greater than the mean, and hence is overdispersed. Overdispersion in the offspring distribution is therefore not necessary for high aleatoric uncertainty, although it still increases variance at both the individual and the population level.
We derive the conditional variance, with known past events but unknown future events. Conditional variance grows proportionally to the square of the mean, with additional terms containing the previous variance. Aleatoric uncertainty therefore grows, and forecasting exercises based only on epistemic uncertainty greatly underestimate the risk of very large epidemics; this underestimation becomes more severe as the forecast horizon expands or as the epidemic grows.

Aleatoric uncertainty over the SARS 2003 epidemic
To demonstrate the importance of aleatoric uncertainty, we analyse daily incidence of symptom onset in Hong Kong during the 2003 severe acute respiratory syndrome (SARS) outbreak [14,26,31]. The epidemic struck Hong Kong in March-May 2003, with a case fatality ratio of 15%. We fit a Bayesian renewal equation assuming a random walk prior distribution for the rate of infection events ρ [40], using Equation 4 for inference. We ignore g and assume that the distribution of generation times mirrors the distribution of infectiousness, i.e. that the infectiousness ν equals the generation time [31]. Note these parameter choices are illustrative and do not affect our main conclusions. The fitted ρ(t) in Figure 3 (top left) shows two major peaks, consistent with the major transmission events in the epidemic [26]. Figure 3 (top right) shows the mean epistemic fit, with epistemic (posterior) uncertainty tightly distributed around the data. Figure 3 (bottom left) shows the aleatoric uncertainty under optimistic and pessimistic scenarios (i.e. the upper and lower bounds of ρ(t) in Figure 3 (top right)). The pessimistic scenario includes the possibility of extinction, but also an epidemic that could have been more than six times larger than that observed. The optimistic scenario suggests we would observe an epidemic of at worst comparable size to that observed. Finally, Figure 3 (bottom right) shows epistemic and aleatoric forecasts at day 60 of the epidemic, fixing ρ(t) to be constant at the bounds of the 95% epistemic uncertainty interval, either ρ(t ≥ 60) = 0.38 or ρ(t ≥ 60) = 0.83, and simulating forwards. While the epistemic forecast does contain the true unobserved outcome of the epidemic, it underestimates true forecast uncertainty, which is 1.3 times larger. The constant ρ used for the forecast is below 1, and yet we still see substantial aleatoric uncertainty.
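The forecast construction can be sketched as follows: a deterministic renewal path (an epistemic-style point forecast) against stochastic Poisson-renewal trajectories under the same constant ρ. The incidence history and generation-interval weights below are illustrative stand-ins, not the fitted SARS quantities:

```python
import numpy as np

rng = np.random.default_rng(3)
w = np.exp(-0.3 * np.arange(1, 11)); w /= w.sum()  # toy generation weights
hist = np.full(10, 30.0)          # assumed incidence in the days before forecast
rho_c = 0.83                      # constant rho, the pessimistic bound above
H = 40                            # forecast horizon in days

def forecast(stochastic):
    """Roll the renewal equation forward, with or without Poisson noise."""
    I = np.concatenate([hist, np.zeros(H)])
    for t in range(10, 10 + H):
        lam = rho_c * np.dot(I[t - 10:t][::-1], w)
        I[t] = rng.poisson(lam) if stochastic else lam
    return I[10:]

point = forecast(False)                                  # epistemic-style path
sims = np.array([forecast(True) for _ in range(1000)])   # aleatoric envelope
lo, hi = np.percentile(sims[:, -1], [2.5, 97.5])
print("point forecast at horizon:", round(float(point[-1]), 1),
      "| 95% aleatoric interval:", lo, "-", hi)
```

Even with ρ fixed below 1, the stochastic envelope spans everything from early extinction to trajectories well above the deterministic path, which is the qualitative behaviour described above.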
If ρ were above 1 for a sustained period, aleatoric uncertainty would play a smaller role [5], but this is rare in real epidemics, where susceptible depletion, behavioural changes or interventions keep ρ around 1. Our results therefore highlight that epistemic uncertainty drastically underestimates potential epidemic risk.
Figure 3: The 2003 SARS epidemic in Hong Kong [14,31]. (a) ρ(t) with 95% epistemic uncertainty. (b) Fitted incidence mean with 95% epistemic uncertainty and observational noise, using Equation 4; data are daily incidence of symptom onset. (c) Aleatoric uncertainty from the start of the epidemic under an optimistic and a pessimistic ρ(t). (d) Epistemic (blue) and combined epistemic and aleatoric (red) uncertainty, keeping ρ constant from the forecast date (dotted line). Forecasting is from day 60.
Aleatoric risk assessment in the early 2020 COVID-19 pandemic in the UK To demonstrate the practical application of our model, we retrospectively examine the early stage of the COVID-19 pandemic in the UK, using only information available at the time. While the date of the first locally transmitted case in the UK remains unknown (likely mid-January 2020 [43]), COVID-19 community transmission was confirmed in the UK by late January 2020, and we therefore start our simulated epidemic on January 31st 2020. We consider uncertainty in the predicted number of deaths on March 16th 2020 [19], during which time decisions regarding non-pharmaceutical interventions were made. Testing was extremely limited during this period, and COVID-19 death data were unreliable. For this illustration, we assume that we did not know the true number of COVID-19 deaths, as was the case for many countries in early 2020.
Policymakers then needed estimates of the potential death toll, given limited knowledge of COVID-19 epidemiology and unreliable national surveillance.
We simulated an epidemic from a time-varying general branching process with a Negative Binomial offspring distribution, using parameters that were largely known by March 16th 2020 (Table 1).
The infection fatality ratio, infection-to-onset distribution and onset-to-death distribution were convolved with incidence [40] to estimate numbers of deaths. Estimated COVID-19 deaths and uncertainty estimates between January 31st and March 16th 2020 are shown in Figure 4 (top).
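The convolution step can be sketched in a few lines. All inputs below (incidence, delay distributions and infection fatality ratio) are hypothetical toy values, standing in for the fitted quantities from Table 1:

```python
import numpy as np

# Hypothetical inputs: daily incidence and the two delay distributions.
incidence = np.array([10, 20, 40, 80, 120, 150, 140, 100, 60, 30], dtype=float)
ifr = 0.01                                  # assumed infection fatality ratio
onset = np.array([0.1, 0.3, 0.4, 0.2])      # toy infection-to-onset pmf
o2d = np.array([0.2, 0.5, 0.3])             # toy onset-to-death pmf

# Convolve the two delays to get an infection-to-death pmf, then convolve
# with incidence and scale by the IFR to get expected daily deaths.
delay = np.convolve(onset, o2d)
expected_deaths = ifr * np.convolve(incidence, delay)[:len(incidence)]
print(np.round(expected_deaths, 2))
```

Truncating the convolution to the observation window reflects that deaths from late infections fall beyond the period shown; the full (untruncated) convolution conserves `ifr` times total incidence.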
While the epistemic uncertainty contains the true number of deaths, it is still an underestimate, and including aleatoric uncertainty, we find that the epidemic could have had more than four times as many deaths. Consider a hypothetical intervention on March 17th 2020 (Figure 4 (bottom)) that completely stops transmission. Deaths would still occur from those already infected but no new infections would arise. In this hypothetical case, the aleatoric uncertainty would still be 2.5 times the actual deaths that occurred (when in fact transmission was never zero or close to it).
This hypothetical scenario highlights the scale of aleatoric uncertainty, and demonstrates that our method can be useful in assessing risk in the absence of data by giving a reasonable worst case. Further, we observe that using only epistemic uncertainty provides a reasonably good fit over a relatively short time horizon (Figure 4, top), but soon afterwards greatly underestimates uncertainty (Figure 4, bottom). The fits using aleatoric uncertainty provide a more reasonable assessment of uncertainty. While we concentrate on the upper bound, the lower bound on the worst-case scenario still exceeds zero, and therefore the epidemic going extinct by March 16th in the worst case with no external seeding would have been very unlikely. Aleatoric uncertainty gives a more informative reasonable worst-case estimate than epistemic uncertainty alone, and could be a useful metric for a policymaker in real time, with low-quality data, without requiring simulations from costly individual-based models.

Discussion
Stochastic models more realistically model natural phenomena than deterministic equations [37], and particularly so with infection processes [3]. Accordingly, individual-based models have found much success [20,48] in capturing the complex dynamics that emerge from infectious disease outbreaks, and have been highly influential in policy [19]. However, despite a plethora of alternatives, many analytical frameworks still tend to be deterministic [14,17,21], and only consider statistical, epistemic parameter uncertainty. Frameworks that expand deterministic, mechanistic equations to include stochasticity use a Gaussian noise process [2], or restrict the process to be Markovian.
Markovian branching processes require the infectious period or generation time to be exponentially distributed, a fundamentally unrealistic choice for most infectious diseases. Further, a Gaussian noise process is unlikely to be realistic [12].
Our results show that individual-level uncertainty is overshadowed by uncertainty in the infection process itself. Profound overdispersion in infectious disease epidemics is not simply a result of overdispersion in the offspring distribution, but is fundamental and inherent to the branching process. We rigorously prove that even with a Poisson offspring distribution (which is not overdispersed), overdispersion in the resulting prevalence or incidence is still virtually always guaranteed. We show that forecast uncertainty increases rapidly, and therefore common forecasting methods almost certainly underestimate true uncertainty. Like other existing frameworks, our approach provides a methodological tool to evaluate uncertainty in the presence of little to no data, to assess uncertainty in forecasting, and to retrospectively assess an epidemic. Other approaches, such as agent-based models, could also be readily used. However, the framework we present permits the unpicking of dynamics analytically and from first principles, without a black-box simulator. Equally, this is also a limitation, since new and flexible mechanisms cannot be easily integrated or considered.
We have considered only a small number of mechanisms that generate uncertainty. Cultural, behavioural and socioeconomic factors could introduce even greater randomness, and our framework may therefore underestimate true uncertainty in infectious disease epidemics. The converse is also possible: contact network patterns and spatial heterogeneity limit the routes of transmission, such that variability in anything but a fully connected network will be lower. In this respect, our assumption of homogeneous mixing and spatial independence overestimates uncertainty. A sensible next step for future research is to study the dynamics of these branching processes over complex networks. Finally, at the core of all branching frameworks is an assumption of independence, which is unlikely to be completely valid (people mimic other people in their behaviour) but is necessary for analytical tractability. Studying the effect of this assumption, in comparison with agent-based models, would also be a useful area of future research.
We provide one approach to determining aleatoric uncertainty. Other approaches based on stochastic differential equations, Markov processes, reaction kinetics, or Hawkes processes all have their respective advantages and disadvantages. The differences in model-specific aleatoric uncertainty, and how close each model comes to capturing the true, unknown aleatoric uncertainty, is a fundamental question moving forwards. In this paper we have provided another approach to characterising aleatoric uncertainty; where this approach is most useful, and how it can be reconciled with existing approaches, will be an interesting area of study.

Methods
Detailed derivations of the methods can be found in the Supplementary Notes, with a high level description of the content found in Supplementary Note 1.
The branching process proceeds as follows. First, an individual is infected at some time l, and their infectious period L is distributed with probability density function g (and cumulative distribution function G). Second, during their infectious period, they randomly infect other individuals, according to their infectiousness ν(t − l) and their mean number of secondary infections, which is assumed to be equal to the population-level rate of infection events ρ(t). ρ(t) is closely related to the time-varying reproduction number R(t) (see [40] for details). The infectious period g accounts for variation in individual behaviour. If people take preventative action to reduce onward infections, their shortened infectious period can stop transmission even while they remain infectious.
Where infectious individuals do not change their behaviour, g can be ignored and individual-level transmission is controlled by infectiousness ν only. Each newly infected individual then proceeds independently by the same mechanism as above. Specifics can be found in Supplementary Notes 2.1-2.5.
Formally, if an individual is infected at time s, their number of secondary infections is given by a stochastic counting process {N(t, s)}_{t≥s}, which is independent of other individuals and has independent increments. We assume here that the epidemic occurs in continuous time, and hence that N(t, s) is continuous in probability, although we consider discrete-time epidemics in Supplementary Note 7. To aid calculation, we suppose N(t, s) can be defined from a Lévy process N(t) (that is, a process with independent and identically distributed increments) via N(t, s) = N(∫_s^t r(k, s) dk) for some non-negative rate function r. It is assumed that each counting process {N(t, s)}_{t≥s} is defined from an independent copy of N(t). This formulation has two advantages: first, the dependence of N(t, s) on s is restricted to the rate function r; and second, if J_N(t) counts the number of infection events in N(t) (where infection events refer to an increase, of any size, in N(t, s)), then J_N(t) is a Poisson process with some rate κ [4]. We can then define J(t, s) to be the counting process of infection events in N(t, s), and Y(v) to be the size of the infection event (i.e. the number of secondary infections that occur) at time v. We assume that Y is independent of s, although such a dependence, which would allow superspreading to depend on infectiousness, could be incorporated into the framework. Therefore J(t, s) is an inhomogeneous Poisson process (and so N(t, s) has been characterised as an inhomogeneous compound Poisson process). We consider the cases where N(t, s) is itself an inhomogeneous Poisson process, and where N(t, s) is a Negative Binomial process. This allows us to examine the effects of overdispersion in the number of secondary infections, although our framework allows for more complicated distributions.
Here, r(t, l) = ρ(t)ν(t − l), where ρ(t) models the population-level rate of infection events, and ν(t − l) models the infectiousness of an individual infected at time l. If ν(t − l) is sufficiently well characterised by the generation time (i.e. where the timing of secondary infections tracks infectiousness), and the infectious period can be ignored, then the integral ∫_l^t r(s, l) ds has the same scale as the commonly used reproduction number R(t) [40]. The branching process yields a series of birth and death times for each individual (i.e. the time of infection and the end of the infectious period respectively), from which prevalence (the number of infections at any given time) or cumulative incidence (the total number of infections up to any time) can be defined.
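As a worked check of this correspondence, the integral ∫_l^t ρ(s)ν(s − l) ds can be evaluated numerically; with a constant ρ and a normalised infectiousness density, it recovers ρ on the scale of R(t). The constant value 1.3 and the Gamma(3, 1) density are illustrative assumptions:

```python
import numpy as np

rho_const = 1.3                               # assumed constant event rate
nu = lambda a: a**2 * np.exp(-a) / 2.0        # Gamma(3, 1) infectiousness density

l, t = 0.0, 40.0
s = np.linspace(l, t, 4001)
f = rho_const * nu(s - l)
R = float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(s)))   # trapezoid rule
print(round(R, 4))   # ≈ 1.3, since nu integrates to 1
```

With a time-varying ρ(s) inside the integrand, the same quadrature gives the time-varying analogue directly.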

Probability generating function
We derive the probability generating function for a time-varying age-dependent branching process, allowing derivation of the mean and higher-order moments (full derivations can be found in the Supplementary Notes). Note that if the individuals directly infected by the initial individual are infected at times l + t_1, ..., l + t_n, then the subtree generated by each of these individuals is an independent copy of the process started at the corresponding later time. This observation allows us to write the generating function F(t, l) as a function of F(t, u) for u ∈ (l, t). As F(t, t) = s, this allows us to iteratively find the value of F(t, l). Explicitly, F satisfies a renewal-type equation (Equation 7) involving q_1(z; s) = se^z, with q_2(z) = e^z in the case where Z(t, l) refers to prevalence, and a log-series form of q_2 in the Negative Binomial case; the constant κ is absorbed into ρ.
The key intuition in understanding Equation 7 is that, for an integer random variable X and iid (independent and identically distributed) random variables Y_1, Y_2, ..., the generating function of the random sum S = Y_1 + · · · + Y_X is G_S(s) = G_X(G_Y(s)), where G_X and G_Y are the generating functions of X and Y_i respectively. Thus, we expect the pgfs of the various parts of our model to combine via composition, as occurs in the equation above.
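This composition rule can be checked numerically. A small Monte Carlo sketch with the illustrative choices X ~ Poisson(2) and Y_i ~ Poisson(1.5), using the fact that a sum of X iid Poisson(1.5) variables is Poisson(1.5X):

```python
import numpy as np

rng = np.random.default_rng(4)

# pgfs for X ~ Poisson(2) and Y ~ Poisson(1.5): G(z) = exp(mu * (z - 1)).
G_X = lambda z: np.exp(2.0 * (z - 1.0))
G_Y = lambda z: np.exp(1.5 * (z - 1.0))

s = 0.7
X = rng.poisson(2.0, size=200_000)
total = rng.poisson(1.5 * X)             # sum of X iid Poisson(1.5) variables
mc = float(np.mean(s ** total))          # Monte Carlo estimate of E[s^total]
exact = float(G_X(G_Y(s)))               # composition G_X(G_Y(s))
print(round(mc, 4), round(exact, 4))
```

The Monte Carlo estimate of E[s^S] agrees with the composed pgf to sampling accuracy, which is exactly the mechanism by which the branching structure composes pgfs in Equation 7.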
Mean incidence can be recovered from both prevalence (via back-calculation [40]) and cumulative incidence. In Equation 7 for the Negative Binomial case, ϕ is the degree of overdispersion. Equation 7 is solvable via quadrature and the fast Fourier transform using a result from complex analysis [34], scales easily to populations with millions of infected individuals, and the probability mass function can be computed to machine precision (a full derivation is available in Supplementary Note 3.7).
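The inversion step can be sketched in a few lines: evaluate the pgf at the roots of unity and apply an FFT. For illustration this uses a Poisson(3) pgf rather than Equation 7 itself:

```python
import numpy as np

def pmf_from_pgf(F, n=256):
    """Recover a pmf from its pgf via the unit circle:
    p_k ≈ (1/n) * sum_j F(z_j) * z_j^{-k}, with z_j = exp(2πi j / n)."""
    z = np.exp(2j * np.pi * np.arange(n) / n)
    return np.fft.fft(F(z)).real / n     # numpy's FFT uses the e^{-2πi jk/n} kernel

# Sanity check against a Poisson(3) pgf, F(z) = exp(3(z - 1)).
p = pmf_from_pgf(lambda z: np.exp(3.0 * (z - 1.0)))
print(round(float(p[0]), 6))   # ≈ exp(-3) ≈ 0.049787
```

The truncation length n only needs to exceed the support where the distribution has non-negligible mass; beyond that, the recovered probabilities are accurate to machine precision, mirroring the property claimed above.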

Variance decomposition
For simplicity, we only summarise the decomposition for prevalence; an analogous and highly similar derivation for cumulative incidence can be found in Supplementary Note 3.5. We can derive analytical equations for the mean and variance of the entire branching process (full derivations can be found in Supplementary Notes 4.1-4.7, and the mathematical properties of the variance equations in Supplementary Notes 6.1-6.3). The mean prevalence M(t, l) satisfies a renewal equation (Equation 8), comprising the survival of the initial individual, 1 − G(t − l, l), plus an integral over subsequent infection times; note that ρ can be scaled to absorb the E(Y) and κ constants. Equation 8 is consistent with that previously derived in [40]. The second moment satisfies a similar renewal equation, and the variance can be decomposed into three mechanistic components (Equation 9), in which the uncertainty propagated from past infections is distinct from the uncertainty in individual infection events. In short, and unlike Gaussian stochastic processes, the general variance in disease prevalence is described through a renewal equation. Intuitively then, uncertainty in an epidemic's future trajectory is contingent on past infections, and the uncertainty around consecutive epidemic waves is connected. As such, the general variance equation 9 allows us to disentangle important aspects of infection dynamics that remain obscured in brute-force simulations [2].
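A mean renewal equation of this type can be solved numerically by stepping backwards in the infection time. This sketch assumes a constant ρ, Gamma(3, 1) infectiousness, an exponential infectious period with mean 5, and a kernel in which individuals infect only while infectious (assumptions of this illustration, not necessarily the paper's exact Equation 8):

```python
import numpy as np

t, h = 30.0, 0.05
grid = np.arange(0.0, t + h / 2, h)              # infection times u in [0, t]
rho = 2.5                                        # assumed constant event rate
nu = lambda a: a**2 * np.exp(-a) / 2.0           # Gamma(3, 1) infectiousness
S = lambda a: np.exp(-a / 5.0)                   # survival 1 - G, mean-5 period

# M(t, u) = S(t - u) + ∫_u^t rho * nu(v - u) * S(v - u) * M(t, v) dv,
# solved backwards from the boundary condition M(t, t) = 1.
M = np.zeros(len(grid))
M[-1] = 1.0
for i in range(len(grid) - 2, -1, -1):
    u, v = grid[i], grid[i + 1:]
    integrand = rho * nu(v - u) * S(v - u) * M[i + 1:]
    M[i] = S(t - u) + h * integrand.sum()        # right-endpoint quadrature

print(round(float(M[0]), 2), "expected prevalence at t = 30 from one index case")
```

Because M(t, u) depends only on M(t, v) for v > u, the backward sweep resolves the renewal structure in a single pass, with no iteration to convergence required.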

Overdispersion
We define an epidemic to be expanded if at time t there is a non-zero probability that the prevalence, not counting the initial individual or its secondary infections, is non-zero.
Note that this is a very mild condition on an epidemic -in a realistic setting, the only way for an epidemic to not be expanded is if it is definitely extinct by time t, or if t is small enough that tertiary infections have not yet occurred.
Large aleatoric variance is intrinsic to our branching process: the prevalence of new infections (that is, prevalence excluding the deterministic initial case) is always strictly overdispersed at time t, provided the epidemic is expanded at time t. A full proof is given in Supplementary Note 4.4, but we provide here a simpler justification in the special case that G(t − l, l) = 1.
In this case, the prevalence of new infections is equal to standard prevalence, and the equations for M(t, l) and V(t, l) simplify significantly. Switching the order of integration in the equation for V(t, l) and applying the Cauchy-Schwarz inequality (using the fact that ∫_0^{t−l} g(u, l) du = 1) shows that the first term, (9a), in the variance equation is non-negative.
The remaining terms can be dealt with as follows. In this special case, (9a) is equal to zero, and the sum of the remaining terms, (9b) and (9c), can be bounded from below. Noting that Z(t, l + u)^2 ≥ Z(t, l + u), the second moment terms are bounded below by the corresponding means, and this bound is strict for an expanded epidemic: if new infections can be caused, then more than one new infection can be caused. Hence, Z(t, l) is strictly overdispersed for expanded epidemics. This means that Gaussian approximations are unlikely to be useful.

Variance midway through an epidemic
It is important to calculate uncertainty starting midway through an epidemic, conditional on previous events. This derivation is significantly more algebraically involved than the other work in this paper. For simplicity, we assume that N (t, l) is an inhomogeneous Poisson Process, and that L = ∞ for each individual.
Suppose that prevalence (here equivalent to cumulative incidence) Z(t, l) = n + 1. We create a strictly increasing sequence l = B_0 < B_1 < · · · < B_n of n + 1 infection times, which has a joint probability density function f_B. Then, the variance at time t + s can be written in terms of M*(t + s, b) and V*(t + s, b), the mean and variance of the size of the infection tree (i.e. prevalence or cumulative incidence) at time t + s caused by an individual infected at time b, ignoring all individuals they infected before time t. These quantities are calculated from M and V.
Note also that f_{B_i} and f_{B_i,B_j} are the one- and two-dimensional marginal distributions of f_B.

Bayesian inference for the SARS epidemic in Hong Kong
The data for the SARS epidemic in Hong Kong consist of 114 daily measurements of incidence (positive integers) and an estimate of the generation time [46] obtained via the R package EpiEstim [13]. We ignore the infectious period g and set the infectiousness ν to the generation interval. The inferential task is then to estimate a time-varying function ρ from these data using Equation 4. As we note in Equation 4 and in Supplementary Notes 5 and 7.1-7.4, discretisation simplifies this task considerably. For our prior distributions, ρ is modelled as a discrete random walk process. The renewal likelihood in Equation 4 is vectorised using the approach described in [40]. Fitting was performed in the probabilistic programming language Numpyro, using Hamiltonian Monte Carlo [25] with 1000 warmup steps and 6000 sampling steps across two chains. The target acceptance probability was set at 0.99 with a tree depth of 15. Convergence was evaluated using the Rhat statistic [23].
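The vectorised renewal likelihood can be sketched in a few lines. The following is an illustrative NumPy stand-in (not the paper's Numpyro code, and the discretisation details are our assumptions): expected incidence is a convolution of past incidence with the generation-interval weights, scaled by ρ, under a Poisson observation model.

```python
import numpy as np

# Sketch of a discretised renewal likelihood (illustrative, not the paper's
# code): expected incidence lam[t] = rho[t] * sum_{s<t} y[s] * nu[t-1-s],
# where nu[k] is the generation-interval weight at a lag of k+1 days.
# Assumes len(nu) >= len(y).
def renewal_loglik(y, rho, nu):
    """Poisson renewal log-likelihood (up to the log-factorial constant)."""
    T = len(y)
    conv = np.array([np.sum(y[:t] * nu[t - 1::-1][:t]) for t in range(T)])
    lam = rho * conv
    lam = np.clip(lam[1:], 1e-12, None)   # skip t = 0 (no history); avoid log(0)
    yy = y[1:]
    return np.sum(yy * np.log(lam) - lam)

# toy data: a short growing incidence series with a made-up kernel
nu = np.exp(-np.arange(10) / 2.0); nu /= nu.sum()
y = np.array([1, 2, 3, 5, 8, 12, 18, 25, 30, 33], dtype=float)
rho = np.full(len(y), 1.4)
print(renewal_loglik(y, rho, nu))
```

In the actual fit, `rho` would carry the random-walk prior and the log-likelihood would be handed to an HMC sampler; the convolution above is the piece that the cited vectorisation approach computes efficiently.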
Forecasts were implemented by sampling from Equation 4 using MCMC. In order to use Hamiltonian Monte Carlo, we relax the discrete constraint on incidence and allow it to be continuous with a diffuse prior. We ran a basic sensitivity analysis using random-walk Metropolis with a discrete prior to ensure this relaxation was suitable. In a forecast setting, incidence up to a time point (T = 60) is known exactly and given as y_{t≤T}, and we have access to an estimate of ρ(t > T) in the future. In our case we fix ρ(t > T) = ρ(T).
Numerically calculating the probability mass function via the probability generating function
Following [35] and [7] (originally from [34]), the probability mass function p can be recovered from the derivatives of the pgf at s = 0, via p_n = F⁽ⁿ⁾(0)/n!; direct differentiation is generally computationally intractable. A well-known result from complex analysis [34] holds that f⁽ⁿ⁾(a) = (n!/2πi) ∮ f(z)/(z − a)ⁿ⁺¹ dz [7]. The probability mass function for any time and n can therefore be determined numerically. One needs M ≥ n, which requires solving n renewal equations for the generating function and performing a fast Fourier transform. This is computationally fast, but may become slightly burdensome for epidemics with very large numbers of infected individuals (millions). A derivation of this approximation is provided in Supplementary Note 3.7.
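The circle-integral trick can be sketched as follows: evaluating the pgf at the M-th roots of unity and applying an FFT recovers the pmf. For illustration we use the closed-form Poisson pgf in place of one obtained by solving the renewal equations.

```python
import math
import numpy as np

# Recovering a pmf from its pgf via the Cauchy integral formula:
#   p_n = (1/M) * sum_m F(w^m) * w^{-n m},  w = exp(2*pi*i/M),
# which is a discrete Fourier transform. Here F is the Poisson(lam) pgf
# F(s) = exp(lam*(s-1)); in the paper F would come from the renewal equations.
lam, M = 2.0, 64
w = np.exp(2j * np.pi * np.arange(M) / M)   # M-th roots of unity
F_vals = np.exp(lam * (w - 1.0))            # pgf on the unit circle
# np.fft.fft computes sum_m x[m] * exp(-2*pi*i*k*m/M), which is exactly
# the inverse transform needed here (up to the 1/M factor).
pmf = np.real(np.fft.fft(F_vals)) / M       # p_0, ..., p_{M-1}

print(pmf[0], math.exp(-lam))               # both ~0.1353
```

The only error is aliasing of mass beyond n = M − 1, which is why one needs M at least as large as the largest count of interest.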

Competing interests
All authors declare no competing interests.

Data availability
Data from Figure 3 is available via the R package EpiEstim [14], and data from Figure 4

• The first note, "Modelling", provides a precise definition of the branching process model used throughout the paper.
• The second note, "Probability generating functions", derives probability generating functions (pgfs) for prevalence and cumulative incidence. It also discusses their efficient solution, including some special cases in which the solution process can be sped up.
• The third note, "Properties of the prevalence variance", derives the equation for the variance (via the previously derived equations for the pgf) and explores its properties, providing explanations for the various terms and proving that the prevalence of new infections is (under a mild condition on the possible spread of the epidemic) overdispersed.
• The fourth note, "Likelihood functions", contains the derivations of the pgf of the infection event times and the likelihood function presented in the main text.
• The fifth note, "Assessing future variance during an epidemic", derives the equation for the variance of future cases when the cumulative incidence is known at some point in time.
• Finally, the sixth note, "Discrete epidemics", provides a range of similar results in the discrete setting, and shows the convergence of the pgf to its continuous equivalent as the step-size tends to zero.

A Background literature on renewal equations
A common approach to modelling infectious diseases is to use the renewal equation. The early theory on the properties of the renewal equation can be found in [18]. Epidemiologically derived descriptions can be found in [13,22], where the renewal equation is framed in an epidemiological setting with reference to infection processes. The link between the renewal equation and the popular susceptible-infected-recovered models can be found in [10]. The basics of branching processes can be found in [24]. In what follows, we arrive at a renewal equation from first principles, starting with the probability generating function of a general branching process.

B.1 Branching process framework
We present a general time-varying age-dependent branching process that is most similar to the general branching process initially proposed by Crump, Mode and Jagers [15,16]. Following [40], our process begins with a single individual infected at some time l, whose infectious period is a random variable distributed with cumulative distribution function G(·, l), admitting a probability density g(·, l). During this individual's life length, the individual gives rise to an integer-valued random number of secondary infections according to a counting process {N(t, l)}_{t≥l}, where N(t, l) is the number of secondary infections by time t and t is a global "calendar" time. The amount of time for which the individual has been infected before time t is therefore t − l.

For each infection event time v (that is, for each time at which the counting process N(t, l) jumps), we then define a random variable to be the size of the infection event at time v; that is, the number of individuals that are infected (by the initial individual) at time v. Throughout this paper, a standing assumption is made on the distribution of these event sizes. Each newly infected individual then proceeds, independently, in the same way as the initial individual. The only change is that the time at which they are infected will be different (but, for example, the infection tree rooted at an individual infected at time s > l is equal in distribution to the full infection tree if one started an epidemic with l = s). This self-similarity property underpins the derivations in the subsequent notes, as it allows an epidemic to be characterised purely by the "first generation" of infected individuals (and hence, the equations are derived using the "first generation principle").
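The first-generation construction translates directly into an event-driven simulation. The sketch below (all distributional choices are illustrative placeholders, and parameters are chosen subcritical so runs terminate quickly) treats every individual identically, exploiting the self-similarity property.

```python
import numpy as np

# Minimal event-driven simulation of an age-dependent branching process:
# each individual, infected at time l, draws an infectious period and then
# generates infection events (of size 1) at rate rho * nu(t - l) while
# infectious, via thinning. Exponential period and nu(u) = exp(-u) are
# illustrative placeholders, not the paper's choices.
rng = np.random.default_rng(3)

def simulate_prevalence(t_end, rho=1.5):
    stack, prevalence = [0.0], 0
    while stack:
        l = stack.pop()                      # infection time of this individual
        life = rng.exponential(1.0)          # infectious period ~ G
        if l + life > t_end:
            prevalence += 1                  # still infected at t_end
        s = l
        while True:
            s += rng.exponential(1.0 / rho)  # candidate event (dominating rate)
            if s > min(l + life, t_end):
                break
            if rng.uniform() < np.exp(-(s - l)):
                stack.append(s)              # accepted: new infection at time s
    return prevalence

print(simulate_prevalence(t_end=8.0))
```

Every individual on the stack is handled by the same code path regardless of its infection time, which is precisely the self-similarity that the analytical "first generation principle" exploits.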

B.2 The counting process, N (t, l)
Our framework relies on the assumption that the counting process N(t, l) has independent increments and is continuous in probability (that is, P(N(s, l) ≠ N(t, l)) → 0 as s → t). This condition excludes any discrete formulations of the epidemic process. It will be shown later in the supplement that discrete epidemics (which are not continuous in probability) are structurally different, as extra terms appear in the equations for the pgf. However, the equations in the continuous case are recovered as the step-size of the discrete process tends to zero.
A further assumption on N(t, l) is that it can be constructed from a Lévy Process: there is some non-negative rate function r(t, l) and some Lévy Process such that N(t, l) is obtained by time-changing it by the integrated rate ∫_l^t r(s, l) ds. Note that the counting processes relating to different individuals are independent, and hence will come from different independent copies of the base process N.
This assumption is important because it means that the counting process of "infection events" (that is, points in time at which the value of N(t, l) changes) is an inhomogeneous Poisson Process, which can be shown as follows. Consider a counting process J_N(t, l) that counts the increases in N; that is, the number of times s ≤ t at which N(s, l) jumps (where | · | denotes the number of elements in a set). Then, as N is a Lévy Process, J_N(t) has iid (independent and identically distributed) increments and is non-decreasing in t with jumps of size 1, and thus follows a Poisson Process with some rate κ [4], from which the generating function of J(t, l) follows.

The rate function, r(t, l)
Throughout the examples in this paper, the rate function r(t, l) will be given as r(t, l) = ρ(t)ν(t − l). Here, ρ(t) is a population-level infection event rate. Note that, because the number of infections caused at each infection event may be greater than 1 (that is, one may have J(t, l) < N(t, l)), ρ(t) cannot necessarily be interpreted as a direct analogue of the reproduction number. ν(t − l) gives the infectiousness of an individual after it has been infected for time t − l. It will be assumed that ∫₀^∞ ν(s) ds = 1, so that ρ can be interpreted as the infection event rate.
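Because r(t, l) = ρ(t)ν(t − l), the infection-event times of a single individual can be simulated by thinning a homogeneous Poisson process. The sketch below (illustrative ρ and ν, not the paper's choices) implements Lewis' thinning algorithm.

```python
import numpy as np

# Simulating one individual's infection-event times by thinning: candidate
# events come from a homogeneous process at the dominating rate rate_max,
# and each is accepted with probability r(t, l) / rate_max, giving an
# inhomogeneous Poisson process with rate r(t, l) = rho(t) * nu(t - l).
# The specific rho and nu below are illustrative placeholders.
rng = np.random.default_rng(0)

def rho(t):                 # population-level infection event rate
    return 2.0

def nu(a):                  # infectiousness by age of infection (integrates to 1)
    return np.exp(-a)

def infection_times(l, horizon, rate_max):
    """Times of infection events for an individual infected at time l."""
    times, t = [], l
    while True:
        t += rng.exponential(1.0 / rate_max)            # candidate event
        if t > horizon:
            return np.array(times)
        if rng.uniform() < rho(t) * nu(t - l) / rate_max:
            times.append(t)                              # accepted event

events = infection_times(l=0.0, horizon=20.0, rate_max=2.0)
print(len(events))   # expected count ~ rho * integral of nu = 2
```

The dominating rate must satisfy rate_max ≥ sup r(t, l); here the supremum is ρν(0) = 2.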

B.4 Smoothness assumptions
Note that, throughout the derivations of this paper, the smoothness of ρ, ν and g will not be explicitly considered when taking limits: it will be assumed that they are sufficiently smooth for "natural" results to hold. The authors believe that the results of this paper will hold for any piecewise continuous choices for these functions, although more detailed analysis would be needed to provide a rigorous proof of this. It is possible that they hold for much wider classes of functions, but this seems to the authors to be outside the realm of epidemiological interest, as it appears implausible that any of these functions would not be piecewise continuous in a realistic setting.
Moreover, it will be assumed that unique solutions to the equations for the pgf, mean and variance exist. Again, a proof of this property is beyond the scope of this work, although the classes of equations presented in this paper are common across the literature, and it is likely that interested readers with a pure mathematical background could find applicable results to address this issue.

B.5 Special cases for N (t, l)
Throughout this paper, two special cases for N(t, l) are considered: the case where N(t, l) is itself an inhomogeneous Poisson Process, and the case where N(t, l) is a Negative Binomial process.
These were used to construct the figures in the paper and explanations as to how they can be used will be presented throughout this supplement.

C Probability generating functions

C.1 General case
Define F(t, l; s) := E(s^{Z(t,l)}) to be the generating function of Z(t, l). For simplicity of notation, the dependence of F on s will be suppressed.
To derive the generating function F(t, l), we condition on the infectious period (lifetime) of the initial case, L. Define the set of times at which these infection events occurred to be {K_1, ..., K_{J(l+u,l)}}, where, importantly, the K_i are labelled in a random order (so it is not necessarily the case that K_1 < ... < K_{J(l+u,l)}). As J is a homogeneous Poisson Process and N(t, l) is continuous in probability, the K_i are therefore iid with pdf (probability density function) proportional to the rate r(l + k, l). It is perhaps helpful to note that this is the step which relies on N being continuous in probability.
If this were not the case and N (t, l) had non-zero probability of increasing at some time s, then the knowledge that K 1 = s would give some information about K 2 , as the fact that K 2 ̸ = s would change its probability distribution, meaning K 1 and K 2 would not be independent. Conversely, in the continuous case, K 1 = s removes an event of zero measure from the probability space of K 2 , and hence K 1 and K 2 are still independent.
Now, by the self-similarity property ([24,28]), the epidemic started by each secondary case is an independent copy Z_ij of Z, equal in distribution. Recall that if X_i are iid random variables with generating function G_X(s), and Y is a non-negative integer-valued random variable with generating function G_Y(s), then the generating function of the random sum Σ_{i=1}^{Y} X_i is G_Y(G_X(s)). Defining J^{(t,l)} to be the generating function of J(t, l), this relationship allows the pgf of the descendants to be written as a composition (S.32). Conditioning on the value of K (equal in distribution to the K_i) and integrating against its density r(l + k, l)/λ(l + u, l) gives (S.33). Thus, defining Y^{(l+k)} to be the generating function of the infection event size, we can equivalently write this as an exponential, using the fact that J(t, l) is Poisson distributed. An identical derivation can be performed on the first integral in Supplementary Equation S.26 (swapping t − l for u and multiplying by s to account for the initial case, which is counted in the prevalence at t when L > t − l), resulting in the overall pgf (S.39). Note that by absorbing κ into the rate function r(l + k, l), it can be assumed that κ = 1; intuitively, this simply scales the probability density by the number of points.

C.2 Solving the pgf equation
Practically, one will always set l = 0 for an epidemic, and so only the values F (t, 0) are directly relevant. However, it is still necessary to solve for F (t, l) for 0 ≤ l ≤ t. In the language of PDEs (partial differential equations) and, specifically, the Cauchy problem, this can be explained by the fact that the "data curve" is the line t = l (as the values of F (t, t) are known to be equal to s) and the "characteristics" of the system are the lines t = constant. Thus, to calculate the value of F (t, 0), it is necessary to follow the characteristic from (t, t) to (t, 0) and hence calculate F (t, l) for 0 ≤ l ≤ t.

C.3 Poisson case
If N(t, l) is an inhomogeneous Poisson Process then, as the infection event size for a Poisson Process is always 1 [4], one has Y^{(t)}(s) = s. To aid understanding in the Negative Binomial case below, it is helpful to note that the Lévy Process, N, can hence be characterised by its jump structure. Setting κ = 1 as discussed above, the generating function equation simplifies, and can be simplified further by factoring λ. For computational ease, an auxiliary function equation is then used.

C.4 Inhomogeneous Negative Binomial case
Our derivation follows from the well-known relationship that the Negative Binomial distribution arises as a compound Poisson distribution. For p ∈ (0, 1) and ϕ ∈ ℝ⁺, if X = Σ_{i=1}^{N} Y_i with N Poisson distributed, and each Y_i is independent of N, iid, and follows a logarithmic series distribution, then the random variable X is Negative Binomial distributed. This can be proven directly via generating functions; for clarity, we re-derive the relationship explicitly. As M(t) has iid increments, one obtains, to leading order for k > 0, the expected jump probabilities. Moreover, the pmf (probability mass function) of an infection event size Y is given by the logarithmic series distribution, from which its generating function follows. These results can be substituted into the general formula. As in the Poisson case, this equation can be simplified by factoring λ, and the easier-to-solve auxiliary function follows.
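This compound-Poisson construction is easy to verify by simulation. The sketch below uses the standard parameterisation (our notation r, p, with Poisson rate λ = −r log(1 − p); the paper's ϕ parameterisation is analogous) and checks the resulting mean and variance against the Negative Binomial values r p/(1 − p) and r p/(1 − p)².

```python
import numpy as np

# Compound-Poisson representation of the Negative Binomial: if
# N ~ Poisson(lam) with lam = -r*log(1-p), and the Y_i are iid
# logarithmic-series variables with P(Y = j) = -p**j / (j * log(1-p)),
# then X = Y_1 + ... + Y_N is Negative Binomial. r_, p are illustrative.
rng = np.random.default_rng(2)
r_, p = 2.0, 0.4
lam = -r_ * np.log(1.0 - p)

js = np.arange(1, 200)
pmf_ls = -p**js / (js * np.log(1.0 - p))    # logarithmic-series pmf, truncated
cdf = np.cumsum(pmf_ls)

reps = 200_000
n = rng.poisson(lam, size=reps)                            # number of summands
ys = np.searchsorted(cdf, rng.uniform(size=n.sum())) + 1   # inverse-CDF draws
x = np.bincount(np.repeat(np.arange(reps), n), weights=ys, minlength=reps)

print(x.mean(), x.var())   # theory: mean 4/3 ~ 1.333, variance 20/9 ~ 2.222
```

The variance exceeding the mean here is exactly the offspring overdispersion that the Negative Binomial case introduces relative to the Poisson case.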

C.5 Cumulative incidence
As with prevalence, cumulative incidence can be calculated by counting all previous infections as well as current ones. Following an identical derivation to that for prevalence, the pgf for cumulative incidence simply requires multiplying the second integral by s, as the initial infection is counted in the cumulative incidence regardless of the value of L.
C.6 A simplified pgf ignoring g
By assuming g(u, l) = 0 ∀u, and therefore G(u, l) = 0 ∀u, the pgf for prevalence (or, in this case, equivalently, cumulative incidence) simplifies considerably. Additional computational savings can be gained in our case r(t, l) = ρ(t)ν(t − l) if the infectiousness ν decays to zero quickly: the auxiliary equation used for computation can then be truncated to some time min(t, T), in both the Poisson and the Negative Binomial cases.

C.7 Calculating the probability mass function via the pgf
Following [35] and [7] (originally from [34]), by the properties of pgfs, the probability mass function p can be recovered through the derivatives of a pgf F at s = 0, via p_n = F⁽ⁿ⁾(0)/n!. Direct differentiation is generally computationally intractable. A well-known result from complex analysis [34] holds that f⁽ⁿ⁾(a) = (n!/2πi) ∮ f(z)/(z − a)ⁿ⁺¹ dz. This integral can be taken on a closed circle around the origin, so that z = re^{iθ} and dz = iz dθ; the derivatives thereby become Fourier coefficients. The probability mass function for any time and n can therefore be determined numerically. One needs M ≥ n, which requires solving n renewal equations for the generating function and performing a fast Fourier transform. This is generally computationally fast, but may become slightly burdensome for epidemics with very large numbers of infected individuals.

D Properties of the prevalence variance

D.1 Derivation of equation for mean prevalence
Before deriving the equation for the prevalence variance, it is important to derive the equation governing the mean prevalence. This has been previously derived in [40], although here we re-derive it from our new pgfs. First, differentiate the pgf equation with respect to s; then, setting s = 1, so that F(·, ·) = 1 and F_s(·, ·) = M(·, ·), one obtains the renewal equation for the mean.

D.2 Derivation of equation for prevalence variance
The equation for the variance can now be found by taking the second derivative of the pgf. Define W(t, l) := E(Z(t, l)(Z(t, l) − 1)). This then gives the variance as V(t, l) = W(t, l) + M(t, l) − M(t, l)². Differentiating the pgf equation twice, evaluating at s = 1 (noting that E(J(t, l)) = λ(t, l)), and subtracting M(t, l)² gives the variance equation.

Each term of this equation can be interpreted by assuming that all other parts of the model are deterministic. To begin, suppose that the infectious period of the initial individual is random but all other parts of the model are deterministic, so that, given that the initial individual is infectious at time l + u, it will infect B(l + u)r(l + u, l)dt people in the interval [u, u + dt] (note that this is an abstraction to illustrate the source of this variance, as it is impossible for non-integer numbers of infections to occur). Moreover, it is assumed that each of these individuals has given rise to exactly M(t, l + u) infections at time t. Then, using var(Z(t, l)) = E(Z(t, l)²) − E(Z(t, l))² (S.107), the number of infection events in [k, k + dt] is a Poisson variable, A_k, with mean r(l + k, l)Ḡ(k, l)dt, and hence the number of infections is that Poisson variable multiplied by Y(l + k, l). Finally, as before, any individuals born at time l + k are assumed to deterministically cause M(t, l + k) active infections at time t. Ignoring the dt² term, as it has zero measure, and noting that Y and A_k are independent, the resulting expression for var(Z(t, l)) can easily be seen to give the correct term.

D.4 Overdispersion
For the purposes of this note, it is helpful to make the following definition.
Expanded: An epidemic is called "expanded" at time t if there is a non-zero probability that the prevalence, not counting the initial individual or its secondary infections, is non-zero.
In this note, it will be shown that, if Z̃(t, l) is the prevalence of new infections (that is, the prevalence without counting the initial case), then, if the epidemic is expanded at time t, Z̃(t, l) is strictly overdispersed. That is, var(Z̃(t, l)) > E(Z̃(t, l)) unless E(Z̃(t, l + k))ρ(l + k, l)ν(k)Ḡ(k, l) = 0 ∀k ∈ (0, t − l) (S.116). The second condition ensures that, at each k, either the likelihood of a new infection being caused at time l + k, or the probability of an individual who was infected at time l + k causing subsequent infections whose infection tree has non-zero prevalence at time t, is zero. Hence, it is equivalent to the epidemic not being expanded at time t.
It is crucial to use Z̃(t, l) rather than Z(t, l), as otherwise the deterministic initial case means that, for early times, the prevalence is underdispersed (as, for example, E(Z(l, l)) = 1 and var(Z(l, l)) = 0). Moreover, the condition on the tertiary infections is necessary as, otherwise, if N(t, l) is Poissonian, then Z̃(t, l) is also Poissonian (and therefore not strictly overdispersed).
It is helpful to derive equations for the mean M̃(t, l) and the variance Ṽ(t, l) of the new infection prevalence. This can be done by following the methods of the previous note.
The derivations are mostly identical, and so will not be covered in detail. The key point is that the equation for the pgf, F̃, differs in that the factor of s in the first term is discarded. This equation can then be differentiated as before and rearranged to give equations for M̃ and Ṽ. Suppose that Ḡ(t − l, l) ≠ 1. Then, splitting the final term in Supplementary Equation S.123, we obtain a lower bound on the variance. To facilitate the remainder of this proof, it is helpful to define some auxiliary quantities.
To prove strict overdispersion, note that, for Supplementary Equation S.140 to hold with equality, it is necessary that all the inequalities used hold with equality. In particular, since B(l + k) ≥ 1, and since Z̃(t, l + k)(Z̃(t, l + k) − 1) is a non-negative integer, it must equal zero almost surely. We now show that if P(Z̃(t, l) = 1) > 0, then P(Z̃(t, l) > 1) > 0. This can be done as follows.
Define the set S to be the possible times at which the initial individual can cause a secondary infection which in turn starts an epidemic that can have non-zero prevalence at time t. The associated rate function f must be continuous, and so there exists some y ∈ (0, t − l) such that both (l, l + y) and (l + y, t) contain such times. Thus, there is a non-zero probability of an individual being infected in (l, l + y) causing an epidemic that has non-zero prevalence at time t and, similarly, a non-zero probability of an individual being infected in (l + y, t) causing an epidemic that has non-zero prevalence at time t. As the infection processes have independent increments, and as the initial individual causing an infection in (l + y, t) implies that it must have been infectious for the whole interval (l, l + y), there is a non-zero probability of two such individuals being infected: one in (l, l + y) and one in (l + y, t).

We now compare this epidemic with one in which N(t, l) is a Poisson case (meaning B*(t) = 1). Asterisks will be used to denote the quantities relating to this Poisson epidemic.
Suppose that the infectious period is the same in both cases (so G = G* and ν = ν*). To ensure a fair comparison, it is also assumed that the mean number of cases is the same in both cases. Defining ∆_V to be the difference between the two variances, one can see that it satisfies a renewal equation

An important property of this renewal equation is that the part of the right-hand side that is independent of ∆_V grows over time. Thus, even though these two epidemics give the same mean, the difference in their variances is proportional to the square of this mean. This means that models fitted in a Poisson process framework, even without exponential infectious periods, will substantially underestimate the variance of the number of cases (recalling that E(Y(l + k)²) > 1 in the non-Poisson case).

D.6 Large time solutions to the variance equation
To further understand the variance, we consider large time approximate solutions to the variance equation. Note that the level of rigour in this note is lower than the rest of our derivations as the results are derived for illustrative purposes.
It shall be assumed throughout this note that κ has been absorbed into ρ. Moreover, to enable explicit asymptotic solutions to be found, it shall be assumed that ρ, B and E(Y²) are constants and that g = g(t), so that all individuals behave identically (in distribution), irrespective of the time at which they were infected. It also means that r(l + k, l) = r(k), as the rate of infection depends only on the time since the individual was infected. Under these assumptions, the mean M(t, l) = M(t − l) and the variance V(t, l) = V(t − l) are functions of t − l only. This property will be used when forming the heuristics used in this note.
The final assumption is that Ḡ(t) has finite support: that is, Ḡ(t) = 0 for sufficiently large t.
This is not strictly necessary, but simplifies the analysis.
Then, for t ≫ l, the mean and variance equations take asymptotic forms. Substituting an exponential ansatz M(t − l) ∝ e^{γ(t−l)} into the mean equation yields a characteristic equation H(γ) = 1 and, by the properties of H, there is a unique value for γ (independent of l) such that this holds.
We shall henceforth assume that γ is equal to this value.
Note that, by considering the case γ = 0, the epidemic grows if and only if the expected number of cases caused by an individual is greater than 1, as expected.
The variance equation can now be considered. Substituting the asymptotic form of the mean, the variance admits a leading-order solution of the form Ce^{2γ(t−l)}. Note that if γ ≤ 0, this variance approximation is not well-defined (as C is either infinite if γ = 0, or negative if γ < 0), and so it is necessary to find another solution. In the γ < 0 case, e^{γ(t−l)} ≫ e^{2γ(t−l)} and a leading-order solution can be found directly. Thus, according to these approximations, the variance grows with the square of the mean in the γ > 0 (i.e. growing epidemic) case, while it decreases proportionally to the mean in the γ < 0 (i.e. shrinking epidemic) case. The γ = 0 case is the bifurcation point between these two solutions and would require further analysis.
In the growing epidemic case, the equation for C is also informative in characterising the effect of the different model parameters on the variance. In particular, it shows that there is a linear relationship between E(Y(t)²) and the variance, re-emphasising the point made in the previous subnote that ignoring this parameter can have significant effects on the variance estimate. Moreover, it shows that variance grows rapidly throughout a growing epidemic, remaining proportional to the square of the mean.
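The characteristic equation H(γ) = 1 can be solved numerically by bisection, since H is decreasing in γ. In the sketch below, the form H(γ) = ρE(Y) ∫ ν(u)Ḡ(u)e^{−γu} du is our reading of the large-time condition, and all kernels are placeholder choices.

```python
import numpy as np

# Solving a characteristic (Euler-Lotka-type) equation H(gamma) = 1 for the
# asymptotic growth rate, under our reading of the large-time analysis:
#   H(gamma) = rho * E(Y) * integral nu(u) * Gbar(u) * exp(-gamma*u) du.
# The kernels below are illustrative placeholders.
du = 0.001
u = np.arange(0.0, 10.0, du)
nu = np.exp(-u); nu /= nu.sum() * du       # infectiousness, integrates to 1
Gbar = (u < 2.0).astype(float)             # finite-support survival function
R = 3.0                                    # rho * E(Y)

def H(gamma):
    return R * np.sum(nu * Gbar * np.exp(-gamma * u)) * du

# H is continuous and decreasing in gamma, so bisect a bracketing interval
lo, hi = -5.0, 5.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if H(mid) > 1.0 else (lo, mid)
gamma = 0.5 * (lo + hi)
print(gamma, H(gamma))   # gamma > 0 here, since H(0) = R * 0.865 > 1
```

Setting γ = 0 recovers the threshold condition from the text: the epidemic grows if and only if the expected number of cases per individual, H(0), exceeds 1.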

D.7 Mean and variance for cumulative incidence
The equations for the mean and variance of the cumulative incidence can be derived almost identically, as the two generating functions are very similar. The mean equation gains a term from the additional factor of s being differentiated, and hence the mean equation follows (using asterisks to denote cumulative incidence quantities).

E.1 Continuous case
If only the cumulative incidence, Z(t, l), is known at some time t, the full epidemic history (in particular, the times at which each individual was infected, and the times at which they stopped being infectious) is unknown. Thus, it is helpful to derive a likelihood function for each possible sequence of these times.
Perhaps the most intuitive approach would be to treat the times at which each individual was infected as continuous random variables. However, the resultant pdf is complicated by the fact that multiple infections are likely to happen simultaneously if E(Y) > 1, and will contain a significant number of Dirac delta functions to accommodate this, making it complicated both mathematically and practically.
To remedy this, we instead consider three sets of random variables: a vector T of unknown size n + 1, which contains the times of all the infection events up to time t; a vector Y, also of size n + 1, which contains the size of each of these infection events (that is, y_m is the number of individuals that are infected at time τ_m); and a vector D containing the times at which each individual stops being infected. To make the subsequent notation clearer, we shall use a non-rectangular array X in place of D, where X_ij will be the time at which the jth individual infected at time T_i stops being infected.
We will suppose that, for each s > u and positive integer k, the probability of an infection event of size k in [s, s + dt) follows from the jump intensity, as the counting process of jumps in N(s, u) is an inhomogeneous Poisson Process of rate r(s, u) (absorbing κ into r). We can hence create a likelihood function. Define 1 to be a vector of 1's, and choose any vectors τ and d such that each τ_i, d_j ∈ (0, t). Define dt to be small enough that τ_i − τ_j > dt for all i > j and |τ_i − d_j| > dt for all i, j (note that the set where τ_i = d_j has zero measure and can be ignored). Moreover, choose a positive-integer-valued vector y. Then, writing τ_{n+1} := t to reduce notation, x_ij is the value of X_ij in the case D = d; as the infectious periods are independent, they can be considered separately. Here, the o(dt) term contains three components that can be linearised out of the model: the probability that multiple different individuals contribute to the y_k cases (this is O(dt²)); the probabilities of individuals infecting no one in this interval (these are independently 1 − O(dt), and hence the O(dt) contribution can be ignored when these probabilities are multiplied together); and the o(dt) terms from the equations defining p_k.
As the counting process of jumps in N(s, u) is an inhomogeneous Poisson Process, and is only "active" for individual ij up to time x_ij, the corresponding no-infection probabilities follow, where the O(dt) term contains the integral between τ_k and τ_k + dt of each of the integrands.
Taking the products inside the exponential as sums, the various "no infection" terms can be combined. Finally, the infectious period terms can be calculated directly from the pdf, g, of L. Combining all the relevant terms yields the likelihood.

E.2 Special case (Poisson)
In the Poisson case, A_{k,i} is Poisson distributed with mean ρ(k)ν(k − i). Hence the likelihood follows, and taking logarithms gives the more computationally useful log-likelihood.

E.3 Special case (Negative Binomial)
In the Negative Binomial case, the Y_j are iid logarithmic random variables with the pmf given earlier; substituting this into the likelihood gives a log-likelihood involving terms of the form log(ϕµ_k + j) and y_k log(·).

E.4 Approximating the likelihood
It is difficult to simulate from the likelihoods when the infectious periods of the individuals are unknown because often Z(t, l) ≫ t (whereas the other unknowns, τ and y, have only n ∼ t parameters). To remedy this, we use an approximation: given an estimate of the function g, we simulate candidate infectious periods from it. For some D, the observed epidemic may be impossible (e.g. if D_0 < b_1, where b_1 is the time that the first infection event occurs). Thus, it is necessary to impose a feasibility condition. Many such conditions are possible, but we use a simple one. Given these values, we can then create an approximation, ℓ*, to the likelihood. This clearly creates a non-deterministic likelihood, as it is dependent on a set of random variables.
However, from our simulations, it appears that ℓ * has a small variance, and so this extra randomness does not significantly affect our calculations.

F Assessing future variance during an epidemic
Many of the equations presented thus far have been concerned with properties of an epidemic started from a single case at a fixed deterministic time. However, it is crucial to be able to calculate the risk from any time during the epidemic, and such a derivation is presented in this note. This derivation is more algebraically involved than the other work in this paper, and so to reduce its length, it will be assumed that N (t, l) is an inhomogeneous Poisson Process, and that L = ∞ for each individual. This means that y and D can be ignored when considering the likelihood.

F.1 Derivation
Suppose that the prevalence (or, equivalently in this case, cumulative incidence), Z(t, l) = n + 1, is known at some point in an epidemic, but that the times at which these infections happened, B_i, are unknown. Note that the notation B_i rather than T_i is used in this note, because these times are now an exact analogue of birth times in a birth-death process. The condition of n + 1 rather than n has been chosen as it means that there have been n new infections, which makes the following derivation notationally simpler.
Note that the infection time of the initial individual, B_0, is known to be equal to l, but it will be treated identically to the other times to reduce notation. Its marginal pdf is f_{B_0}(b) = δ(b − l).
Following the previous note, the pdf f_B(b) of the infection times can be written down, and the variance of the future prevalence expanded term by term. The first term in this expansion can be simplified by noting that E(1_{B_i=b})² = O(db²), and hence this contribution has zero measure (as it is only integrated over one dimension). The second term can also be expanded: by the independence of the Z* terms, the cross terms factorise in all cases. It is more informative to remove the 1_{(b,i)≠(c,j)} condition; this can be done by calculating the corresponding correction, noting that the second term is bounded and contains 1_{b=c}, which is non-zero only on a null set of the domain of integration (and hence the integral is zero). Thus, absorbing this correction term into the first term in Supplementary Equation S.220, one obtains a formulation whose advantage is that it allows the contributions to the variance from the infection times B_i before time t, and from further infections between times t and t + s, to be separated; indeed, if the infection times are known, the former contribution vanishes. Moreover, note that, for k ≠ u, the quantities Z(t + s, t + u) and Z(t + s, t + k) are independent, and these quantities are all independent from the indicator terms; it is therefore helpful to split the integral. Hence, one has the final form of the variance equation, with

∫ (V(t + s, t + u) + M(t + s, t + u)²) ρ(t + u) ν_bound(u) du := V*(t + s) (S.247)

so that this is now independent of b_i. Note that the construction of ν_bound(u) means that it will still decay for large u. Under the assumption that the infection times are roughly deterministic, so that the second term is zero,

var(Z(t + s, l)) ≤ Z(t, l)V*(t + s) (S.248)

The covariance term can be added in by noting that it is proportional to Z(t, l), rather than Z(t, l)².
The simplest case is when the infection times are known, something which may be approximately true if the epidemic is large (and hence has been approximately deterministic in the recent past). In this case, the equation simply reduces to var(Z(t + s, l)) ∼

In this discrete setting, it is important to specify exactly those inequalities whose strictness is unimportant in the continuous case. In particular, if an individual is infected at time a and has a lifetime of b, it will be considered to be infectious at time a + b, and will be counted when calculating prevalence at this time. That is, it can infect others at time a + b (and these individuals will be given infection time a + b) but will not be able to infect individuals at time a + b + 1.
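As a concrete illustration of this boundary convention (the helper name is hypothetical, not from the paper), the following sketch counts an individual infected at time a with lifetime b as infectious at every integer time from a up to and including a + b:

```python
def prevalence(individuals, t):
    """Number of individuals infectious at integer time t.

    Each individual is a pair (a, b): infection time a and lifetime b.
    Under the stated convention, the individual is infectious (and counted)
    at times a, a + 1, ..., a + b, but can no longer infect at time a + b + 1.
    """
    return sum(1 for (a, b) in individuals if a <= t <= a + b)

individuals = [(0, 2), (1, 1)]  # infected at 0 with lifetime 2; at 1 with lifetime 1
print(prevalence(individuals, 2))  # both are still counted at t = 2 -> 2
print(prevalence(individuals, 3))  # neither can infect at t = 3 -> 0
```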
For the counting process of infections, one can in this case work without a separate infection event process and instead simply use the quantities directly. The form of the generating function is simpler in the discrete case than in the continuous one and may be more amenable to computation.

G.2 Recovery of the continuous case
Suppose that each step corresponds to a time interval of dt ≪ 1. Suppose further that

g(u dt, l dt) dt ∼ ĝ_{u,l}, t̂ ∼ t dt, and l̂ ∼ l dt (S.266)

where the quantities with a hat are constant. To ensure continuity in probability, it will be assumed that q_u(t̂, l̂) dt ∼ q_u(t, l) for all u ≥ 1 and q_0(t, l) ∼ 1 −

A limit with an l-independent Y will arise if the ratio of each q_k(t, l) to Σ_{k=1}^∞ q_k(t, l) is independent of l.
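This scaling can be checked numerically. As a hedged illustration (not from the paper): a geometric lifetime with per-step ending probability r·dt, measured in units of dt, recovers an exponential lifetime with rate r as dt → 0, since its survival function (1 − r·dt)^(t/dt) converges to exp(−r t):

```python
import math

# Per-step ending probability r*dt gives a geometric lifetime; in the limit
# dt -> 0 its survival function converges to the exponential survival exp(-r*t).
r = 2.0
t = 1.0
for dt in (0.1, 0.01, 0.001):
    survival_discrete = (1.0 - r * dt) ** (t / dt)
    print(dt, survival_discrete, math.exp(-r * t))
```

The discrete survival probability approaches exp(−2) ≈ 0.1353 as dt shrinks, recovering the continuous case.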

G.3 Distinctness from the continuous case
It is important to note that relaxing the assumption that N is continuous in probability, which is necessary when considering the discrete case, means that the pgf becomes materially different.
Indeed, one can characterise the discrete case through the continuous framework by imposing that r(t, l) equals the probability of N increasing (by any number) in the discrete case discussed above. Moreover, again allowing Y to depend on l, Y(t, l) has distribution P(Y(t, l) = k) = q_k(t, l) for some α and β. Thus, these dissimilarities appear only at the O(dt^2) level (and hence disappear in the small-dt limit). However, they will be non-trivial if dt is not small, underlining the importance of the assumption that N is continuous in probability: neglecting this assumption could lead to materially wrong results in the case of a large step size.
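The O(dt^2) size of the discrepancy can be illustrated numerically. As a hedged sketch (toy rate value, not from the paper): the discrete per-step jump probability r·dt and the continuous within-step jump probability 1 − exp(−r·dt) differ by r^2 dt^2 / 2 + O(dt^3), so their difference divided by dt^2 tends to r^2/2:

```python
import math

# Discrete chain: jump probability r*dt per step.
# Continuous process: probability 1 - exp(-r*dt) of a jump within one step.
# The difference is O(dt^2): material for coarse steps, negligible as dt -> 0.
r = 1.5
for dt in (0.5, 0.05, 0.005):
    p_discrete = r * dt
    p_continuous = 1.0 - math.exp(-r * dt)
    diff = p_discrete - p_continuous
    print(dt, diff, diff / dt**2)  # second ratio tends to r**2 / 2
```

For r = 1.5 the ratio approaches 1.125, confirming that the mismatch vanishes quadratically in the step size.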

G.4 Discrete likelihood
If the epidemic happens in discrete time, it is significantly easier to calculate the likelihood. Define the quantities A_k(y, d), where each A^j_{k,i} is an independent copy of A_{k,i} and, similarly to before, x_{ij} is the time at which the jth individual infected at time i stops being infectious. Note that here, as previously in the discrete setting but in contrast to the continuous case, y_i can be zero.
Then, the likelihood is simply given by the product of these probabilities where, as we are in the discrete case, g is now a pmf. This gives a log-likelihood of

ℓ(y, D) = Σ_{k=1}^n log P(A_k(y, d) = y_k) + Σ_{i=1}^n Σ_{j=1}^{y_i} log g(x_{ij} − i, i) (S.298)
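As a sketch of how a log-likelihood of the form (S.298) could be evaluated numerically, the following assumes, purely for illustration, Poisson-distributed infection counts and a geometric lifetime pmf; the helper names and both distributional choices are assumptions, not fixed by the paper:

```python
import math

def log_poisson_pmf(k, mean):
    # log P(A = k) for A ~ Poisson(mean), an illustrative choice for the A_k terms.
    return k * math.log(mean) - mean - math.lgamma(k + 1)

def log_likelihood(y, x, mean_offspring, g):
    """Discrete log-likelihood in the shape of (S.298).

    y[i] : number of new infections at time i (may be zero)
    x[i] : list of times at which the individuals infected at time i
           stop being infectious
    g    : pmf g(duration, infection_time) of the infectious lifetime
    """
    ll = 0.0
    for i, y_i in enumerate(y):
        # First sum: probability of observing y_i new infections at time i.
        ll += log_poisson_pmf(y_i, mean_offspring)
        # Second sum: lifetime contributions log g(x_ij - i, i).
        for x_ij in x[i]:
            ll += math.log(g(x_ij - i, i))
    return ll

# Toy lifetime pmf, independent of infection time: P(duration = d) = 0.5^(d+1).
g = lambda d, i: 0.5 ** (d + 1) if d >= 0 else 0.0
y = [2, 1, 0]
x = [[2, 3], [2], []]
print(log_likelihood(y, x, mean_offspring=1.0, g=g))
```

Because the contributions enter additively, each time step can be evaluated independently, which is what makes the discrete likelihood so much simpler than the continuous one.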