Rapid incidence estimation from SARS-CoV-2 genomes reveals decreased case detection in Europe during summer 2020

By October 2021, 230 million SARS-CoV-2 diagnoses have been reported. Yet, a considerable proportion of cases remains undetected. Here, we propose GInPipe, a method that rapidly reconstructs SARS-CoV-2 incidence profiles solely from publicly available, time-stamped viral genomes. We validate GInPipe against simulated outbreaks and elaborate phylodynamic analyses. Using available sequence data, we reconstruct incidence histories for Denmark, Scotland, Switzerland, and Victoria (Australia) and demonstrate, how to use the method to investigate the effects of changing testing policies on case ascertainment. Specifically, we find that under-reporting was highest during summer 2020 in Europe, coinciding with more liberal testing policies at times of low testing capacities. Due to the increased use of real-time sequencing, it is envisaged that GInPipe can complement established surveillance tools to monitor the SARS-CoV-2 pandemic. In post-pandemic times, when diagnostic efforts are decreasing, GInPipe may facilitate the detection of hidden infection dynamics.

The graphic depicts the runtime of executing the entire GInPipe pipeline on a standard laptop (2.3 Ghz, 2 cores) for different data sets.

Supplementary Note 1
The aim of Supplementary Note 1 is to evaluate, based on in silico simulated outbreaks, whether GInPipe can infer the number of infected individuals solely from viral genetic data. Moreover, we want to study the robustness of the method to missing data, unbalanced, time-dependent or biased sampling, introduction of unrelated sequence variants, its ability to reconstruct non-smooth dynamics, as well as its sensitivity to changes in the pathogen mutation rate and selective pressure.

SN.1 In Silico evaluation of GInPipe
As a proof of concept, we set up a simulation of a virtual outbreak: We formulate a stochastic epidemiological process by which a population of infectious agents progress with a time-dependent replication rate. This generates the number of infected individuals at each time step ('ground truth'). In our simulations, each infected individual can contribute to the pandemic and is associated with one viral genome. These genomes can be transmitted and randomly mutated in every time step. The set of genomes constitutes the data source for GInPipe. More precisely, we use the in silico generated viral genomes to compute the incidence correlate φ(t) using GInPipe. The derived φ(t) estimates are then compared with the number of infected individuals N (t) at each time step.

SN.1.1 Summary and Limitations
We thoroughly tested the performance of our method and probed its weaknesses based on the in silico simulations. We found that the method's φ estimates linearly correlate with the infectious population size N true . Different binning strategies, which are used to reconstruct the temporal incidence profiles yield complementary results. However, φ estimates can become inaccurate when the population of infectious individuals is small. 'Outliers' are however eliminated through application of a convolutional filter (smoothing). The φ estimates are unbiased in the case of missing data. However, the method becomes less accurate when too little viral sequences are available and when the "evolutionary signal" vanishes. Time-dependent changes in the sampling proportion have no effect on GInPipe. Evolutionary biased sampling (e.g. over representation of closely related sequences) decreases the evolutionary signal. This could, in principle, be problematic, if drastic changes, e.g. from a representative-to a biased sampling occur. The importation of variants (introductions) that subsequently are community transmitted, do not affect the performance of the method. The same applies to introductions which are not community transmitted. In the extreme case of persistently large frequencies of non-transmitted introductions, however, the slope of the linear correlation function may change. Yet, a linear relationship between the incidence estimate φ and the infectious population size N true is retained and the reconstruction of the pandemic trajectory is still possible. GInPipe is able to reconstruct very steep increases and decreases in the population dynamic. Selective pressure has no effect on GInPipe's incidence reconstructions. With regards to applying GInPipe to other respiratory viruses, low mutation rates decrease the "evolutionary signal", but may be compensated by increasing the bin sizes, at the expense of temporal resolution.

SN.1.2 Outbreak simulation
For the evaluation of the method, we simulate the evolutionary dynamics of a viral outbreak. We simulate a time-discrete (generation-wise) population growth, where each simulation starts at time t 0 and ends at t final = 120 (below, we also evaluate the corresponding continuous-time process). At t 0 , the outbreak starts with N init = 50 identical sequences of length L = 1000. The nucleotides of the initial sequence are chosen randomly. In each of the t final generations, the current population of size N (t) replicates with replication rate ρ and mutates with mutation rate µ. To simulate non-monotonic outbreak trajectories, we chose a sinusoidal rate, depending on time t: ρ(t) = sin(t * 0.11) 15 + 1.02 (SN.1) The number of sequences in the new generation t + 1 is sampled from the Poisson distribution with The sequences for the next generation are chosen randomly with replacement.
The sampled sequences of the new generation are mutated with mutation rate µ = 0.0001. Analogous to the previous step, the total number of positions to be mutated is sampled from a Poisson distribution: n mut ∼ P µ · N (t + 1) · L (SN.3) These mutations are randomly chosen across all sequences and positions without replacement (below, we also simulate selection pressure). Each nucleotide is mutated according to the Kimura substitution type model [1] with fixed rates for transitions (A ↔ G, C ↔ T with α = 0.6) and both types of transversions (A ↔ C, G ↔ T with β = 0.2, as well as A ↔ T , C ↔ G with γ = 0.2). During the course of simulation, each position could in principle be mutated several times.
This procedure results in a sequence set for each generation, which is written into a fasta file. The header of each sequence contains an ID and a date (arbitrary start date + t days) to conform with the format required for the computational pipeline (https://github.com/KleistLab/GInPipe). We ran 10 simulations with the settings stated above. The trajectory of the population dynamics are depicted in Figure SN.5. Moreover, we sub-sampled the the generation-wise data set in different scenarios (section SN.1.7) and created fasta files for the respective scenarios.

SN.1.3 Execution of the method
For each of the generated (sub-)samples we ran the whole pipeline as described in the Methods section (main manuscript). 'Temporal bins' were spanning 2, 4, 6, 8, and 10 days (here days correspond to generations) and 'equal size bins' were created comprising 2, 5, and 7 percent of all sequences. We evaluated the normalised φ b estimates for each bin b according to eq. (1) in the main manuscript. The deviation grows for larger values, because the considered evolutionary model is a Poisson process with error = φ(t) · , where is a zero-centered, symmetric, additive error. In Figure SN.6 (right), we compare the percent deviation from the linear regressionφ(t): · 100, (SN.5) with t b denoting the mean date of all sequences in bin b. The deviation measure allows to center the approximation errors and henceforth analyse how distinct parameters may affect the method's performance. However, below, we also assess differences in the true vs. GInPipe reconstructed effective population sizes R e . We can see that the majority of the point estimates are quite accurately approximated with low percent deviation. Moreover, the estimation seems unbiased (symmetrically dispersed around 0% deviation) and large deviations are rare. Outliers are most prominent when we bin sequences by equal amounts of days, scattering in both directions, hence under-and overestimating. Creating bins of equal size, however, is slightly more prone to underestimating of c · N true .

SN.1.5 Effect of binning strategies
Next, we investigated the sources of outliers. In Figure SN.7, we show violin plots that colour-code the spanned days per bin (left) and the number of sequences per bin (right). Large deviations from the linear fit are particularly occurring for very small infectious population sizes N true , which is indicated by the smaller markers in Fig. SN.7. Moreover, overestimation for the 'equal size' binning is caused by bins spanning large temporal windows, around 20 days (see red dots in SN.7 (left), third violin plot). The outliers that occur for binning with equal amounts of days, to the contrary, are mainly caused by bins containing only very few sequences (red dots in the central violin plot in Figure SN.7 (left)). In these cases, the true population size is already small (small dots) and hence the amount of sequences used to infer the parameter are also low. Therefore, it may happen that the time until the next mutational event lies outside the particular bin, and not enough information is given to infer the (very low) population size. We can conclude, that the choice of the binning strategy is important for accurate results. On the one hand, the bin has to be large enough in order to contain mutational information, but on  the other hand not too large such that the population dynamics can still be temporarily resolved. Moreover, prediction accuracy declines with decreasing population size N true .

SN.1.6 Smoothing corrects for outliers
The point estimates are smoothed using a convolution filter with a window size of 7 (days). To infer statistical uncertainty in φ profiles, we conducted a sub-sampling procedure of the φ point estimates (details in the Methods section, main article).
In Figure SN.8 (left), we show the smoothed reconstruction φ est of the simulated dynamics of N true , which we saw in Figure SN.5. Figure SN.8 (right) also shows, in a representative simulation, that the incidence correlates φ (blue dots and line) yield a proportional estimate of N true (red line) and reflects the overall epidemiological dynamic quite accurately.

SN.1.7 Missing data does not affect incidence reconstructions
In the evaluation above, we have used all sequences that emerge in the evolutionary process. For a more realistic setting, we evaluated the estimation of φ by taking sub-samples of the total sequence set: We sub-sampled a certain fraction of all sequences (WSREL = proportional sub-sampling), and drew the same amount of sequences per generation (WSABS = unbalanced sub-sampling).
For the sub-sampling schemes, we tested the following parameters: • WSREL: fraction of sequences used for inference r sub ∈ {0.9, 0.7, 0.5, 0.3, 0.1} • WSABS: fixed amount of sequences at each generation (up to) a sub ∈ {500, 350, 150, 50} The sequences are randomly chosen without replacement at every time point (below we also evaluate other sampling schemes).
In Figure SN.9, we show violin plots of the percent deviation, akin to Eq. (SN.5) for the raw point estimates (yellow) and the smoothed estimates (green) for different sub-sampling schemes.
We can generally observe that φ is strongly correlated with the population size N true (r ≥ 0.94), even in scenarios with significant under-sampling. Moreover, φ is an unbiased predictor of the incidence (median deviation centered around 0). The deviation from the linear correlation line slightly increases with more severe under-sampling, i.e. the φ estimates are still linearly correlated with N true , but the φ estimates become less accurate. The left, yellow part shows the deviation for the point estimates, the right, green part depicts the deviation for the smoothed estimates. On each side of the violin plot, the r value indicates the respective level of linear relationship of φ and Ntrue. Top: Proportional sub-sampling. Bottom: Absolute sub-sampling.

SN.1.8 Effects of time-dependent sampling rate on incidence reconstruction
In order to further investigate whether there are any effects of the sampling proportion on incidence reconstruction, we assessed the prediction for a quasi-constant population size with different scenarios of changing subsampling over time. For this, we chose the growth rate ρ(t) = 1, i.e. the population of infected individuals will stochastically fluctuate around its initial value, Fig. SN.10, upper left. First, we inspect the incidence reconstructions without temporal changes in sub-sampling. In Fig. SN.10 (upper right), we show the incidence reconstructions for which 100% of the data is available for GInPipe. In Fig. SN.10 (lower left and right), the median reconstructions are shown for sub-sampled data. We observe that the sub-sampling does not affect our estimates drastically. However, the subsampling is not time-dependent in these simulations. Next, we inspect the effect of drastic changes in the sampling rate. For this, we evaluate oscillating changes in the sampling rate, Fig. SN.11 (upper right panels), as well as a sudden increase from a sampling rate of 0.1 to 1.0 Fig. SN.11 (lower left panels) and from 0.5 to 1.0 Fig. SN.11 (lower right panels).
In essence, we observe no systematic effect of temporal changes in the sampling proportion on the incidence reconstructions. This suggests that sudden increases of genomic surveillance of SARS-CoV-2, as observed in many countries after the appearance of the alpha variant (B.1.1.7), do not impact incidence reconstructions using GInPipe. For the extremely low sampling rates in the oscillating case (around generation 45 and 105) we observe inaccuracies and large variance. Therefore, we conclude that temporal changes in the sampling rate do not introduce systematic biases for incidence reconstruction. In line with previous observations (Fig. SN.4), we can however conclude that very low amounts of data increase uncertainty in the incidence reconstructions. Figure SN.11: Incidence reconstructions with steady population size and time-dependent sub-sampling. Incidence reconstruction shown as smoothed median of the φ estimates (solid lines). The coloured shades present the area between the 5th and 95th percentile derived from sub-sampling over the point estimates. The panels below the incidence reconstruction show the number of sampled sequences for the prediction. Top left: Simulated population dynamics (5 simulations). Top Right: Reconstructed incidence history when the sampling proportion is oscillating. Bottom left: Reconstructed incidence dynamics when the sampling proportion abruptly increases from 0.1 to 1.0. Bottom right: Incidence reconstructions when the sampling proportion abruptly changed from 0.5 to 1.0.

SN.1.9 Incidence reconstruction when sequences are sampled by similarity
In the previous sections, we assumed that sequence sampling is representative, i.e. that the sub-sampled sequences are randomly selected. In this section, we evaluate if GInPipe is able to reconstruct incidence histories when sequences are sampled with an similarity-bias. This is supposed to emulate extreme events when sequencing is entirely unbalanced, e.g. there is a large under-detection and detection is driven entirely by contact tracing. For example, after an infected individual is detected, any forward and backward contacts are investigated and positive cases are sequenced. At the same time, there would be very little diagnosis without a known contact. The sequenced samples would then be highly similar to the sequence from the first detected case, if they are linked by transmission events. For the sequence sub-sampling process, instead of choosing the sequences randomly with the same probability, we assign a similarity value d t (s) for each sequence s to the previous generation, which is used as a probability measure for being sampled in the next generation. The similarity measure is given by the hamming distance (number of differing nucleotides) to the preceding sequence. If the previous generation contains various sequence types, the minimum distance is chosen: with S t being the sequence population at time t, and S t−1 the sequence population in the preceding generation t − 1. We chose to half the chance of being sampled with every nucleotide difference. The probability p t of each sequence s i to be chosen at time t is given by with D = j∈St 1 2 d t (s j ) being a normalising factor. Figure SN.12 (top left) shows the simulated population dynamics. Fig. SN.12 (top center) shows the respective incidence reconstructions without sub-sampling (full data set), i.e. the sequences are not sampled by similarity yet. As expected, the incidence estimations with the full data set yields good results (r = 0.98 for the smoothed estimates). When the sequence set is sub-sampled by similarity as described above, we can see a scaling effect, Fig. SN.12 (upper right). I.e., the smaller the sequence set (in terms of the proportion of the entire sequence set), the lower the φ values. Interestingly, the profiles are maintained Fig. SN.12 (upper right, Pearson correlations r ≥ 0.95), while the slope between the populations size and the smoothed incidence estimate decreases with a decreasing proportions of sub-sampled sequences. Therefore, although a strong linear relationship can be maintained with proportional sub-sampling, the scaling factor is affected, Fig. SN.12 (top right and bottom). This observation is explained by a decrease in the 'evolutionary signal' when sequences are sampled by similarity. This effect is most pronounced if the sampled sequence set becomes small.
Sampling non-proportional subsets over time, i.e. taking the same amount of sequences independent of the population size at each generation (50-500 sequences), leads to a irregular rescaling of the underlying dynamic. The linear relationship of the incidence correlate and the true population size is debilitated (Pearson correlation ranging from 0.85 to 0.95 for the smoothed φ estimates). This effect is depicted in Fig. SN.13, where the smoothed φ estimates for different levels of similarity-biased sub-sampling for one of the simulations are shown. On the left, the different shades of purple indicate the proportion of sub-sampling. As can be seen, the scaling factor decreases with the amount of sequences, but the dynamic can be retained. However, when non-proportional amounts of sequences are drawn, indicated by the shades of green in the right panel, the sub-sampling-driven scaling effect is uneven over time and the dynamic of the incidence correlate can become distorted. For the case of 500 and 350 sequences per generation, the estimates reach similar values for the first wave (around t = 30) compared with the entire data set Figure SN.12: Incidence reconstruction when sequences are sampled by similarity. Top left: Simulated population dynamics. Top center: The solid lines denote the smoothed median of the incidence reconstructions with GInPipe using all available genetic data (no biased sub-sampling). The coloured shades present the area between the 5th and 95th percentile. Top right: Smoothed median of the re-sampled φ estimates with relative sub-sampling biased by similarity. Bottom Population size Ntrue vs. incidence correlate φ for different sub-samplings from the top right panel.
(the black dashed line). However, the second wave (around t = 90) is down-scaled and does not reflect the different magnitudes of the two waves. In conclusion, when sequences are sampled by similarity, the 'evolutionary signal' decreases and the scaling between the true population size and the incidence estimate decreases. If this biased sampling is not changing through time, incidence histories can still be reconstructed. If the sampling strategy would change over the time course of the pandemic (e.g. from biased to representative sampling), difficulties with GInPipe-based incidence reconstruction may arise. Most other methods (e.g. phylodynamics) are likely to encounter similar difficulties when the data is systematically perturbed. For GInPipe, and likewise for any other method that utilizes pathogen genome data, a representative sampling of the population is important to guarantee that meaningful results are obtained. Figure SN.13: Incidence reconstruction with similarity-biased sampling for different levels of sub-sampling for one of the simulations. The shades indicate the level of sub-sampling. The black dashed line shows the estimates with all data. Left: Proportional sub-sampling. Right: Absolute sub-sampling.

SN.1.10 Incidence reconstructions for a continuous-time population dynamic
In the previous sections, we used a minimalistic approach for simulating the outbreak dynamics. In particular, we performed a time-discretisation, i.e. for each discrete time t ∈ (0, 1, ..., 120), the population size was estimated by sampling from a Poisson distribution, N (t + 1) ∼ P N (t) · ρ(t) .
Here, to verify that the time discretisation has absolutely no effect on the in silico assessment of the GInPipe method, we also provide simulations using the time-continuous birth-death process. We introduce the two reactions: where R 1 denotes the 'birth' reaction, where an infected individual infects another individual. Reaction R 2 denotes the 'death' reaction, where an infected individual becomes non-infectious. The reaction rates of the model above are given as r 1 (t) = β(t) · I(t) and r 2 (t) = I(t) · δ. Furthermore, we set δ = 0.5 (1/day) and then derive β(t) = ρ(t) · δ, where ρ(t) is given in eq. (SN.1). We then simulate the time-continuous model using the Gillespie algorithm, which draws the time ∆t to the next event (firing of reaction R 1 or R 2 ) from an exponential distribution. After each population dynamic event, we update the number of sequences (either cloning or deleting one sequence) and place n mut ∼ P µ · N (t + ∆t) · L · ∆t mutations onto the set of sequences.
After simulating a few outbreaks, Fig. SN.14 (left), we use GInPipe to reconstruct the incidence dynamics using the standard setting, as shown in Fig. SN.14 (middle and right). As can be seen, the choice of modelling approach (discrete time vs. continuous time) has no effect of GInPipe's ability to reconstruct incidence dynamics. While the population dynamics may also be modelled by more complex S(E)IR-type models, the model above (eq. (SN.8)) can be easily deduced from such models by assuming that a sufficient number of susceptibles is available. However, the choice of structual model does not affect any of the analyses made in Supplementary Note 1. Figure SN.14: Simulations with birth-death-process. Left: True population size. Middle: The solid lines denote the smoothed median of the φ estimates for the stochastic simulations. The shaded areas denote the 90% confidence interval. Right: An example simulation of the epidemic (red curve) vs. the reconstructed incidence history.

SN.1.11 Comparison of R e estimates
GInPipe allows to infer the effective reproduction number R e , based on the Wallinga-Teunis method, as described in the Methods section in the main manuscript. In the Results section, we assess the accuracy of GInPipe by comparing the R e values inferred from smoothed φ estimates with R e values calculated from the simulated pandemic N true . In Figure SN.15, we see the individual log(R e ) profiles for the 10 simulations from Section SN.1.2 inferred with N true (bold cyan lines) and the according smoothed φ estimates (red lines). The identity plot in Fig. 1 in the main manuscript shows the log(R e ) values inferred with N true vs. φ, with the respective proportion of qualitatively agreeing or disagreeing predictions in the four quadrants. The accuracy is given by the proportion of qualitatively agreeing R e values, yielding a value of 0.92 using the complete data set. Figure SN.15: Individual Re profiles. Individual Re profiles for 10 simulations estimated from the true population sizes (thick cyan lines) and from φ estimates inferred with GInPipe (thin red lines).
Complementing the evaluations in the main manuscript, we additionally assess the accuracy for the sub-sampled data ( Fig. SN.16). Here, we also observe good results, achieving accuracies between 0.89 and 0.92. In the case of very low sub-sampling proportions (10%), however, the accuracy declines to 78%, due to noisy ("wiggly") dynamics.

SN.1.12 Introductions of new sequences do not affect prediction accuracy
Above, we evaluated an isolated outbreak, where the diversity of mutants is emerging solely through evolution, i.e. reproduction and mutation. However, in case of a widespread epidemic, several outbreaks at different locations could lead to populations which evolve in parallel, causing different strains to emerge. Foreign strains may then be introduced into a local epidemic at any point in time.
To test whether the introduction of foreign strains affects the proposed method, we added x i ∈ {50, 100, 500, 1000} 'introduction sequences' at random time points to the initial outbreak. In all simulations, the 'introduction sequences' have 3% of the genome randomly mutated with regards to the founder sequence of the local outbreak. We performed 5 simulations that began with 1-10 start sequences. All sequences were considered to evolve with the same replication and mutation rates. For a representative simulation, Figure SN.17 (left) shows the simulated epidemiological dynamics for different numbers of introductions. On the right, we see the corresponding correlations of the incidence estimate φ with the true population size N true . The inferred φ estimates show a strong linear correlation with N true across all simulated scenarios (r = 0.99, p < 10 −16 ). In Figure SN.18 we evaluate how the relative amount of introductions affects the percent deviation, akin to Eq. (SN.5). We can see that for the majority of the simulations the deviation is around zero with no apparent trend of increasing dispersion (prediction inaccuracy), when large numbers of foreign variants become introduced. In summary, the predictions are not perturbed by introductions of foreign sequences. Potential biases from single introductions are compensated by ongoing local evolution (community transmission) of the imported sequences.
To investigate the effect of introductions without the compensating effect of community transmissions, we simulated an extreme scenario, where the introduced sequences are only added to the local outbreak and do not replicate or mutate. They simply occur as additional haplotypes in the population. In reality this would be the case, if infected people from a different region get tested, The corresponding φ estimates in Figure SN.19 (top right) show the following: In simulations, where 50 or 100 variants where introduced, the corresponding φ estimate remain quantitatively unchanged in comparison to a 'zero-introduction' simulation. In the extreme case, where 500, and even 1000 variants were introduced, (pink and green lines in Figure SN.19, top right), the dynamical profile of the number of infected individuals is still very closely captured (highlighting the 2.5 waves of infection). However, quantitatively the φ estimates are slightly increased in comparison to the 'zero-introduction' simulation. Thus, if one would like to compare a setting, where there are many non-replicating introductions, to a setting with replicating-or 'zero-' introductions, the quantitative interpretation of the φ estimates with regards to the true number of infected individuals N true might differ to some extend. There may be a slightly different slope for the linear correlation function of φ vs. N true . In Figure SN.19 (bottom left), we systematically assess whether large relative amounts of nonreplicating introductions decrease prediction accuracy, i.e. increase deviation from the linear regression function for the specific simulation. We do not discover a systematic trend, even if there were more non-replicating introductions than regionally transmitting viruses. We monitor the relative amount of introductions over time for the corresponding smoothed φ estimates in the panel above (Fig. SN.19, bottom right). Looking at the ratios of introductions vs. the effective population size per generation for the case of 1000 introductions (green line), we observe that the method is still able to reconstruct the dynamic and attain a linear relationship, despite having consecutively more than 15% introductions during the first wave and partly almost reaching the number of evolving sequences.

SN.1.13 Reconstruction of extreme (un-smooth) dynamics
To further test the limits of GInPipe, we set up two experiments with population dynamics that resemble step-functions. Obviously, these dynamics are quite unlikely to happen in reality and are therefore considered as extreme cases to push the pipeline to the limit. In the first experiment, the population of infected individuals stochastically fluctuates around its initial value of ≈ 2000 for 60 generations, after which the population suddenly decreases by a factor 10 (N (t = 61) = P(N (t = 60) · 0.1), Fig. SN.20) and henceforth remains around around a value of ≈ 200 for the remaining time points. In the second experiment, the population of infected individuals stochastically fluctuates around its initial value of ≈ 100 for 60 generations, after which it increases by a factor five (N (t = 61) = P(N (t = 60) · 5), Fig. SN.21). Notably, while GInPipe generally reconstructs the population dynamics quite well, we do observe a "smearing out" when the population size changes drastically at generation 60,Fig. SN.20 (upper left vs. upper right). This "smearing out" is likely due to the binning of temporally adjacent sequences, and may be overcome by defining bins that do not span the region of abrupt change (if the latter was known). Likewise, for the case of the step-function that increases the population size, Fig. SN.21 (upper left), we observe reasonable good incidence reconstructions, Fig. SN.21 (upper right). However, while many reconstructed incidence histories map the population dynamics reliably (example in Fig. SN.21, lower left), we can also observe more drastic delays in some simulations Fig. SN.21, lower right). This effect appears to be stronger for the step function increasing the populations size ( Fig. SN.21), than for the step function decreasing the population size ( Fig. SN.20). We believe that this effect is mainly due to a sampling bias with regards to the number of new mutations (i.e. in some simulations the new mutations appear later than the population size increases). In fact, such extreme dynamics could in theory appear if a single super-spreading event is responsible for the vast majority of new infections. In view of the spatial scales we analysed (countries or federal states) it is however unlikely that a single event dominates the pandemic. Our assessments of GInPipe for non-smooth population dynamics (step-wise changes of the number of infected individuals) revealed that the pipeline, with default setting may "smear out" abrupt changes in the number of infected individuals. For sudden changes that increase the number of infected individuals, temporal delays may be encountered. However, we have to note that this technical test may have limited relevance to a real-world application where the population of infected individuals changes rather smoothly over time. Figure SN.20: Simulations for a drastic decline of infected individuals. Top left: Simulated pandemic trajectories (5 simulations) starting with 3000 infected individuals that drop by a factor of ≈ 10 after 60 generations. Top right: The solid lines denote the smoothed median of the φ estimates for the respective stochastic simulations. The shaded areas denote the 90% confidence interval. Bottom left and right: Two individual simulations (red lines) and the corresponding incidence reconstructions (blue dots, lines (median) and shaded areas (90% confidence interval)) Figure SN.21: Simulations for a drastic increase of infected individuals. Top left: Simulated pandemic trajectories (5 simulations) starting with 100 infected individuals that increase by a factor of ≈ 5 after 60 generations. Top right: The solid lines denote the smoothed median of the φ estimates for the respective stochastic simulations. The shaded areas denote the 90% confidence interval. Bottom left and right: Two individual simulations (red lines) and the corresponding incidence reconstructions (blue dots, lines (median) and shaded areas (90% confidence interval)) SN.1.14 Effect of selective pressure on incidence reconstruction In next scenario, we wanted to analyse if selection pressure affects GInPipe's ability to reconstruct the incidence history. To this end, we assign a fitness value f (i, m i ) to each mutation m i at position i. We then compute the fitness of a sequence f s as the product of the fitness of all mutations contained in that sequence. Sequences s are then chosen to replicate based on their fitness, i.e. with probability f s / r f r . The mutations can either be beneficial (fitter than the wild type) or deleterious (lower survival chance than the wild type). Individual fitness values were drawn from a log-normal distribution. To assess different levels of stochasticity and yet derive comparable simulations, we considered ∈ N f ⊂ L positions to affect fitness (with f ( , m ) = 1). The remaining positions k ∈ L \ N f were considered to be neutral (f (k, m k ) = 1 = wild type fitness). We chose the single fitness values according to The fitness of a sequence s is given by the product the position-wise fitness values f (i, m i ) in the respective nucleotide state m i .
Our total variance, which denotes the convolution of all mutation-wise variances, was set to σ 2 tot = 0.2. This setting was chosen, such that the overall change in fitness remains within a realistic range (e.g. the fitness advantage of the SARS-CoV-2 Delta variant over the wild type is assumed to be ≈ 2). Consequently, the variances of the individual fitness values f ( , m ) were set to σ 2 = σ 2 tot N f . For illustration purposes, Figure SN.22 shows the population average fitness for 10 simulations each, using the evolutionary setting as above, with N f = {5, 10, 50, 100, 500} non-neutral sites. With the choice of a normalised variance, smaller numbers of non-neutral sites correspond to larger fitness changes per site, making the simulations with e.g. N f = 5 more stochastic. On the contrary, with N f = 500, each non-neutral mutation confers very small changes with regards to fitness, making the evolutionary adaptation more transient. The plot shows that the composition of the sequence set is driven by selective pressure: I.e. the sequence composition is changed such that the average fitness increases over time. Fig. SN.23(top left) shows the simulated population dynamics using the evolutionary setting with N f = 50 non-neutral sites. Fig. SN.23(top right) shows the GInPipe reconstructed incidence histories and Fig. SN.23(lower panels) shows the reconstructed incidence histories with different sub-sampling schemes (relative vs. absolute sub-sampling). As can be seen in Fig. SN.23, GInPipe can reconstruct incidence histories very well when particular positions are under selective pressure.

SN.1.15 Effect of mutation rate on incidence reconstruction
In this section we assess how changes in the mutation rate affect the incidence reconstruction with GInPipe. Note that a severely lowered mutation rate would decrease the evolutionary information contained in any sequence set and increase stochasticity, by turning evolutionary events into rare events. We performed simulations with the same setting as in section SN.1.3, but this time increased or decreased the mutation rate 10-fold.
10-fold lowed mutation rate. Simulations using a low mutation rate of µ = 0.00001 are depicted in Fig. SN.24 (upper row, left most panel). Reconstructed incidence dynamics using GInPipe with default settings are shown in Fig. SN.24 using the full data set (upper row, center left panel), as well as sub-sampled data sets (upper row, the two panels on the right). For the full data set, GInPipe can still reconstruct the dynamics (Pearson correlation of r = 0.9), but for the sub-sampled data sets the reconstructions worsen (Pearson correlation coefficients of r = 0.7−0.9). Figure SN.24: Incidence reconstructions with a 10-fold lowered mutation rate (µ = 0.00001). Top row: Original binning strategy. Bottom row: Adjusted binning strategy. Left: Simulated pandemic trajectories (5 simulations). Center left: The solid lines denote the smoothed median of the φ estimates for the respective stochastic simulations using all sequences. The shaded areas denote the 90% confidence interval. Center right and rightmost panel: Smoothed median of the φ estimates for the respective stochastic simulations using subsets of the sequences (center right: relative subsampling, far right: uneven subsampling). The transparency of the curve indicates the level of subsampling.
Work-around: Increasing bin sizes. As mentioned above, for very low mutation rates, the binning strategy may produce sequence sets that contain no evolutionary information. Consequently, GInPipe may not robustly infer incidence estimates on these data sets. However, increasing the bin sizes may revert this situation. To demonstrate this presumption, we changed the binning strategy (compare to Methods section in the main manuscript) such that the bin sizes increase. In particular, we set: (i) the days per bin to {10, 15, 20, 25, 30}, (ii) the sequences per bin to {5, 10, 15, 20} percent of the entire sequence set and adjusted the (iii) constraints, such that each bin must span at least 5 days and not more than 30 days and contain at least 20 sequences. As can be seen in Fig. SN.24, this helped to recover the original dynamics (Pearson correlation of r = 0.95 (full data set), and 0.87-0.94 for the sub-sampled data sets), however at the cost of temporal resolution (e.g. the final increase in incidence cannot be recovered). A comparison of the original binning strategy with the adjusted binning strategy (larger bins) is shown in Fig. SN.25 for one simulation. As can be seen therein (top left panel vs. bottom left panel), the original binning strategy has a much finer time resolution (number of blue dots).
As mentioned before, lowering the mutation rate increases stochasticity, and consequently decreases the signal-to-noise ratio. When sub-sampling the data using the original binning strategy Fig. SN.25 (top center and right panel), this can lead to scenarios where certain bins produce outlier dynamics (large shaded areas). When increasing the bin sizes Fig. SN.25 (bottom center and right panel), these large fluctuations due to extreme outliers could be eliminated. Figure SN.25: Example of incidence reconstructions for one simulation with a 10-fold decreased mutation rate (µ = 0.00001) using different binning strategies. Top row: Original binning strategy as stated in Section SN.1.3. Bottom row: Adapted binning strategy (larger bins). Left: Simulated population dynamics (red line) and GInPipe reconstructed incidence dynamics using all available data (blue dots: φ estimates, blue line: smoothed estimate). Center: Incidence reconstructions using relative sub-sampling. The transparency of the lines indicate the level of sub-sampling. Right: Incidence reconstructions using absolute sub-sampling. The transparency of the lines indicate the level of sub-sampling.
10-fold increased mutation rate. Simulations with a 10-fold increased mutation rate of µ = 0.001 are shown in Fig. SN.26 (left) together with GInPipe's incidence reconstructions using the full data set (center left), as well as sub-sampled data sets (the two panels on the right). For a large mutation rate GInPipe can reconstruct the incidence very well (Pearson correlation of r = 0.996). For relative sub-sampling the incidence reconstructions are similarly good (r > 0.95),

Observations and Outlook
In summary, our simulations show that, particularly for low mutation rates, the 'evolutionary signal' decreases and that the emergence of novel haplotypes becomes more stochastic. In essence, this means that the signal-to-noise ratio decreases. The loss in evolutionary information can be circumvented by increasing the bin sizes, at the cost of temporal resolution. While the binning strategy presented in the Methods is well adapted to SARS-CoV-2, application of GInPipe to other respiratory infections would likely require to adapt the binning strategy to the data. As an outlook, we plan to build filters that automatically adapt binning strategies to the amount of Figure SN.26: Incidence reconstructions with a 10-fold increased mutation rate (µ = 0.001). Leftmost panel: Simulated pandemic trajectories (5 simulations). Center left: The solid lines denote the smoothed median incidence reconstructions using all sequences. The shaded areas denote the 90% confidence interval. Center right and rightmost panel: Smoothed median incidence reconstructions using subsets of the the data (center right: relative sub-sampling; right most: absolute sub-sampling). The transparency of the curve indicates the level of sub-sampling. evolutionary information in a given data set, and which mark temporal periods where insufficient information is available.

Supplementary Note 2
In this Supplementary Note all parameters used in the phylodynamic inference are stated, such as (i) change dates where major non-pharmaceutical measures were introduced or lifted (these intervals are used to calculate the piece-wise constant R e (τ ) values); (ii) All final parameter estimates, as well as (iii) 'lineage-through-time' plots for phylodynamic reconstruction.   Figure SN.28: Scotland: Top: Pango lineages identified in the full (left) and subsampled (right) data set with the coloured area being proportional to the size of the specific lineage. Values in brackets next to the lineage name in the legend represent the number of sequences identified in the full (left) and subsampled (right) data set. Bottom: Lineage-through-time plot, counting the number of phylogenetic tree branches over time, of maximum clade credibility tree for all identified approximate clusters. Pango lineages that were split due to a longer period without any sample are denoted numerically using a dash separator, e.g. B.1-0 and B.1-1. Vertical dotted lines represent the dates at which birth, death and sampling rate were allowed to change. The number of sequences included for each approximate cluster in the analysis are given in brackets next to the cluster name in the legend. Switzerland Figure SN.29: Switzerland: Top: Pango lineages identified in the full (left) and subsampled (right) data set with the coloured area being proportional to the size of the specific lineage. Values in brackets next to the lineage name in the legend represent the number of sequences identified in the full (left) and subsampled (right) data set. Bottom: Lineage-through-time plot, counting the number of phylogenetic tree branches over time, of maximum clade credibility tree for all identified approximate clusters. Pango lineages that were split due to a longer period without any sample are denoted numerically using a dash separator, e.g. B-0 and B-1. Vertical dotted lines represent the dates at which birth, death and sampling rate were allowed to change. The number of sequences included for each approximate cluster in the analysis are given in brackets next to the cluster name in the legend.  Figure SN.30: Victoria, Australia: Top: Pango lineages identified in the full (left) and subsampled (right) data set with the coloured area being proportional to the size of the specific lineage. Values in brackets next to the lineage name in the legend represent the number of sequences identified in the full (left) and subsampled (right) data set. Due to the lineage-specific subsampling procedure, the relative lineage compostion is different between the full and the subsampled dataset. Bottom: Lineage-throughtime plot, counting the number of phylogenetic tree branches over time, of maximum clade credibility tree for all identified approximate clusters. Vertical dotted lines represent the dates at which birth, death and sampling rate were allowed to change. The number of sequences included for each approximate cluster in the analysis are given in brackets next to the cluster name in the legend.

Supplementary Note 3
This Supplementary Note contains all weblinks that point to the primary data depicted or discussed in the main manuscript. For example, the data related to (i) changes in non-pharmaceutical interventions, which are used in BEAST2; (ii) case reporting data which is depicted in Fig. 3 (main article); (iii) the number of tests performed and the proportion of positive tests, which are used to calculate the proportion of detected cases; and (iv) changes in testing strategies, which are depicted in Fig. 4 (main article).

Supplementary Links Changes in non-pharmaceutical interventions
The data sources stated herein are used to determine change dates for phylodynamic reconstruction. The latter is used to compute R e values that are depicted in Fig. 2

Case reports
Reported SARS-CoV-2 infections are depicted in Fig. 3

Number of tests performed and the test positivity
This data is used in Fig. 4

Testing Strategy
Data on major changes in testing strategies are depicted in Fig. 4  We gratefully acknowledge the following Authors from the Originating laboratories responsible for obtaining the specimens, as well as the Submitting laboratories where the genome data were generated and shared via GISAID, on which this research is based.
All Submitters of data may be contacted directly via www.gisaid.org Authors are sorted alphabetically. We gratefully acknowledge the following Authors from the Originating laboratories responsible for obtaining the specimens, as well as the Submitting laboratories where the genome data were generated and shared via GISAID, on which this research is based.
All Submitters of data may be contacted directly via www.gisaid.org Authors are sorted alphabetically. We gratefully acknowledge the following Authors from the Originating laboratories responsible for obtaining the specimens, as well as the Submitting laboratories where the genome data were generated and shared via GISAID, on which this research is based.
All Submitters of data may be contacted directly via www.gisaid.org Authors are sorted alphabetically. We gratefully acknowledge the following Authors from the Originating laboratories responsible for obtaining the specimens, as well as the Submitting laboratories where the genome data were generated and shared via GISAID, on which this research is based.
All Submitters of data may be contacted directly via www.gisaid.org Authors are sorted alphabetically.

Scotland
We gratefully acknowledge the following Authors from the Originating laboratories responsible for obtaining the specimens, as well as the Submitting laboratories where the genome data were generated and shared via GISAID, on which this research is based.
All Submitters of data may be contacted directly via www.gisaid.org Authors are sorted alphabetically. We gratefully acknowledge the following Authors from the Originating laboratories responsible for obtaining the specimens, as well as the Submitting laboratories where the genome data were generated and shared via GISAID, on which this research is based.
All Submitters of data may be contacted directly via www.gisaid.org Authors are sorted alphabetically. We gratefully acknowledge the following Authors from the Originating laboratories responsible for obtaining the specimens, as well as the Submitting laboratories where the genome data were generated and shared via GISAID, on which this research is based.
All Submitters of data may be contacted directly via www.gisaid.org Authors are sorted alphabetically. We gratefully acknowledge the following Authors from the Originating laboratories responsible for obtaining the specimens, as well as the Submitting laboratories where the genome data were generated and shared via GISAID, on which this research is based.
All Submitters of data may be contacted directly via www.gisaid.org Authors are sorted alphabetically.

Switzerland
We gratefully acknowledge the following Authors from the Originating laboratories responsible for obtaining the specimens, as well as the Submitting laboratories where the genome data were generated and shared via GISAID, on which this research is based.
All Submitters of data may be contacted directly via www.gisaid.org Authors are sorted alphabetically. We gratefully acknowledge the following Authors from the Originating laboratories responsible for obtaining the specimens, as well as the Submitting laboratories where the genome data were generated and shared via GISAID, on which this research is based.

Victoria/AU
We gratefully acknowledge the following Authors from the Originating laboratories responsible for obtaining the specimens, as well as the Submitting laboratories where the genome data were generated and shared via GISAID, on which this research is based.
All Submitters of data may be contacted directly via www.gisaid.org Authors are sorted alphabetically.

Japan
We gratefully acknowledge the following Authors from the Originating laboratories responsible for obtaining the specimens, as well as the Submitting laboratories where the genome data were generated and shared via GISAID, on which this research is based.
All Submitters of data may be contacted directly via www.gisaid.org Authors are sorted alphabetically. We gratefully acknowledge the following Authors from the Originating laboratories responsible for obtaining the specimens, as well as the Submitting laboratories where the genome data were generated and shared via GISAID, on which this research is based.
All Submitters of data may be contacted directly via www.gisaid.org Authors are sorted alphabetically. We gratefully acknowledge the following Authors from the Originating laboratories responsible for obtaining the specimens, as well as the Submitting laboratories where the genome data were generated and shared via GISAID, on which this research is based.
All Submitters of data may be contacted directly via www.gisaid.org Authors are sorted alphabetically. We gratefully acknowledge the following Authors from the Originating laboratories responsible for obtaining the specimens, as well as the Submitting laboratories where the genome data were generated and shared via GISAID, on which this research is based.
All Submitters of data may be contacted directly via www.gisaid.org Authors are sorted alphabetically. We gratefully acknowledge the following Authors from the Originating laboratories responsible for obtaining the specimens, as well as the Submitting laboratories where the genome data were generated and shared via GISAID, on which this research is based.

Brazil
We gratefully acknowledge the following Authors from the Originating laboratories responsible for obtaining the specimens, as well as the Submitting laboratories where the genome data were generated and shared via GISAID, on which this research is based.
All Submitters of data may be contacted directly via www.gisaid.org Authors are sorted alphabetically. We gratefully acknowledge the following Authors from the Originating laboratories responsible for obtaining the specimens, as well as the Submitting laboratories where the genome data were generated and shared via GISAID, on which this research is based.
All Submitters of data may be contacted directly via www.gisaid.org Authors are sorted alphabetically.

India
We gratefully acknowledge the following Authors from the Originating laboratories responsible for obtaining the specimens, as well as the Submitting laboratories where the genome data were generated and shared via GISAID, on which this research is based.
All Submitters of data may be contacted directly via www.gisaid.org Authors are sorted alphabetically. We gratefully acknowledge the following Authors from the Originating laboratories responsible for obtaining the specimens, as well as the Submitting laboratories where the genome data were generated and shared via GISAID, on which this research is based.
All Submitters of data may be contacted directly via www.gisaid.org Authors are sorted alphabetically.