Improving local prevalence estimates of SARS-CoV-2 infections using a causal debiasing framework

Global and national surveillance of SARS-CoV-2 epidemiology is mostly based on targeted schemes focused on testing individuals with symptoms. These tested groups are often unrepresentative of the wider population and exhibit test positivity rates that are biased upwards compared with the true population prevalence. Such data are routinely used to infer infection prevalence and the effective reproduction number, Rt, which affects public health policy. Here, we describe a causal framework that provides debiased fine-scale spatiotemporal estimates by combining targeted test counts with data from a randomized surveillance study in the United Kingdom called REACT. Our probabilistic model includes a bias parameter that captures the increased probability of an infected individual being tested, relative to a non-infected individual, and transforms observed test counts to debiased estimates of the true underlying local prevalence and Rt. We validated our approach on held-out REACT data over a 7-month period. Furthermore, our local estimates of Rt are indicative of 1-week- and 2-week-ahead changes in SARS-CoV-2-positive case numbers. We also observed increases in estimated local prevalence and Rt that reflect the spread of the Alpha and Delta variants. Our results illustrate how randomized surveys can augment targeted testing to improve statistical accuracy in monitoring the spread of emerging and ongoing infectious disease.

The spread of the new severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and the ensuing outbreaks of coronavirus disease 2019 (COVID-19) have placed a substantial burden on public health in the United Kingdom. As of 14 July 2021, the number of people recorded to have died in the United Kingdom within 28 days of a positive SARS-CoV-2 test was 128,530 (refs. 1,2). In response to the ongoing epidemic, the UK government has implemented a number of non-pharmaceutical interventions to reduce the transmission of SARS-CoV-2, ranging from localized measures, such as the closures of bars and restaurants, to full national lockdowns 3. The localized measures have been employed through a regional tier system, with lower tier local authorities (LTLAs) being placed under varying levels of restrictions according to data such as the number of positive polymerase chain reaction (PCR) tests returned there over a 7-day interval (or local weekly positive tests) 4. Following a third national lockdown that began on 6 January 2021, the United Kingdom has undergone a staged relaxation of restrictions, with lockdown rules ending on 19 July 2021 (ref. 5).
In the United Kingdom, there are two major ongoing studies that undertake randomized survey testing to provide an insight into the prevalence of SARS-CoV-2. Since April 2020, the Office for National Statistics (ONS) COVID-19 Infection Survey (CIS) has tested a random sample of people living in the community with longitudinal follow-up 6. The survey is designed to be representative of the UK population, with individuals aged two years and over in private households randomly selected from address lists and previous ONS surveys, although it does not explicitly cover care homes, the sheltering population, student halls or currently hospitalized individuals. The REal-time Assessment of Community Transmission (REACT) study is a second nationally representative prevalence survey of SARS-CoV-2 based on repeated cross-sectional samples from a representative subpopulation defined via (stratified) random sampling from the National Health Service patient register of England 7,8. Importantly, both surveys recruit participants regardless of symptom status and are therefore able to largely avoid issues arising from ascertainment bias when estimating prevalence. The ONS CIS uses multilevel regression and post-stratification to account for any residual ascertainment effects due to non-response 6, whereas the REACT study uses survey weights for this purpose.
While randomized surveillance testing readily provides an accurate statistical estimate of prevalence of PCR positivity, precision can be low at finer spatiotemporal scales (for example, at the LTLA level), even in large studies such as the ONS CIS and REACT surveys. Our major goal here is to unlock the information in non-randomized testing under arbitrary, unknown ascertainment bias. Although we expect the methods to apply in a broad manner, here we focus on Pillar 1 and Pillar 2 (Pillar 1+2) PCR tests conducted in England between 31 May 2020 and 20 June 2021 (lateral flow device (LFD) tests are not included; further details provided in Methods and Data availability). Pillar 1 tests refer to "all swab tests performed in Public Health England (PHE) labs and National Health Service (NHS) hospitals for those with a clinical need, and health and care workers" 9 , and Pillar 2 tests comprise "swab testing for the wider population" 9 . Pillar 1+2 testing therefore has more capacity than the randomized programmes, but the protocol incurs ascertainment bias because those at increased risk of being infected are tested, such as frontline workers, contacts traced to a COVID-19 case or the subpopulation presenting with COVID-19 symptoms, such as loss of taste and smell 9 . Hence, raw prevalence estimates from Pillar 1+2 data (as a proportion of tested population) will tend to be biased upwards and cannot directly be used to estimate the unknown infection rate in a region. In contrast, as a proportion of the entire population, the bias is downwards as not all individuals with infection in the area are captured. Furthermore, the degree of upward bias may be influenced by overall testing capacity and uptake. In addition, the raw prevalence estimates tend not to capture asymptomatic infection, even though there is evidence to indicate that asymptomatic individuals can contribute to viral transmission 10,11 .
Combining data from multiple surveillance schemes can improve estimates for prevalence. For example, Manzi et al. 12 incorporated information from multiple, biased, commercial surveys to provide more accurate and precise estimates of smoking prevalence in local authorities across the East of England. A number of geostatistical frameworks for infectious disease modelling based on multiple diagnostic tests have been developed [13][14][15] . These accommodate different sources of heterogeneity among the tests to deliver more reliable and precise inferences on disease prevalence.
To understand the ascertainment bias problem and to enable a statistical approach to correction, it is helpful to consider a simplified causal model 16,17 for Pillar 1+2 data. This is represented by a directed acyclic graph (DAG), shown in Fig. 1a, that charts the dependencies of an individual from infection status to test result. The circles indicate the binary (yes/no) states of an individual. The DAG characterizes the joint distribution of the major factors leading to the observed data. Throughout the paper, we use the term 'targeted testing data' to refer to data gathered under some ascertainment process distinct from (stratified) random sampling, with an exemplar being selection for testing of the subpopulation with COVID-19 symptoms, which comprises a sizeable proportion of Pillar 1+2 tests. There are several other potential confounders, exemplified in Fig. 1a by socioeconomic status (SES), which is a well-studied factor of both infection risk and access to healthcare and/or testing. The DAG explicitly characterizes statistically why we cannot directly use Pillar 1+2 data. The DAG also points to a potential solution that we pursue here: if the statistical dependencies as indicated by the arrows in Fig. 1a can be modelled, then we can correct for the ascertainment bias in Pillar 1+2 data.
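The upward bias induced by this ascertainment mechanism can be made concrete with a small simulation of the DAG. The sketch below is purely illustrative and not part of the analysis pipeline; the testing probabilities `p_test_inf` and `p_test_noninf` are hypothetical values chosen only to demonstrate the effect:

```python
import random

def simulate_positivity(pop=200_000, prevalence=0.01,
                        p_test_inf=0.30, p_test_noninf=0.02, seed=1):
    """Toy simulation of the DAG in Fig. 1a: infection status influences
    selection for testing, so positivity among those tested overstates
    the population prevalence. A perfect test is assumed."""
    rng = random.Random(seed)
    n_pos = n_tested = 0
    for _ in range(pop):
        infected = rng.random() < prevalence
        p_test = p_test_inf if infected else p_test_noninf
        if rng.random() < p_test:
            n_tested += 1
            n_pos += infected  # test result equals infection status
    return prevalence, n_pos / n_tested
```

With these hypothetical settings, the positivity rate among the tested subpopulation is several times the true 1% prevalence, reproducing the upward bias that the causal framework is designed to correct.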
In addition to prevalence, there are a number of epidemiological parameters that may be useful for informing localized non-pharmaceutical interventions. For example, one particular variable of interest is the (time-varying) effective reproduction number R_t, which is defined roughly as the average number of infections caused by an infectious individual; when R_t > 1, the epidemic will continue to spread. The current pandemic has spurred the development of models that aim to incorporate multiple sources of data to estimate important epidemiological parameters. See Supplementary Table 1 for an overview of the methodological work most related to ours [18][19][20][21][22][23][24][25] (https://localcovid.info/), including a brief description of each method, its data inputs and its outputs; we also recommend refs. 26,27 for reviews, which have a particular focus on R_t.
Within this urgent and fast developing area of research, it is clearly important to define the aspects in which our method contributes. First, we have developed methods to infer unbiased local prevalence, I_t, from targeted testing data. This is important in its own right because being able to estimate local prevalence accurately from targeted testing data adds an important facet to existing COVID-19 monitoring capabilities. Here, we focus on weekly period prevalence and explicitly target the number of infectious individuals via a correction to the estimated PCR-positive numbers. Second, our method outputs bias-adjusted cross-sectional prevalence likelihoods p(n_t of N_t | I_t), where n_t and N_t are positive and total targeted test counts, respectively. This allows prevalence information from targeted data to be coherently embedded in a modular way into complex spatiotemporal epidemiological models, including those synthesizing multiple data types. We exemplify this by implementing a susceptible-infectious-recovered (SIR) model around our ascertainment model likelihood. Third, our local ascertainment model is based on targeted testing data alone with both the number of positive and total tests being modelled (n_t and N_t). This has two important benefits: spatiotemporal variation in testing uptake and capacity is explicitly conditioned on (via N_t), and differential test specificity and sensitivity can be naturally incorporated into our causal ascertainment model.

Results
Correcting for ascertainment bias in targeted testing data. Figure 2a-c displays the percentage of positive Pillar 1+2 tests (as a proportion of those tested) against accurate prevalence estimates from the REACT study, which shows a clear upward bias (each point corresponds to a single LTLA). Here, we introduce a bias-correction method that aims to provide accurate estimates of prevalence at the local level, as displayed in Fig. 2d-f, based on the posterior cross-sectional prevalence p(I_t | n_t of N_t).
With reference to the causal DAG in Fig. 1a, we define the essential bias parameter as

δ := logit P(tested | infected) − logit P(tested | not infected),

that is, the log odds-ratio of being tested in the infected subpopulation versus in the non-infected subpopulation. Larger values of δ generally correspond to higher levels of ascertainment bias; that is, a higher chance of an individual with an infection being selected for testing relative to an individual without infection. Our approach combines randomized surveillance data (REACT) and targeted surveillance data (Pillars 1+2) to infer δ at the coarse geographical level (PHE region; Fig. 1b). We then take forward this information by specifying a temporally smooth empirical Bayes (EB) prior on δ_1:T, applied to each constituent local region (LTLA) in the local prevalence analyses. Figure 3a shows the resulting EB priors on δ. There is potentially more variation in δ across regions early and late in the sampling period (before September 2020 and after March 2021), although the prior credible intervals are broad and often overlapping. The data provide more information on δ between October 2020 and February 2021.

[Fig. 2 caption] Horizontal grey lines are 95% exact binomial confidence intervals from the REACT data. The number of independent tests underlying each mean and (horizontal) confidence interval for the REACT data varied between 248 and 2,387. Vertical black lines in a-c are 95% exact binomial confidence intervals from the raw, non-debiased Pillar 1+2 data. Vertical black lines in d-f are 95% posterior credible intervals from the debiased Pillar 1+2 data. The number of independent tests underlying each mean and (vertical) credible interval for the Pillar 1+2 data varied between 1,117 and 42,458. Neither set of prevalence estimates has been corrected for false positives or negatives. Note that in d-f, the credible interval widths are systematically tighter for the debiased Pillar 1+2 data than for the REACT data, which highlights the useful information content in debiased Pillar 1+2 data.

Debiased likelihood for modular sharing of prevalence information.
Equipped with a coarse-scale (PHE-region-level) EB prior on the bias δ, we evaluated a fine-scale (LTLA-level) δ-marginalized likelihood of the form p(n_t of N_t | I_t, ν_t), as described in equation (17) in the Methods ("Cross-sectional inference on local prevalence"). This debiased prevalence likelihood can be readily exported and modularly incorporated into more complex models, as we illustrate below ("Longitudinal local prevalence and transmission").
Cross-sectional prevalence posterior. The δ-marginalized likelihood can be inputted directly into cross-sectional Bayesian inference, outputting the prevalence posterior p(I_t | n_t of N_t, ν_t) for each time point at which such count data are available. Figure 3b plots these cross-sectional prevalence posteriors beneath the raw counts for a subset of LTLAs across the nine PHE regions. REACT sampling periods are plotted at the base of each panel, and local prevalence estimates from REACT round 7 (November 2020) and round 8 (January 2021) are also superimposed. The corrected cross-sectional prevalence estimates are consistent with the gold-standard REACT estimates, but are more precise, as expected from Bayesian principles of data synthesis.
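Mechanically, feeding a likelihood evaluated on a discrete prevalence grid into cross-sectional Bayesian inference amounts to a multiply-and-normalize step. The following sketch, with a hypothetical `grid_posterior` helper, illustrates only this generic computation, not the authors' implementation:

```python
def grid_posterior(likelihood, prior):
    """Posterior over a discrete prevalence grid: elementwise product of
    likelihood and prior values, normalized to sum to one."""
    unnorm = [l * p for l, p in zip(likelihood, prior)]
    z = sum(unnorm)
    return [u / z for u in unnorm]
```

For example, `grid_posterior([0.1, 0.6, 0.3], [1/3, 1/3, 1/3])` returns a posterior that, under a uniform prior, simply renormalizes the likelihood over the three candidate prevalence values.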
Longitudinal local prevalence and transmission. The cross-sectional debiased likelihood can be introduced modularly into a wide variety of downstream epidemiological models. We illustrate this by using the likelihood as an input to a simple SIR epidemic model (Methods, "Full Bayesian inference under a stochastic SIR epidemic model", and Extended Data Fig. 1). Figure 4a highlights regions where transmission rates and/or prevalence are relatively high. To illustrate, we label five LTLAs with high prevalence and/or R_t estimates. The estimated longitudinal prevalence and R_t for this subset of LTLAs (Fig. 4b,c) can help further characterize the longitudinal dynamics of prevalence and transmission in the time interval leading up to 20 June 2021. In particular, the data show the estimated rate of change in prevalence and separately indicate whether R_t is increasing or decreasing. Figure 5a displays the spatiotemporal local prevalence and Fig. 5b displays R_t, using a fortnightly sequence of maps, with each LTLA coloured according to its estimated prevalence or R_t. Zoom-in boxes display the local fine-scale structure for London.

[Fig. 3 caption] a, The thick curves show the prior means and the narrow curves show 95% credible intervals. Note that δ is the log odds-ratio, so, for example, δ = 3 implies that the odds of being tested are e^3 ≈ 20 times higher in individuals with infection compared with individuals without infection. b, LTLA-level prevalence estimates: raw Pillar 1+2 estimates (that is, positivity rate), cross-sectionally corrected Pillar 1+2 estimates and gold-standard REACT estimates. For each of the nine PHE regions, we present the constituent LTLA whose name is ranked top alphabetically. The number of independent tests underlying each (orange) mean and credible interval based on the REACT data varied between 288 and 620. The number of independent tests underlying each (green or cyan) mean and credible interval based on the Pillar 1+2 data varied between 390 and 43,650. The green symbols and error bars show the mean and exact binomial 95% confidence intervals. The cyan symbols and error bars show the posterior median and 95% credible intervals. The orange symbols and error bars show the mean and 95% exact binomial confidence intervals.

Relating local prevalence and transmission to spread of the variants of concern.
A striking feature of the maps in Fig. 5a is the increasing prevalence in London throughout November to December 2020. This is consistent with the known arrival of the Alpha variant of concern (VoC) 202012/01 (lineage B.1.1.7) that emerged in the South East of England in November 2020, and has been estimated to have a 43-90% higher reproduction number than pre-existing variants 28. Similarly, the increase in R_t from May 2021 onwards is in accordance with the spread of the Delta VoC 21APR-02 (lineage B.1.617.2), which is estimated to have a reproduction number approximately 60% higher than that of the Alpha VoC 29.
Similar to a previous study 28, we characterized the relationship between the estimated local R_t and the frequency of the Alpha VoC 202012/01, as approximated by the frequency of S gene target failure (SGTF) in TaqPath assays used during this time period 30. Figure 6 illustrates the spatial distributions of the Alpha VoC 202012/01 against estimated prevalence and estimated R_t from mid-November 2020 to mid-December 2020. The increase in frequency of the VoC was initially isolated to the South East but then spread outwards, accompanied by a corresponding increase in both local estimated prevalence and R_t. We observe a strong positive association between the local VoC frequency and estimated local R_t, which is consistent with the increased transmissibility of this VoC identified in ref. 28.
We performed a similar analysis for the Delta VoC 21APR-02 using data provided by the Wellcome Sanger Institute's Covid-19 Genomics Initiative 31. Extended Data Fig. 2 shows the spatial distributions of the Delta VoC 21APR-02 against estimated prevalence and estimated R_t from the end of April 2021 to the start of June 2021. We see that the Delta VoC became the dominant variant over the course of this time period and, in contrast to the Alpha VoC, its spread was not isolated to a single region of England. We again observe a strong positive association between the local VoC frequency and estimated local R_t. A simple linear regression of R_t against Delta frequency for the week of 23 May 2021 indicated an increase in transmissibility of 0.55 (0.39-0.71) due to the Delta VoC, which is in accordance with estimates obtained in ref. 29.
Accuracy validation using ultra-coarse and incomplete data to estimate δ. We assessed the performance of debiased fine-scale (LTLA-level) prevalence estimates by measuring how well they predict LTLA-level REACT data. The validation is best described in terms of coarse-scale REACT training data and contemporaneous fine-scale REACT test data. The training data inputted are REACT PHE-region-level and Pillar 1+2 LTLA-level positive (and total) test counts for the week at the centre of the corresponding REACT round to be predicted. The test data are REACT LTLA-level positive (and total) test counts aggregated across the relevant REACT sampling round. Figure 2 visually compares cross-sectional LTLA prevalence estimates from debiased targeted data (that is, based only on the training data) with accurate gold-standard estimates from REACT LTLA-level test data. The average estimated bias is reduced to low levels for comparisons with REACT round 7 (-0.08%, standard error (SE) = 0.02), round 8 (-0.07%, SE = 0.03) and round 9 (0.01%, SE = 0.02). Extended Data Fig. 3c,d displays analogous results for REACT rounds 10 and 11, with average estimated bias reduced to 0.03% (SE = 0.01) and 0% (SE = 0.01), respectively.

REACT and ONS CIS are among the most comprehensive randomized surveillance studies in the world. We have tried to assess how well the debiasing model might hold when we are faced with coarser-scale or more limited randomized testing data. First, to investigate the downstream effects of ultra-coarse-scale randomized surveillance data, we aggregated all REACT data to the national level, estimated the δ curve at this ultra-coarse national level and then took this δ forward to estimate local prevalence. We found that the estimates retained a high level of accuracy (Extended Data Fig. 4g-i).
Second, to examine the effects of a more limited randomized surveillance regime, we left out REACT round 8, re-estimated the δ curves at the PHE-region level and used these to infer local prevalence. In this case, we lost precision in our prevalence estimates for the omitted round.

We also assessed whether local R_t estimates are indicative of future changes in case numbers. Each point in this analysis corresponds to an (LTLA, week) pair, and the results are for the period 18 October 2020 to 20 June 2021. Across each of the six scenarios presented, there is strong evidence of an association between R_t and future change in case numbers (P < 2 × 10^-16). The strength of the association between R_t and 1-week-ahead case numbers has Spearman's ρ = 0.73 for the high baseline case group (>500 cases per 100,000), which decreased to ρ = 0.29 in the low baseline group (≤200 cases per 100,000). The association remained strong when predicting caseloads 2 weeks ahead, with, for example, Spearman's ρ = 0.73 for the high baseline case group.
Comparison of effective reproduction number estimates from the debiasing approach with estimates from other studies. We extracted estimates of R_t based on our debiasing model likelihood implemented within a standard SIR model, illustrated in Extended Data Fig. 1. We compared the results to the local R_t estimates outputted at the Imperial College COVID-19 website 32. A cross-method comparison of longitudinal traces of R_t for a subset of LTLAs is shown in Extended Data Fig. 6. Encouragingly, the estimates from the two approaches generally displayed good concordance, with credible intervals overlapping appropriately, despite being based on different data and models (Supplementary Table 1).

Discussion
The current standard practice internationally is to summarize SARS-CoV-2 infection rates by counting the number of individuals testing positive in a local area over a period of time, typically 1 week. The resulting statistic, cases per 100,000, is used to characterize and monitor the spatiotemporal state of an epidemic alongside other epidemiological measures such as R_t. Problematically, however, interpreting cases per 100,000 is not straightforward, as the data are subject to a number of unknown biasing influences, such as (1) variation in testing capacity, (2) ascertainment bias in who is (self-)selected to be tested and (3) imperfect sensitivity and specificity of antigen tests. These factors, among others, make it difficult to quantify the true underlying local incidence or prevalence of SARS-CoV-2 infection, which places a burden on policymakers to implicitly adjust for such biases themselves. To address this problem, we developed an integrative causal model that can be used to debias raw case numbers and accurately estimate the number of individuals with infection in a local area. The flexible statistical framework allows simultaneous and coherent incorporation of a number of important features. First, it corrects for the ascertainment bias that results from preferential testing based on symptom status or on other confounders, and it accounts for variation in testing capacity by modelling the total number of tests conducted locally. Second, it can incorporate the use of different SARS-CoV-2 testing assays, such as LFD and PCR, including adjustment for their particular sensitivity and specificity. Third, it infers the number of infectious individuals, whereas PCR tests may also pick up positive individuals at non-infectious stages. Finally, the model outputs week-specific debiased prevalence with uncertainty (via a marginal likelihood), which allows modular interoperability with other models.
We illustrated this with a SIR epidemic model implementation that estimated local transmission rates while accounting for vaccine- and disease-induced immunity in the population. Our modelling work illustrates the benefits of having both a rolling randomized surveillance survey and targeted testing (for example, of frontline healthcare staff and symptomatic individuals). While targeted testing data are routinely collected internationally, the United Kingdom has led the way in introducing regular national randomized surveillance surveys such as REACT 7,8 and ONS CIS 6. Ongoing international pandemic preparedness can benefit from sampling designs that combine random sampling with targeted testing so that the two can most powerfully complement and strengthen one another. Our model depends on the availability of randomized surveillance data. Future studies from other countries and collaborations with local experts will demonstrate, and may further validate, the breadth of utility of our debiasing framework and how it contributes towards global public health responses.
Since randomized surveillance data are currently rare internationally, there would be utility in extending the causal framework to address situations where targeted testing is accompanied by semi-randomized data with a well-known selection process (such as routine tests for healthcare workers, in care homes or regular testing at schools). Extending the current framework would begin with careful empirical exploration of the relationship between test positivity rate in such semi-randomized settings and comparable local prevalence (for example, in relevant age strata). The wealth of data available in the United Kingdom provides a good starting point for such exploratory work, which can be used to develop more complex causal models transferable to new semi-randomized contexts.

Methods
Ethics approval. The Alan Turing Institute Ethics Advisory Group provided guidelines for this study's procedures and advised that Health Research Authority approval is not required for this research.
Observational models for surveillance data. The primary target of inference is prevalence, I out of M, where I is the unknown number of infectious individuals at a particular time point in a local population of known size M. Our method estimates two types of prevalence: (1) the number of individuals that would test PCR positive (Ĩ) and (2) the number of individuals that are infectious (I). See below ("Focusing prevalence on the infectious subpopulation"), where we clarify the distinction between the PCR-positive and infectious subpopulations, and how we target the latter.
Temporal resolution of test count data. We applied the debiasing framework to test-count data aggregated into non-overlapping weeks. This has two clear advantages. First, by aggregating to weekly level data, we obviate the need to account for weekday effects that can be driven, for example, by logistical constraints or by individuals self-selecting to submit samples more readily on some weekdays than on others. Second, fitting a weekly model is computationally less intensive than fitting a model to daily test counts. The potential disadvantage of binning data by week is that high-frequency effects cannot be detected. Although it is possible in principle to adapt the framework to analyse daily testing data, we note that daily variation is likely to be confounded by weekday testing effects and so may be difficult to detect and interpret. Furthermore, while we use non-overlapping weekly data for model fitting, it is possible to output rolling weekly estimates, particularly to obtain as up-to-date prevalence estimates as are permitted by the data. However, we note that complete testing data are typically subject to a reporting lag of 4-5 days 33 .
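The weekly binning step described above can be sketched with a small standard-library helper; `weekly_counts` is hypothetical and purely illustrative of the aggregation, not the authors' code:

```python
import datetime
from collections import defaultdict

def weekly_counts(daily):
    """Aggregate daily records (date, positives, total tests) into
    non-overlapping ISO weeks, sidestepping weekday reporting effects."""
    weeks = defaultdict(lambda: [0, 0])
    for date, n_pos, n_tot in daily:
        iso_year, iso_week = date.isocalendar()[:2]
        weeks[(iso_year, iso_week)][0] += n_pos
        weeks[(iso_year, iso_week)][1] += n_tot
    return {k: tuple(v) for k, v in weeks.items()}
```

Using ISO week keys makes the bins non-overlapping by construction, so each test count contributes to exactly one week.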
Randomized surveillance data, u of U. Suppose that out of a total of U randomized surveillance (for example, REACT and ONS CIS) tests, we observe u positive tests. The randomized testing likelihood is

p(u of U | Ĩ) = Binomial(u; U, Ĩ/M),    (2)

and this allows direct, accurate statistical inference on Ĩ, and hence on the proportion Ĩ/M of the population that would return a positive PCR test.
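Under equation (2), posterior inference on prevalence from randomized counts reduces to a binomial computation. A minimal sketch over a discrete grid of candidate prevalence proportions, assuming a uniform prior (`randomized_posterior` is a hypothetical helper):

```python
import math

def randomized_posterior(u, U, grid):
    """Posterior over candidate prevalence proportions in `grid`, given
    u positives out of U randomized tests, under a uniform prior:
    the likelihood at each candidate p = I~/M is Binomial(u; U, p)."""
    lik = [math.comb(U, u) * p**u * (1 - p)**(U - u) for p in grid]
    z = sum(lik)
    return [l / z for l in lik]
```

For example, with 10 positives out of 1,000 randomized tests, the posterior mass concentrates on candidate prevalences near 1%.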
Focusing prevalence on the infectious subpopulation. PCR tests are sensitive and can detect the presence of SARS-CoV-2 both days before and weeks after an individual is infectious. It is usually desirable for prevalence to represent the proportion of a population that is infectious. We can obtain a likelihood for the number of infectious individuals I as follows:

p(u of U | I) = Σ_Ĩ p(u of U | Ĩ) P(Ĩ | I),

where I and Ĩ are the number of infectious and PCR-positive individuals, respectively. The conditional distribution P(Ĩ | I) can be specified on the basis of external knowledge of the average length of time spent PCR-positive versus infectious. Our approach to estimating this quantity imports information on the timing of COVID-19 transmission 34 and the interval of PCR positivity in individuals with SARS-CoV-2 infection 35. More precisely, we specified the infectious time interval for an average individual with infection in the population to span the interval 1-11 days after infection (the empirical range of generation time from fig. 1A of ref. 34). We then calculated the posterior probability of a positive PCR test occurring 1-11 days after infection (fig. 1A of ref. 35). We incorporated the effects of changing incidence in the calculations; this is important because, for example, if incidence is rising steeply, the majority of people who would test PCR positive in the population are those who were relatively recently infected. Full details can be found in Supplementary Information "PCR positive to infectious mapping-method details".
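The marginalization over the PCR-positive count Ĩ can be sketched directly. In this hypothetical helper, `p_tilde_given_I` stands in for the externally specified distribution P(Ĩ | I) evaluated at the fixed infectious count I of interest:

```python
import math

def infectious_likelihood(u, U, M, p_tilde_given_I):
    """p(u of U | I) = sum over I~ of Binomial(u; U, I~/M) * P(I~ | I).
    `p_tilde_given_I` maps candidate PCR-positive counts I~ to their
    probability P(I~ | I) for the fixed infectious count I."""
    total = 0.0
    for i_tilde, weight in p_tilde_given_I.items():
        p = i_tilde / M
        total += math.comb(U, u) * p**u * (1 - p)**(U - u) * weight
    return total
```

When P(Ĩ | I) is a point mass, the sum collapses to the single binomial term of equation (2), as expected.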
Targeted surveillance data, n of N. In contrast to the randomized surveillance likelihood in equation (2), the targeted likelihood can be expressed in terms of the observation of n of N positive targeted (for example, Pillar 1+2) tests as follows:

p(n of N | I) = Binomial(n; I, P(tested | infected)) × Binomial(N − n; M − I, P(tested | not infected)),

where P(tested | infected) and P(tested | not infected) are the probabilities of an individual with infection (respectively, an individual without infection) being tested.
Bias parameters, δ and ν. We introduce the following parameters:

δ := logit P(tested | infected) − logit P(tested | not infected),
ν := logit P(tested | not infected),

leading to the targeted swab testing likelihood being represented as

p(n of N | I, ν, δ) = Binomial(n; I, logit⁻¹(ν + δ)) × Binomial(N − n; M − I, logit⁻¹(ν)).    (7)

The unknown parameter that requires special care to infer is δ, that is, the log odds-ratio of being tested in the infected subpopulation versus in the non-infected subpopulation. The other parameter, ν, is directly estimable from the targeted data: the plug-in estimator ν̂ := logit[(N − n)/M] is precise and has little bias when prevalence is low.
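In this parameterization the likelihood factorizes into two binomials, one for the infected and one for the non-infected subpopulation. A log-scale sketch (hypothetical helper; log-gamma is used for numerical stability with large counts):

```python
import math

def inv_logit(x):
    """Inverse logit, logit^-1(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def log_binom_pmf(k, m, p):
    """Log of the Binomial(k; m, p) probability mass function."""
    return (math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
            + k * math.log(p) + (m - k) * math.log1p(-p))

def targeted_log_lik(n, N, I, M, nu, delta):
    """log p(n of N | I, nu, delta): the n positives arise from the I
    infected with probability logit^-1(nu + delta), and the N - n
    negatives from the M - I non-infected with probability logit^-1(nu)."""
    return (log_binom_pmf(n, I, inv_logit(nu + delta))
            + log_binom_pmf(N - n, M - I, inv_logit(nu)))
```

As a sanity check, a positive δ consistent with the observed excess positivity yields a markedly higher log-likelihood than δ = 0.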
Test sensitivity and specificity. The likelihood in equation (7) assumes a perfect antigen test. If the test procedure has false-positive rate α and false-negative rate β, the targeted likelihood is instead

p(n of N | I, ν, δ) = Σ_z p(z of N | I, ν, δ) P(n | z, N, α, β),    (8)

where z denotes the unknown number of tested individuals who truly have an infection. The first term in the sum in equation (8) is obtained by substituting z for n in equation (7), while the second term is

P(n | z, N, α, β) = Σ_{n_β} Binomial(n_β; z, β) × Binomial(n − z + n_β; N − z, α),

with n_β denoting the number of false-negative test results. An analogous adjustment can be made to the randomized surveillance likelihood in equation (2).
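The misclassification term mixes false negatives among the z truly infected tested individuals with false positives among the remaining N − z. A direct-sum sketch (hypothetical helper, illustrative only):

```python
import math

def binom_pmf(k, m, p):
    """Binomial(k; m, p) pmf, returning 0 outside the support."""
    if k < 0 or k > m:
        return 0.0
    return math.comb(m, k) * p**k * (1 - p)**(m - k)

def observed_pos_pmf(n, z, N, alpha, beta):
    """P(n observed positives | z truly infected among N tested):
    n = (z - n_beta) true positives plus false positives among N - z,
    with false-positive rate alpha and false-negative rate beta."""
    total = 0.0
    for n_beta in range(z + 1):
        n_false_pos = n - (z - n_beta)
        total += binom_pmf(n_beta, z, beta) * binom_pmf(n_false_pos, N - z, alpha)
    return total
```

With a perfect test (α = β = 0), the distribution collapses to a point mass at n = z, recovering equation (7).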
Cross-sectional inference on local prevalence. We leveraged spatially coarse-scale randomized surveillance data to specify an EB prior on bias parameters p(δ) at coarse-scale (PHE region), and thereby accurately infer prevalence from targeted data at fine scale (LTLA j within PHE region J j ). We explicitly use the superscripts LTLA (j) in PHE region (J j ) in step 4 below, where notation from both coarse and fine scale appear together. All quantities in steps 1-3 are implicitly superscripted (J j ), but these are suppressed for notational clarity. For computational efficiency, we handle prevalence in a reduced-dimension space of bins as described in Supplementary Information section "Interval-based prevalence inference-set-up and assumptions". The method in detail is as follows: 1. Infer prevalence from unbiased testing data. At a coarse geographic level (PHE region J j ), estimate prevalence from randomized surveillance data u t of U t . Represent the posterior at time t in mass function where a moment-matched Gaussian approximation is performed in equation (12) (we assessed the reasonableness of this approximation using diagnostic plots (Supplementary Fig. 2)). The posterior density in the sum in equation (11), p(δt | nt of Nt, It,νt) is conjugate under a Beta(a,b) prior on logit −1 (νt + δt) ≡ P( tested | infected ), and so can be evaluated as follows (where BetaCDF is the cumulative distribution function of the beta distribution): 3. Specify smooth EB prior on δ 1:T . A smooth prior on δ 1:T is specified as follows: where N(δ | 0, Σ δ ) imparts a user-specified degree of longitudinal smoothness, thereby sharing information on δ across time points. Ignorance of δ t , in the absence of random surveillance data, is encapsulated in a Gaussian with large variance σ 2 flat . 
A standard choice for N(δ | 0, Σ_δ) corresponds to a stationary autoregressive, AR(1), process of the form in equation (14), with a diffuse Gaussian prior c ~ N(0, σ²_flat), and with smoothing tuned by 0 < ψ < 1 and white-noise variance σ²_ε. The normalized form of the prior in equation (14) involves (μ, diagonal matrix D_{T×T}) having elements (μ_t, σ²_t) for t ∈ T and (0, σ²_flat) for t ∉ T.
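The stationary AR(1) smoothness prior can be encoded as a dense covariance matrix over time points. A minimal sketch (the diffuse intercept c is omitted for simplicity; names are ours):

```python
def ar1_covariance(T, psi, sigma2_eps):
    """Covariance of a stationary AR(1) process with autocorrelation psi and
    white-noise variance sigma2_eps:
    Cov(delta_i, delta_j) = sigma2_eps * psi^|i-j| / (1 - psi^2)."""
    marginal = sigma2_eps / (1.0 - psi ** 2)
    return [[marginal * psi ** abs(i - j) for j in range(T)] for i in range(T)]
```

Larger ψ shares more information between neighbouring time points; the diagonal gives the stationary marginal variance.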
Debiasing LFD tests with PCR surveillance (or vice versa). The methods can be adapted in a straightforward manner to the situation in which the randomized surveillance study uses a different assay from the targeted testing. As a concrete example, we could use the REACT PCR prevalence posterior p_t(Ĩ_t) from equation (10) to debias Pillar 1+2 LFD test data n_t of N_t. Equation (11) can be adjusted to estimate the ascertainment bias δ pertaining to LFD data, where Ī_t and Ĩ_t are the unobserved LFD- and PCR-positive prevalence, respectively, and the conditional distribution P(Ī_t | Ĩ_t) can be estimated on the basis of external knowledge of the average length of time spent PCR-positive versus LFD-positive, analogously to the approach described above in "Focusing prevalence on the infectious subpopulation". The remaining computations, from equation (12) onwards, are unchanged, with the outputted fine-scale marginal likelihood now pertaining to LFD data.

Full Bayesian inference under a stochastic SIR epidemic model. The cross-sectional analysis described above in "Cross-sectional inference on local prevalence" generates the δ-marginalized likelihood, p(n_t^(j) of N_t^(j) | I_t^(j), ν_t) in equation (17), at each time point for which targeted data are available. These likelihoods can be used as input for longitudinal models to obtain better prevalence estimates and to infer epidemiological parameters such as R_t.
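The paper does not specify the form of P(Ī_t | Ĩ_t) here; one simple possibility consistent with the duration-based reasoning above is binomial thinning, where each PCR-positive individual is also LFD-positive with probability equal to the ratio of mean positivity durations. An illustrative sketch under that assumption:

```python
from math import comb

def lfd_given_pcr_pmf(i_lfd, i_pcr, d_lfd, d_pcr):
    """P(I_lfd | I_pcr) under binomial thinning: each PCR-positive individual
    is taken to be LFD-positive with probability d_lfd / d_pcr, the ratio of
    the mean durations of LFD- and PCR-positivity (assumes the LFD-positive
    window sits inside the PCR-positive window)."""
    p = d_lfd / d_pcr
    if not 0 <= i_lfd <= i_pcr:
        return 0.0
    return comb(i_pcr, i_lfd) * p ** i_lfd * (1.0 - p) ** (i_pcr - i_lfd)
```

For example, with mean durations of 6 days LFD-positive and 10 days PCR-positive, the expected LFD-positive prevalence is 60% of the PCR-positive prevalence.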
We illustrate this via a Bayesian implementation of a stochastic epidemic model whereby individuals become immune through population vaccination and/or exposure to COVID-19 (Supplementary Fig. 1). We incorporate known population vaccination counts into a standard discrete-time Markov chain SIR model (ref. 36, chapter 3). Details of the transition probability calculations are given in the Supplementary Information sections "SIR model details" and "SIR model discussion, assumptions and caveats".
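The paper's exact transition probabilities are in the Supplementary Information; a generic binomial-chain sketch of one step of a discrete-time stochastic SIR model with vaccination (our simplified illustration, not the paper's implementation) is:

```python
import math
import random

def binomial_draw(n, p, rng):
    """Naive Binomial(n, p) draw; adequate for this illustration."""
    return sum(rng.random() < p for _ in range(n))

def sir_step(S, I, R, beta, gamma, v, N, rng):
    """One step of a discrete-time Markov chain SIR model with vaccination:
    each susceptible is infected with probability 1 - exp(-beta * I / N),
    each infectious individual recovers with probability gamma, and v
    remaining susceptibles are vaccinated directly into the immune class."""
    new_inf = binomial_draw(S, 1.0 - math.exp(-beta * I / N), rng)
    new_rec = binomial_draw(I, gamma, rng)
    vacc = min(v, S - new_inf)  # cannot vaccinate more susceptibles than remain
    return S - new_inf - vacc, I + new_inf - new_rec, R + new_rec + vacc
```

Vaccinated susceptibles move straight to the immune compartment, so the population total is conserved at every step.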
Priors on R, I and R+. We place priors on I and R+ measured as proportions of the population; these proportions are then mapped to prevalence intervals on subpopulation counts as described in "Interval-based prevalence inference-set-up and assumptions" in the Supplementary Information. Specifically, we use truncated, discretized Gaussian distributions on the proportions of the population who are immune and infectious. For example, on the number of infectious individuals I_t at each time point t, we specify a truncated Gaussian prior (suitably normalized over its support), with an example weakly informative hyperparameter setting being μ_I = 0.5%, σ_I = 1%, p_min = 0%, p_max = 4%. To ensure meaningful inference on R+_1:T, we placed an informative prior that reflects the state of knowledge of the immune population size. We did this using an informative truncated Gaussian prior on R+_1 and noninformative priors on R+_2:T. We placed a noninformative uniform prior on each R_t, for example a Uniform(0.5, 2.5).
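A truncated, discretized Gaussian prior over a grid of population proportions can be constructed as follows (illustrative sketch; grid resolution and names are ours):

```python
from math import exp

def truncated_discretized_gaussian(grid, mu, sigma, p_min, p_max):
    """Prior pmf over a grid of population proportions: Gaussian weight at
    each grid point, zeroed outside [p_min, p_max], then normalized over
    the remaining support."""
    w = [exp(-0.5 * ((p - mu) / sigma) ** 2) if p_min <= p <= p_max else 0.0
         for p in grid]
    z = sum(w)
    return [x / z for x in w]
```

For the example hyperparameter setting above (μ_I = 0.5%, σ_I = 1%, support [0%, 4%]) on a 0.1% grid, the mode sits at 0.5% and all mass above 4% is zero.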
Markov chain Monte Carlo sampling implementation. We performed inference under the model represented in the DAG in Supplementary Fig. 1. The likelihood is marginalized with respect to δ, and we used Markov chain Monte Carlo to draw samples from the posterior p(I, R+, R | n, N).
We sampled R and (I, R+) using separate Gibbs updates. For sampling (I, R+), we represented the joint full conditional as p(I, R+ | R, n, N) = p(I | R, n, N) p(R+ | I), sampling I_new from p(I | R, n, N) and then R+_new from p(R+ | I_new).
Sampling from p(I | R, n, N). The sampling distribution on prevalence can be expressed as a hidden Markov model, with emission probabilities taken from the δ-marginalized likelihood in equation (18) and transition probabilities taken from equation (37) (Supplementary Information).
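Drawing a joint sample of the prevalence path from such a discrete-state hidden Markov model is typically done by forward filtering followed by backward sampling (FFBS). A generic sketch (illustrative; the emission and transition inputs stand in for the paper's equations (18) and (37)):

```python
import random

def ffbs_sample(emission, transition, prior, rng):
    """Forward-filter backward-sample for a discrete-state HMM.
    emission[t][s]: likelihood of the data at time t given state s;
    transition[s][sp]: P(state sp at t+1 | state s at t);
    prior[s]: P(state s at t=0). Returns one joint posterior path sample."""
    T, S = len(emission), len(prior)
    # Forward filtering: p(state_t | data_{1:t}), normalized at each step.
    f = [prior[s] * emission[0][s] for s in range(S)]
    z = sum(f)
    filt = [[x / z for x in f]]
    for t in range(1, T):
        pred = [sum(filt[-1][s] * transition[s][sp] for s in range(S))
                for sp in range(S)]
        f = [pred[sp] * emission[t][sp] for sp in range(S)]
        z = sum(f)
        filt.append([x / z for x in f])
    # Backward sampling: draw state_T, then state_t given state_{t+1}.
    path = [None] * T
    path[-1] = rng.choices(range(S), weights=filt[-1])[0]
    for t in range(T - 2, -1, -1):
        w = [filt[t][s] * transition[s][path[t + 1]] for s in range(S)]
        path[t] = rng.choices(range(S), weights=w)[0]
    return path
```

Each call returns one exact joint draw of the path, which is what a Gibbs update of I_{1:T} requires.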
Sampling from p(R | I). The prior joint distribution of R_1:T was modelled using a random walk with user-specified smoothness parameter σ²_R. For the update, we discretized the space of R_t into an evenly spaced grid and sampled from the hidden Markov model defined in equation (24) (ref. 37). The transition probabilities are given by equation (23) (suitably normalized over the discrete R_t space) and the emission probabilities by equation (37) (Supplementary Information).
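Normalizing a Gaussian random-walk kernel over the discrete R_t grid yields a row-stochastic transition matrix; a minimal sketch (grid spacing and names are our choices):

```python
from math import exp

def rw_transition_matrix(r_grid, sigma_R):
    """Row-stochastic transition matrix over a discretized R_t grid, from the
    random-walk prior R_{t+1} | R_t ~ N(R_t, sigma_R^2), with each row
    normalized over the grid."""
    P = []
    for r in r_grid:
        row = [exp(-0.5 * ((rp - r) / sigma_R) ** 2) for rp in r_grid]
        z = sum(row)
        P.append([x / z for x in row])
    return P
```

Smaller σ_R concentrates each row near its diagonal, enforcing smoother R_t trajectories.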

Code availability
The R scripts (ref. 40) used to generate the results in this manuscript are available in the following Git repository: https://github.com/alan-turing-institute/jbc-turing-rss-testdebiasing.
Reporting Summary (nature portfolio, March 2021)

Corresponding author(s): Prof. Chris Holmes
Last updated by author(s): Nov 11, 2021

Nature Portfolio wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Portfolio policies, see our Editorial Policies and the Editorial Policy Checklist.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend. All studies must disclose on these points even when the disclosure is negative.

Study description
Methods for studying COVID-19 epidemiology by analysing community testing data (quantitative count data, comprising the number positive and total number tested)

Research sample
We collectively analyse two types of community testing data. The REal-time Assessment of Community Transmission (REACT) study is a nationally representative prevalence survey of SARS-CoV-2 based on repeated cross-sectional samples from a representative subpopulation defined via (stratified) random sampling from England's National Health Service patient register. Pillar 1 and Pillar 2 PCR test data form the main part of the UK government's national antigen testing strategy: Pillar 1 tests refer to "all swab tests performed in Public Health England (PHE) labs and National Health Service (NHS) hospitals for those with a clinical need, and health and care workers", and Pillar 2 comprises "swab testing for the wider population". The Pillar 1+2 research sample is dynamic and is not representative of the population as a whole. In particular, Pillar 1+2 is enriched for NHS workers, individuals with symptoms, and those who self-select into testing, all of which can affect the demographic representation of the targeted Pillar 1+2 individuals. The REACT research sample, by contrast, is designed to be nationally representative through stratified random sampling. We are able to use these REACT data to account for the ascertainment bias implicit in the Pillar 1+2 data.
For Pillar 1+2, the age and sex demographic breakdowns for the number of tests conducted are available here: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1007158/Demographic_LA_tables_Week_60.ods (including, as an example, the age breakdown for the week commencing 2021-01-07). The rationale for choosing Pillar 1+2 positive and total counts is that these data are the most highly publicised case counts in England and are regularly made publicly available, which allows ready reproducibility and ongoing implementation of our methodology. The rationale for choosing the REACT study is that we require randomised surveillance data in order to correct for ascertainment bias. REACT is one of the two major randomised surveillance studies, along with the Office for National Statistics COVID-19 Infection Survey (ONS CIS). We specifically use REACT data here because they are regularly made publicly available in raw form (positive and total counts) at the PHE region level, which is compatible with our downstream statistical modelling and allows transparency and reproducibility of our findings.

Sampling strategy
For the REACT data, participants were included in the tested group through stratified random sampling. For the Pillar 1+2 data, however, there is strong ascertainment bias, since infected individuals are more likely to be chosen for testing (for example, frontline workers and symptomatic individuals). In our paper we correct for this ascertainment bias in the Pillar 1+2 data by designing a causal model and thereby adjusting for the bias to obtain accurate estimates of local prevalence. We have used two datasets in our study: Pillar 1+2 and REACT. These have the advantage of being publicly available (allowing reproducibility of our results), and we demonstrate in the paper that they are sufficient for us to develop, illustrate and, importantly, validate our methodology (see section "Accuracy validation using ultra-coarse and incomplete data to estimate delta").

Data collection
In the REACT study, participants self-gathered throat and nose swab samples; no researcher was typically present during sample collection and other members of the public could have been present; participants were not blinded to the study hypothesis. Samples were then sent by post to a pre-specified laboratory for processing.
For the Pillar 1+2 data, throat and nose swabs were gathered in various ways. For home testing, participants self-gathered throat and nose swab samples; no researcher was typically present during sample collection and other members of the public could have been present; participants were not blinded to the study hypothesis. In the case of regional or local test sites, or mobile testing units, swabs were gathered by a trained healthcare professional while the participant was seated in a motor vehicle; usually only the researcher and participant were present; neither the researcher nor the participant was blinded to the study hypothesis.

Timing
Between 31 May 2020 and 20 June 2021

Data exclusions
No data were excluded from the analysis