Characterising within-hospital SARS-CoV-2 transmission events using epidemiological and viral genomic data across two pandemic waves

Hospital outbreaks of COVID-19 result in considerable mortality and disruption to healthcare services and yet little is known about transmission within this setting. We characterise within-hospital transmission by combining viral genomic and epidemiological data using Bayesian modelling amongst 2181 patients and healthcare workers from a large UK NHS Trust. Transmission events were compared between Wave 1 (1st March to 25th July 2020) and Wave 2 (30th November 2020 to 24th January 2021). We show that staff-to-staff transmissions reduced from 31.6% to 12.9% of all infections. Patient-to-patient transmissions increased from 27.1% to 52.1%. 40%-50% of hospital-onset patient cases resulted in onward transmission compared to 4% of community-acquired cases. Control measures introduced during the pandemic likely reduced transmissions between healthcare workers but were insufficient to prevent increasing numbers of patient-to-patient transmissions. As hospital-acquired cases drive most onward transmission, earlier identification of nosocomial cases will be required to break hospital transmission chains. SARS-CoV-2 has resulted in multiple outbreaks in hospitals, but identifying transmission events is challenging. Here, the authors combine whole genome sequencing and epidemiological data from the first two waves of the pandemic at a UK hospital trust and characterise transmission chains.

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has resulted in multiple hospital outbreaks, exposing healthcare workers (HCWs) and non-COVID-19 patients to SARS-CoV-2 infection [1][2][3] . At least 32,307 patients are thought to have been infected in hospitals in England and Wales and an estimated 414 HCWs died between March and December 2020 4 . This is probably a substantial underestimate as it represents the number of individuals that meet the narrow definitions for healthcare-associated infection rather than the true number of infected individuals in hospitals. To safely continue routine and elective activities in hospitals during times of high SARS-CoV-2 incidence, it is important to discern factors that drive hospital-acquired infections. This greater understanding can be used to protect staff and patients, as well as to inform further efforts to contain hospital outbreaks.
Identifying within-hospital SARS-CoV-2 transmission events using epidemiological data remains challenging for two reasons. Firstly, the high variability in the viral incubation period means it is often difficult to determine whether hospital-onset cases are community- or hospital-acquired. Secondly, at least 33% of SARS-CoV-2 infections in adults are thought to be asymptomatic 5 , so identifying all patients and HCWs contributing to transmission is challenging. In the limited instances where viral genomic data were analysed, this information was used to confirm or complement a purely epidemiological approach [6][7][8][9] . Elucidating the source of transmission events on the basis of viral genetic relatedness alone also entails considerable uncertainty due to the slow evolutionary rate of SARS-CoV-2 10 . In the time scale of an outbreak, a large proportion of individuals are infected by viruses too genetically similar to each other to distinguish genuine transmission events from unrelated infections. Furthermore, data from HCWs have rarely been included in previous analyses 11 and the relative role that patients and HCWs have played in fuelling hospital outbreaks in the UK remains largely unknown.
The integration of genomic, epidemiological and location data into a statistical inference framework offers a possible route to more accurate estimates of within-hospital transmission. Under such an approach, a transmission event between a pair of individuals is supported if their symptom onset times are compatible with the serial interval distribution of SARS-CoV-2, if the individuals are in the same hospital location at the time of a suspected transmission event and if their viral genomes exhibit a high degree of relatedness.
In this study, we reconstructed SARS-CoV-2 outbreaks in a large NHS teaching hospital trust in England during the first two UK epidemic waves. We integrated over 2000 viral genomic sequences, patient and staff locations, and routinely available epidemiological information in a Bayesian framework that incorporates prior knowledge on the relative contributions of within-ward and between-ward transmission, as well as the proportion of cases involved in transmission chains that were not represented in the dataset 12,13 . Using this approach, we characterised the dynamics of SARS-CoV-2 transmission within a hospital setting, identifying key differences across the two pandemic waves, as well as the relative contribution of different groups and hospital locations to within-hospital transmission.

Results
Study population, SARS-CoV-2 testing and infection control measures. During the first wave of the UK epidemic (Wave 1; defined as 1st March to 25th July 2020 for our analysis), 886/1184 (74.8%) patients and 842/1104 (76.3%) HCWs at STHNFT who tested positive for SARS-CoV-2 had sequence data available with over 90% genome coverage (Fig. 1). During the second wave (Wave 2; defined here as 30th November 2020 to 24th January 2021), 669/1183 (56.6%) SARS-CoV-2 positive patients and 651/838 (77.7%) SARS-CoV-2 positive HCWs had sequence data available with over 90% genome coverage. Cases were excluded if they were outpatients, non-clinical staff (not working in clinical areas; e.g. Accounting, Information Technology, Catering), non-STHNFT or community-based staff, household contacts of staff members, or staff who had missing ward location data, leaving 1302 individuals in Wave 1 and 879 individuals in Wave 2 for the analysis (Fig. 1b, Table 1).
SARS-CoV-2 testing policy and infection prevention and control (IPC) measures evolved throughout the pandemic (Fig. 1a). Testing was performed in symptomatic patients with suspected SARS-CoV-2 infection on admission throughout the study period, with testing offered to symptomatic staff from 17th March 2020 14 . Testing of all admissions regardless of symptoms commenced on 25th April 2020 and screening of all asymptomatic patients and staff on wards with outbreaks from 18th May 2020. In addition to screening on admission, all patients were routinely tested on day 5 of admission from 1st September 2020. Routine twice-weekly testing using lateral flow devices, followed by confirmatory Nucleic Acid Amplification Test (NAAT), was offered to staff in all clinical areas from 8th December 2020. Level 2 personal protective equipment (PPE; aprons, gloves, eye protection and fluid resistant surgical face mask) was used by staff only for seeing suspected COVID-19 cases from 17th March 2020, and for all patient contact from 8th April 2020. HCWs were mandated to wear surgical face masks in all areas of the hospital from 15th June 2020. The SARS-CoV-2 staff vaccination programme commenced on 10th December 2020.
Quantifying hospital-acquired infections. Using admission dates, symptom onset dates, SARS-CoV-2 positive test dates, ward location of cases and comparison between consensus viral genome sequences, our model inferred likely hospital transmissions between individuals, together with (i) the time of those events, (ii) the ward on which the transmission occurred, and (iii) whether the infector was a sampled case or a case involved in transmission but not represented in the dataset.
From the 1302 cases in our dataset from Wave 1, our model identified 388 (95% credible interval (CI) 292-479) hospital-acquired infections, along with 85 (95% CI […]) further cases estimated to be involved in transmission but not represented in the dataset (total 473 hospital-acquired infections, 95% CI 310-688). During Wave 2, 350 (95% CI 285-410) cases from the 879 in the dataset were estimated to be hospital-acquired infections, with another 52 (95% CI 11-120) infections estimated that were not represented in the dataset (total 402, 95% CI 293-538). In Wave 1, patient cases comprised 40.9% (95% CI 36.1-45.5%) of all sampled hospital-acquired cases, compared to 65.1% (95% CI 60.3-69.4%) of all sampled hospital-acquired cases in Wave 2. Our model estimates showed good agreement with previous a priori inpatient epidemiological definitions of community or hospital onset categories (Table 1 and Supplementary Table 2) 15 . Specifically, no 'community onset-community associated' patient cases were identified as hospital-acquired by our model during either wave, and the majority of 'hospital onset-hospital acquired' and 'hospital onset-suspected […]
Ward and Bay level transmission. Identified transmission events were not evenly distributed across the 132 hospital locations included but were isolated to 38 wards in three of the five hospitals within STHNFT. The eight wards with the highest number of infections in Wave 1 accounted for 51.0% (95% CI 39.7-63.4%) of all transmissions, indicating the presence of transmission hotspots. A similar finding was observed during Wave 2, where 10 wards accounted for 50.1% (95% CI 40.6-60.5%) of all transmission events (Fig. 3). We found evidence that the relative importance of specific wards in contributing to overall transmission was maintained across the two waves (Spearman's rank correlation, rho = 0.54, P < 0.0001; wards ranked by mean number of transmissions per ward).
However, there was considerable variability between waves, and several wards that were transmission hotspots in Wave 1 were not among the 10 wards accounting for >50% of transmissions in Wave 2 (Fig. 3a-b). Equally, several wards with no transmission event during Wave 1 were identified as transmission hotspots in Wave 2. Very few transmissions were estimated to have occurred in critical care units (one in Wave 1 and none in Wave 2). Considerable variability was also seen between wards in the proportion of infectors and infectees made up by patients and staff (Fig. 3c-f). The highest number of infections on a single ward was 50 (95% CI 40-58) in Wave 1 but decreased to 30 (95% CI 23-36) in Wave 2. The highest number of separate transmission chains on a single ward was 8 (95% CI 5-11) for Wave 1 and 5 (95% CI 3-7) for Wave 2.
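The hotspot summary above (the smallest group of top-ranked wards that together account for at least half of transmissions) can be illustrated with a short Python sketch. This is not the authors' code: the function name `hotspot_wards`, the ward labels and the counts are hypothetical, and the sketch works on a single set of point counts, whereas the reported figures are posterior summaries with credible intervals.

```python
def hotspot_wards(transmissions_per_ward, threshold=0.5):
    """Return the top-ranked wards whose combined share of transmissions
    first reaches `threshold`, together with that combined share."""
    total = sum(transmissions_per_ward.values())
    if total == 0:
        return [], 0.0
    # Rank wards from most to fewest inferred transmissions.
    ranked = sorted(transmissions_per_ward.items(),
                    key=lambda item: item[1], reverse=True)
    selected, running = [], 0
    for ward, count in ranked:
        selected.append(ward)
        running += count
        if running / total >= threshold:
            break
    return selected, running / total


# Hypothetical counts for six wards (illustration only).
example_counts = {"A": 30, "B": 20, "C": 10, "D": 10, "E": 5, "F": 5}
wards, share = hotspot_wards(example_counts)
print(wards, round(share, 3))
```

To propagate uncertainty in the way the credible intervals imply, one option would be to apply the same function to each sampled transmission network and summarise the resulting distribution.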
Wards comprise a combination of multi-bed bays with shared bathroom facilities and individual en-suite side rooms. We used a post hoc analysis to evaluate the contribution of bay-level transmission between patients to the outbreak. We found that 38.3% (95% CI 29.9-47.1%) of patient-patient transmissions in Wave 1 and 33.8% (95% CI 27.9-39.6%) in Wave 2 were between patients who shared a bay at some point during their stay. We estimated that the risk of transmission between individuals who shared a bay, compared with those who shared a ward, was 2.8 (95% CI 2.2-3.5) times higher in Wave 1 and 2.5 (95% CI 2.1-2.9) times higher in Wave 2.
Secondary cases. The crude mean number of secondary cases was 0.30 (95% CI 0.21-0.38) for Wave 1 and 0.40 (95% CI 0.31-0.48) for Wave 2. Adjusting for cases involved in transmission but not represented in the dataset, we estimated that 51.3% (95% CI 49.6-53.1%) of infections in Wave 1 and 43.6% (95% CI 42.3-45.0%) of infections in Wave 2 resulted in no onward transmission. Only 0.2% (95% CI 0.04-0.4%) of infections in Wave 1 and 0.6% (95% CI 0.4-0.9%) of infections in Wave 2 resulted in more than 5 secondary cases (Fig. 4). Fewer patients classified as having community onset-community associated infections gave rise to secondary cases within the hospital (3.7%, 95% CI 1.8-5.6% for Wave 1; […] Table 2). All findings were consistent across sensitivity analyses in which we either relaxed the probability of the most recent sampled ancestor α_i and infectee i being registered on the same ward on the day of transmission or changed π, the proportion of all cases in the outbreak represented in the dataset (ranging from 30 to 70%) (Supplementary Figs. 3-4).
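As a rough illustration of how crude secondary-case statistics can be read off a single reconstructed transmission tree, the Python sketch below counts how many cases each individual is inferred to have infected. The data structure (a mapping from each case to its inferred infector), the function name and the toy tree are assumptions made for this example; the sketch does not perform the adjustment for unsampled cases described above.

```python
from collections import Counter


def secondary_case_summary(infectors, case_ids):
    """Summarise secondary cases for one transmission tree.

    infectors: dict mapping case id -> inferred infector id, or None when
               the case is an importation or its infector is unsampled.
    case_ids:  all sampled case ids.
    Returns (crude mean, proportion with no onward transmission,
             Counter of the number of secondary cases per case).
    """
    # Count how many times each case appears as an infector.
    counts = Counter(src for src in infectors.values() if src is not None)
    per_case = [counts.get(case, 0) for case in case_ids]
    n = len(per_case)
    mean = sum(per_case) / n
    prop_none = sum(1 for c in per_case if c == 0) / n
    return mean, prop_none, Counter(per_case)


# Hypothetical five-case tree (illustration only).
cases = ["c1", "c2", "c3", "c4", "c5"]
tree = {"c1": None, "c2": "c1", "c3": "c1", "c4": "c2", "c5": None}
mean_sec, prop_none, dist = secondary_case_summary(tree, cases)
print(mean_sec, prop_none, dict(dist))
```

Applying the same summary to every sampled tree in the posterior would give a distribution from which intervals could be taken.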

Discussion
To our knowledge, our findings represent the largest collection of SARS-CoV-2 genomic and hospital epidemiology data to date used to reconstruct directional transmission networks, where we estimated hospital-acquired SARS-CoV-2 infections across two pandemic waves in the UK using a Bayesian framework. Importantly, our model also accounts for events within the identified transmission networks that were not represented in the dataset, which is crucial given the likely presence of unidentified infections or those lacking sequence data. We observed different contributions to the total number of within-hospital transmission events from those occurring between and within staff and patients across the two waves. We identified transmission hotspots within our institution, with a relatively small proportion of locations accounting for most hospital-acquired infections in staff and patients. We also found that approximately half of SARS-CoV-2 infections resulted in onward transmission, with secondary cases identified in ~50% of infections but relatively few so-called 'superspreader' events. While much attention has been paid to staff potentially acquiring SARS-CoV-2 infections from patients due to perceived or real deficiencies in PPE, our findings suggest that the majority of HCW infections during the first pandemic wave were acquired from other HCWs. This finding is supported by similar results from a prior simulation study 16 . The contribution of these staff-to-staff infections to hospital-acquired transmission reduced dramatically during the autumn 2020 wave of SARS-CoV-2. Staff were less likely to initiate hospital transmission chains in Wave 2, accounting for 31.3% of index cases compared to 50.6% in Wave 1. Infection control practice and understanding of SARS-CoV-2 transmission evolved considerably during the pandemic. Improved social distancing and wearing of face coverings in non-clinical areas may explain some of these observations.
In addition, the importance of asymptomatic transmission was increasingly appreciated and twice-weekly lateral flow testing for healthcare workers was introduced in December 2020. Furthermore, seroprevalence rates of over 25% have been reported in HCWs following the first pandemic wave 17 , including in our NHS Trust 18 , which may have contributed to greater protection and reduced transmission in some areas. For example, we have reported that the SARS-CoV-2 seroprevalence in staff working on our acute medical unit by June 2020 was over 40% 18 . This was an area that was a hotspot for transmissions involving staff during our Wave 1 analysis but had very few transmission events identified during Wave 2. Patient-to-staff transmissions remained constant across the two waves, both in absolute number and as a proportion of transmission events, suggesting that whatever factors are responsible for the reduction in staff-to-staff infections had limited impact on the risk of patient-to-staff transmissions. By Wave 2, most staff infections in our NHS trust were estimated to have been acquired from patients, so further efforts are required to increase protection for HCWs. Staff vaccination is anticipated to have a large impact but is unlikely to have played a significant role in our observations due to its introduction towards the end of Wave 2. Hospital-acquired infections during Wave 2 were overwhelmingly dominated by patient-to-patient transmissions. The reasons for these events are likely to be multifactorial. UK hospitals faced significant bed pressures during this period and, unlike during Wave 1, attempts were made to maintain as many routine and elective procedures for as long as possible. By this point, all patients in our hospitals were being routinely tested by NAAT for SARS-CoV-2 on admission and on day 5. Accordingly, the percentage of patients included in our dataset with asymptomatic infection increased from 10.4% during Wave 1 to 23.9% in Wave 2.
The intense increase in patient-to-patient infections unfortunately occurred despite this enhanced focus on preventing asymptomatic transmission. Most of our transmission hotspots were wards built over two decades ago, with 6-8 beds per bay, and shared toilet facilities between every 1 to 2 bays 19 . While ventilation in these settings is in line with applicable regulations at the time of construction, none were designed with a respiratory pandemic in mind. Any contribution from these fixed estate issues will be challenging to address in a short timeframe. While viruses with greater transmissibility could also have played a role during Wave 2, circulation of the B.1.1.7/alpha variant occurred relatively late in our region compared to many other parts of the UK, and many Wave 2 transmission events were due to other SARS-CoV-2 lineages. This is in keeping with a recent study demonstrating that B.1.1.7/alpha infections did not result in greater hospital-acquired infections in the UK 20 .
Ward design differs throughout the hospital trust with a varying number of side rooms and bed bays. The number of beds in each shared patient bay ranges from 4 to 8. Of the 10 wards with the highest number of transmission pairs, 7 were wards with 6 or more beds per bay. High attack rates have previously been reported between patients in shared occupancy spaces and factors such as bay size are likely to explain why some hospital wards experienced a greater number of hospital-acquired cases compared with others 21 . Of note, very few transmissions were estimated to have occurred on critical care units, which may have a number of explanations including universal use of enhanced PPE.

Fig. 4 Secondary case distributions and onward infections by Hospital Onset COVID-19 Infection (HOCI) category. a The percentage of cases with each number of secondary cases after adjusting for cases involved in transmissions but not represented in the dataset. b The percentage of each HOCI category case with at least 1 onward infection. Error bars represent 95% credible intervals. COCA Community onset community associated; positive test up to 14 days before or within 2 days after hospital admission. COSHA Community onset suspected hospital associated; positive test up to 14 days before or within 2 days after admission, with discharge from hospital within 14 days before test. HOIHA Hospital onset intermediate hospital associated; positive test 3-7 days after hospital admission, (*) with no discharge from hospital in 14 days before the specimen date. HOSHA Hospital onset suspected hospital associated; positive test 8-14 days after admission or 3-14 days after admission with discharge from hospital in 14 days before test. HOHA Hospital onset hospital associated; positive test 15 or more days after hospital admission. Classification of patient cases according to likely source of infection (community or hospital-acquired) is based on SAGE criteria 15 . Number of cases per HOCI category is shown in Table 1.
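The HOCI category definitions given in the Fig. 4 caption can be written as a small decision rule. The Python sketch below is one possible encoding, assuming the day of admission is counted as day 0 and that recent discharge is available as a simple true/false flag; the function and argument names are illustrative and are not taken from the study's code.

```python
def classify_hoci(days_from_admission_to_test, discharged_in_prior_14_days):
    """Assign a HOCI category from the caption's definitions.

    days_from_admission_to_test: positive-test date minus admission date,
        in days (negative when the test precedes admission).
        Assumption: the admission day itself counts as day 0.
    discharged_in_prior_14_days: True if the patient was discharged from
        hospital in the 14 days before the test.
    """
    d = days_from_admission_to_test
    if d < -14:
        raise ValueError("test more than 14 days before admission is out of scope")
    if d <= 2:
        # Community onset: split by recent hospital discharge.
        return "COSHA" if discharged_in_prior_14_days else "COCA"
    if d <= 7:
        # Days 3-7: intermediate unless recently discharged.
        return "HOSHA" if discharged_in_prior_14_days else "HOIHA"
    if d <= 14:
        return "HOSHA"
    return "HOHA"


for days, discharged in [(0, False), (1, True), (5, False), (5, True), (10, False), (20, False)]:
    print(days, discharged, classify_hoci(days, discharged))
```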
We found that the distribution of secondary cases was very similar across both waves, with ~50% of SARS-CoV-2 cases resulting in onward transmission, although only 5-10% of all infections resulted in more than two secondary cases, matching findings from another UK-based study 22 . Our findings are different from those in a smaller study focusing on a few large clusters in another UK hospital, where 20% of individuals caused 80% of transmission events 11 . Although there is no clear threshold for the number of cases of a superspreading event, we did not find many examples where a high number of cases were associated with a single case. On average, the maximum number of individuals linked to the same index was six across all networks. Our findings do not support superspreading events forming a significant proportion of all hospital-acquired infections. This may in part be due to our focus on the entire hospital environment rather than on specific epidemiologically identified outbreaks.
Importantly, we find that hospital-acquired SARS-CoV-2 cases give rise to a greater number of secondary cases than community-onset community-associated cases, supporting previous findings 23 . Cases admitted from the community already suspected as having COVID-19 will have been isolated in single cubicles or COVID-19 cohort areas more rapidly, thus limiting opportunities for onward transmission. As severe disease requiring hospitalisation often occurs later in infection, they may also be at a less infectious stage, although hospitalised cases may also shed viable virus for longer 24 . In contrast, individuals with no SARS-CoV-2 symptoms who later acquired nosocomial infection may have initially been placed in bays with other susceptible patients, all of whom tested SARS-CoV-2 negative on screening tests at admission. Given the high viral loads during the first few days of infection, including during pre-symptomatic stages 25 , our data suggest that these individuals may acquire SARS-CoV-2 in hospital and have ample opportunity for onward transmission before being detected and isolated. This finding indicates that asymptomatic testing of patients on admission and day 5 was insufficient to prevent these scenarios. Daily testing of patients in the first week of admission or more regular testing throughout admission may allow greater opportunity for intervention, as may more recent additions to IPC guidance such as routine wearing of masks by all patients in bays. Equally, rapid point-of-care testing (POCT) on admission may also reduce the window for transmission early in admission as it allows earlier isolation of asymptomatic community-acquired cases. Our Trust instituted POCT for all medical admissions in mid-January 2021.
Our study has several limitations that are important to consider. Firstly, despite the large number of individuals included, this is a single-centre study and may not be generalisable across all UK hospitals given the heterogeneity in practice, building infrastructure, and patient population that exists. Our organisation had a high number of documented hospital-acquired infections in patients between March 2020 and March 2021 (n = 795), but was not an outlier, as 7 other NHS Trusts had higher numbers (highest n = 1463) 26 . Seven of the top 10 busiest NHS Trusts (including our own) were also in the top 10 Trusts with the highest number of hospital-acquired COVID-19 infections in patients, indicating a common theme that may be a driver of nosocomial SARS-CoV-2 infections 27 . The effectiveness of various infection control measures on within-hospital transmissions over time in our setting is likely to be generalisable to many UK hospitals, as they were based on national guidance applicable to all NHS Trusts.
Although we did not have a selective sampling strategy, either for case detection or sequencing of positive cases, it is possible that there was an unobserved sampling bias. For example, as individuals with higher viral loads will be more infectious and their samples more likely to result in successful sequencing, they are more likely to have been included in our dataset. As IPC practice evolved during the course of the pandemic, testing of all asymptomatic patients and staff in outbreak wards only commenced towards the latter part of the first wave, but was routine practice along with other new measures during Wave 2. This could have affected some comparisons, such as the size of outbreaks across waves, particularly if more systematic sampling increased detection disproportionately in one group (e.g. patients) relative to another (e.g. staff). Our model attempted to account for cases that were not represented in the dataset, but this would not mitigate any bias entirely. Some of our key conclusions, such as the greater onward transmission from hospital-acquired cases, were nonetheless consistently seen across both waves. It is important to consider that staff-to-staff transmission in non-clinical areas (both inside and outside of the clinical setting) can also be an important driver of HCW infections due to social and behavioural factors which are difficult to adequately quantify in models. While we had electronic records of precise location data for patients during all times of their admission, staff location data were less granular and dependent on self-reported areas of work in the 14 days prior to infection. We undertook sensitivity analyses to test several assumptions regarding priors in our model, as outlined in Supplementary Figs. 2-3, but some assumptions remain unexplored. For example, we assumed that the probability of inclusion in our dataset was the same for both staff and patients. Further granularity could be considered in future developments of the outbreaker model.
With this study, we provide evidence that the integration of clinical surveillance data, viral genomic information and modelling enhances our capacity to unravel the complex transmission dynamics of SARS-CoV-2 in times and places of high incidence. The application of such a high-resolution framework to healthcare settings offers attractive perspectives for guiding the development of a safe environment for both staff and patients, as it may have a significant impact on the reduction of SARS-CoV-2 hospital transmission in subsequent epidemic waves.

Methods
Study population. All cases in the study were patients or staff who tested positive for SARS-CoV-2 at Sheffield Teaching Hospitals NHS Foundation Trust (STHNFT), Sheffield, UK, between 1st March 2020 and 24th January 2021. STHNFT is a large UK NHS hospital Trust which includes five hospitals, has an average bed occupancy of 1400, and employs ~17,000 staff. SARS-CoV-2 nucleic acid amplification tests (NAAT) were performed on nose and/or throat swabs throughout the pandemic in line with contemporaneous UK Department of Health and Social Care guidance 28 , using Hologic Panther or an in-house dual E/RdRp gene real-time PCR assay 29,30 .
Patients were included in the analysis if they tested positive for SARS-CoV-2 at or during admission. Staff were included if they tested positive for SARS-CoV-2 and had worked in a clinical area in the 14 days prior to a positive test. Information on symptom onset of patients and their ward movements, together with place of work for staff, were extracted from STHNFT electronic records, when available.
Sample Preparation, ARTIC Network PCR and Nanopore Sequencing. Sequencing was attempted on all available residual samples collected for routine diagnostic testing from STHNFT throughout the study period, with fluctuation in the proportion of positive samples sequenced due to multiple factors, including laboratory capacity and availability of stored samples. There was no systematic strategy to sequence samples from suspected outbreak wards alone. The first positive sample from each individual was selected for sequencing. RNA was extracted from viral transport medium and subjected to the ARTIC network tiled amplicon protocol 31 , followed by sequencing on an Oxford Nanopore GridION X5. Base calling was performed using a high accuracy model and the default basecaller in MinKNOW (currently guppy v4). Reads were filtered based on quality and length (400 bp to 700 bp) and mapped to the Wuhan reference genome (GenBank accession number NC_045512). Reads were downsampled to 200x coverage in each direction and variants called using nanopolish 32 to determine changes from the reference, followed by consensus sequence generation. Samples with over 90% genome coverage were included for further analysis. Viral genomic sequences were classified into PANGO lineages using the Phylogenetic Assignment of Named Global Outbreak LINeages (PANGOLIN) 33 version 2.4.2 and a multiple sequence alignment was built using MAFFT 34 with 10 iterative refinements. All alignment positions flagged as problematic for phylogenetic inference were removed, including highly homoplasic positions and 3′ and 5′ ends 35 .
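The two numeric filters mentioned above (read length between 400 and 700 bp, and consensus genome coverage above 90%) can be sketched in Python as follows. This is an illustration only: the text does not state how coverage was computed, so the sketch assumes it is the fraction of reference positions carrying an unambiguous A/C/G/T call in a reference-aligned consensus, and assumes the length bounds are inclusive. The reference length of 29,903 nt is that of NC_045512.2; the function names are hypothetical.

```python
REFERENCE_LENGTH = 29903  # nucleotides in the Wuhan reference NC_045512.2


def passes_length_filter(read_length, min_len=400, max_len=700):
    """Keep reads whose length falls within the amplicon size window.
    Assumption: both bounds are inclusive."""
    return min_len <= read_length <= max_len


def genome_coverage(consensus, reference_length=REFERENCE_LENGTH):
    """Fraction of reference positions with an unambiguous base call.
    Assumption: the consensus is reference-aligned and masked or
    uncalled positions are written as N (or any non-ACGT symbol)."""
    called = sum(1 for base in consensus.upper() if base in "ACGT")
    return called / reference_length


def include_sample(consensus, min_coverage=0.9,
                   reference_length=REFERENCE_LENGTH):
    """Apply the 'over 90% genome coverage' inclusion rule."""
    return genome_coverage(consensus, reference_length) > min_coverage


# Toy consensus against a 10-base reference (illustration only).
toy = "ACGTNNNNAC"
print(genome_coverage(toy, reference_length=10))
print(include_sample(toy, reference_length=10))
```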
Hospital outbreak reconstruction model. We used Outbreaker2, a modular discrete-time stochastic model for reconstructing likely transmission trees of an outbreak based on pathogen genetic sequences and their collection dates in a Bayesian framework via MCMC 12,13 . To investigate nosocomial outbreaks, we extended the most recent implementation of the Outbreaker2 model to capture ward-level transmission by incorporating ward occupancy data and probabilistically favouring infections that occurred within a ward rather than between individuals on different wards 36 . Our Bayesian model calculates the likelihood of a transmission event from case i to case j at a putative transmission time, given the time of symptom onset for case i and j, the Hamming distance between the corresponding virus genetic sequences, and the ward that i and j were on at the time of infection. The model also infers unobserved infections and unobserved transmission pathways using a constant reporting rate parameter (an ascertainment probability, i.e., the proportion of all SARS-CoV-2 positive cases in admitted patients and hospital staff that were captured in our dataset). This parameterization (outlined in Supplementary Methods) allows the inference of unobserved transmission pathways linking a given ward to another given ward over consecutive generations of infection.
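The genetic component of the likelihood relies on the Hamming distance between two aligned consensus sequences, that is, the number of alignment positions at which they differ. A minimal Python sketch is shown below. Skipping positions where either sequence has an N or a gap is an assumption made here for illustration; the handling of ambiguous positions in Outbreaker2 itself may differ.

```python
def hamming_distance(seq_a, seq_b, ignore="N-"):
    """Count positions at which two aligned sequences differ.

    Positions where either sequence carries a character listed in
    `ignore` (by default N or a gap) are skipped; this masking rule
    is an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


# Toy aligned sequences (illustration only).
print(hamming_distance("ACGTN", "ACCTA"))
```

Because SARS-CoV-2 accumulates mutations slowly, many epidemiologically unrelated pairs will have a distance of zero or one, which is why the model combines this quantity with onset timing and ward location.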
We estimated the ascertainment probability as the product of (i) the proportion of all cases that were likely detected via testing, (ii) the proportion of detected cases with high-quality sequence, and (iii) the proportion of these cases where the ward location was known. We used a point estimate of 0.5 for this ascertainment probability but varied this estimate in a one-way sensitivity analysis (estimates of 0.3, 0.4, 0.6 and 0.7; full details in Supplementary Methods).
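Written as a formula, the product described above might look as follows. The symbol π follows the notation used in the Results for the proportion of cases represented in the dataset; the factor labels p_det, p_seq and p_ward are introduced here purely for illustration and do not appear in the original text.

```latex
\pi \;=\; p_{\mathrm{det}} \times p_{\mathrm{seq}} \times p_{\mathrm{ward}},
\qquad
\pi_{\text{base case}} = 0.5,
\qquad
\pi \in \{0.3,\ 0.4,\ 0.6,\ 0.7\}\ \text{in the one-way sensitivity analysis}
```

Here p_det stands for the proportion of all cases detected via testing, p_seq for the proportion of detected cases with a high-quality sequence, and p_ward for the proportion of sequenced cases with a known ward location.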
We also extended this model to estimate the number and identity of imported community-acquired infections. To do so, we first estimated the fraction of community-acquired infections across the whole dataset using admission dates, symptom onset dates and the incubation period. We then classified cases as either an imported infection or a hospital-associated infection based on the likelihood of a given case being infected by another individual observed in the dataset. The generation time distribution (the delay between the infection of a primary case and the infection of a secondary case), and the incubation period distribution (the delay between the infection and symptom onset) used to inform the inference were based on previously published estimates which incorporate uncertainty around these values 37,38 .
In our base-case analysis, we used a global sensitivity method, incorporating all results from a sensitivity analysis in the final results output, to capture uncertainty in (i) the proportion of cases that were community-acquired infections imported into the hospital (Wave 1: N(μ = 0.7, σ = 0.075), Wave 2: N(μ = 0.6, σ = 0.06)), (ii) the symptom onset date for individuals for whom this information was unavailable (imputed 100 times), and (iii) the place of work for some staff who had multiple work locations (imputed 100 times). A final posterior distribution of 10,000 transmission networks was inferred by integrating over the uncertainty across imputed datasets of these three aspects. Full details of the model, including model fitting and prior distributions, are provided in Supplementary Table 1. All analysis was carried out in R (Version 4.0.3) 39 .
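The idea of drawing the imported-case proportion from its normal distribution for each imputed dataset, then pooling posterior samples across datasets, can be sketched in Python as below. The original analysis was carried out in R, and this sketch is not that code. The rejection step that keeps draws within [0, 1], the fixed seed, and the function names are assumptions made for illustration; the means and standard deviations are those quoted above.

```python
import random


def sample_import_proportion(mu, sigma, rng):
    """Draw one value from N(mu, sigma), redrawing until it lies in [0, 1].
    The truncation to [0, 1] is an assumption of this sketch."""
    while True:
        x = rng.gauss(mu, sigma)
        if 0.0 <= x <= 1.0:
            return x


def pool_posteriors(per_dataset_samples):
    """Concatenate posterior samples obtained from each imputed dataset,
    so that summaries integrate over the imputation uncertainty."""
    pooled = []
    for samples in per_dataset_samples:
        pooled.extend(samples)
    return pooled


rng = random.Random(2020)  # fixed seed so the example is reproducible
wave1_draws = [sample_import_proportion(0.7, 0.075, rng) for _ in range(100)]
wave2_draws = [sample_import_proportion(0.6, 0.06, rng) for _ in range(100)]
print(round(sum(wave1_draws) / len(wave1_draws), 3))
print(round(sum(wave2_draws) / len(wave2_draws), 3))
```

In a full pipeline each draw would parameterise one model run on one imputed dataset, and `pool_posteriors` would be applied to the sampled transmission networks from those runs.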
Disclaimer. The views expressed are those of the author(s) and not necessarily those of the NIHR, Public Health England or the Department of Health and Social Care.

Data availability
Viral genomes were mapped to the publicly available Wuhan reference genome (GenBank accession number NC_045512). All sequences used in this study are deposited in the European Nucleotide Archive (see Supplementary data 1 for accession numbers). The epidemiological data and linkage to sequences are available under restricted access due to their potentially identifiable nature. Access can be obtained by contacting the corresponding author (t.desilva@sheffield.ac.uk) after which a data sharing agreement will be organised. We will aim to respond to any requests within 10 working days.