Understanding SARS-CoV-2 transmission in higher education settings is important to limit spread between students and into at-risk populations. In this study, we sequence 482 SARS-CoV-2 isolates from the University of Cambridge from 5 October to 6 December 2020. We perform a detailed phylogenetic comparison with 972 isolates from the surrounding community, complemented with epidemiological and contact tracing data, to determine transmission dynamics. We observe limited viral introductions into the university; the majority of student cases were linked to a single genetic cluster, likely following social gatherings at a venue outside the university. We identify considerable onward transmission associated with student accommodation and courses; this was effectively contained using local infection control measures and following a national lockdown. Transmission clusters were largely segregated within the university or the community. Our study highlights key determinants of SARS-CoV-2 transmission and effective interventions in a higher education setting that will inform public health policy during pandemics.
The SARS-CoV-2 pandemic has caused substantial morbidity and mortality globally1,2. Universities have been considered conduits for transmission due to extensive social networks of young adults, many of whom live communally, and in-person teaching of large groups3. Outbreaks of SARS-CoV-2 have been observed in a number of higher education institutions, but the drivers for transmission in these settings are poorly understood4. It is speculated that infection dynamics are dependent on transmission chains involving student courses, residence, study year and social networks5. Understanding these dynamics is essential in order to devise effective infection control measures while minimising disruption to teaching, research and the mental health of students and staff6. Furthermore, while university students are less likely to develop severe COVID-19 disease, there is concern that university outbreaks could seed infections in more vulnerable populations, including staff, the local community, and upon returning home to older relatives7. Identifying possible sources of cross-transmission is therefore vital.
Although SARS-CoV-2 genome sequencing has clear utility to identify virus emergence and cryptic transmission8,9, no large-scale genomic studies in university settings have been conducted. The United Kingdom has an extensive community genomics surveillance programme through COG-UK10 which complements traditional contact tracing approaches by providing understanding of circulating viral populations.
We report the results of a genomic epidemiology study of SARS-CoV-2 across a complete term at the University of Cambridge (UoC). Importantly, these findings are from a study period prior to the established circulation of variants of concern and the availability of vaccination, with therefore fewer confounding factors. From 5 October to 6 December 2020, the UoC ran PCR-based symptomatic testing for all staff and students, and offered asymptomatic screening to 15,500 students living in university-managed accommodation. We therefore provide a unique study of SARS-CoV-2 infection that encompasses pre-symptomatic and asymptomatic students11. Positive samples from the UoC were sequenced and compared with systematic surveillance SARS-CoV-2 sequences from the local community. The results were analysed in conjunction with epidemiological data derived from the screening programme and national contact tracing. Overall, we describe introductions of SARS-CoV-2 into a higher education setting, the dynamics of transmission both within the university and between the university and the surrounding community, and the impact of local and national measures to control the spread of SARS-CoV-2 infections.
In total, 972 SARS-CoV-2 cases were identified among university students and staff over the course of term (5 October to 6 December 2020). High-quality genomes were generated from 446/778 (57.3%) positive cases from the university testing programme, from 107/266 (40.2%) cases identified through the Healthcare worker (HCW) screening programme (95 HCWs, 8 students, 4 university staff) and 104 patients identified by hospital testing (71 SARS-CoV-2 positive patients from Cambridge University Hospitals (CUH) and 33 from other medical facilities in Cambridgeshire). A further 797 local cases identified by community testing during the study period were present within the COG-UK dataset, of which 17 were identified as students, 7 as university staff and 26 as HCWs (Fig. 1). Of all identified SARS-CoV-2 cases from Cambridgeshire (university and community) during this period, 8.0% were sequenced (Supplementary Fig. 1).
SARS-CoV-2 lineages and transmission clusters
Over the 9-week term, 62 Pango lineages were identified across the university and community (Fig. 2a, c). In the university, 23 Pango lineages were identified, and 438/482 (90.9%) cases were from just 4 lineages (B.1.160.7, B.1.177, B.1.36, B.1.177.16), all of which were detected by the second week of term. Twelve lineages were only observed after the second week of term and accounted for 6.9% of cases. By comparison, 57 lineages were identified in the local community over the same 9-week period. Viral genomes containing mutations in the spike protein that have been linked to decreased sensitivity to antibody-mediated immunity or to altered viral transmission were observed in the university population: three sequences from the B.1.258 lineage containing the N439K mutation and ∆H69/∆V70; two cases of the B.1.1.7/Alpha variant and its associated mutations12; and 88 cases of B.1.177 with the A222V mutation13. Of these, Pango lineage B.1.1.7 is most reliably associated with increased transmission14; both cases of B.1.1.7 were amongst postgraduate students with no epidemiological links, occurred during national lockdown, and failed to transmit further within the university.
In total, 198 putative transmission clusters were defined by CIVET (https://github.com/artic-network/civet). Only 8/36 clusters with university cases contained five or more university members (range 6–337), which together represented 91.3% of all university cases, signifying that the majority of introductions into UoC did not cause ongoing transmission. To further investigate the largest of these, cluster 1 described below, we identified groups of identical samples (0 SNP differences), which produced 19 additional clusters (a total of 34 clusters with ≥2 university cases) for further analysis.
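The grouping of identical samples (0 SNP differences) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy sequences and the choice to ignore positions with ambiguous bases are all assumptions made for illustration, and the sequences are assumed to be already aligned to the same reference.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count aligned positions where both sequences carry A/C/G/T and differ.

    Assumption: positions with N or other ambiguity codes are ignored.
    """
    valid = set("ACGT")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x in valid and y in valid and x != y
    )


def identical_groups(seqs: dict) -> list:
    """Group samples linked by 0 SNP differences (single linkage, union-find)."""
    parent = {name: name for name in seqs}

    def find(x):
        # Walk to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) == 0:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    # Keep only groups with at least two members.
    return sorted(sorted(g) for g in groups.values() if len(g) >= 2)


# Hypothetical toy alignment (not study data).
toy = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGT",
    "s3": "ACGTACGA",
    "s4": "ACGNACGA",
    "s5": "TTGTACGT",
}
print(identical_groups(toy))
```

Because ambiguous positions are skipped, zero distance is not strictly transitive, so single linkage can join samples that are only indirectly identical; a stricter definition would require every pair within a group to be at 0 SNPs.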
Determinants of viral spread across the university
To determine transmission dynamics following introduction into the university, we performed a detailed investigation of the largest genomic cluster (Cluster 1), which accounted for 337/484 (69.6%) sequenced university cases (Fig. 3). This was widely dispersed across the university by the middle of term, affecting students from 29/31 colleges, 28 undergraduate courses and 208 households in university accommodation alone (Fig. 4).
Cluster 1 was classified as belonging to Pango lineage B.1.160.7. No mutations previously noted to be associated with increased transmissibility were observed in this lineage compared to other genomes in the study. Interrogation of the entire COG-UK dataset of samples from 2020 showed that this lineage was first identified in the UK on 4 October 2020, in Wales, before becoming predominantly sampled in the UoC (Fig. 3b). The B.1.160.7 lineage was not identified in the local community until term week 3 (19–25 October 2020). This was supported by the median estimate of the time to the most recent common ancestor of cluster 1 and its most closely related cluster from Cambridgeshire community isolates, which was 165 days (C.I. 127–207) prior to the start of term (6 October 2020). Together, these results suggest the university cases were introduced from outside Cambridgeshire. Additional analysis with A2B-COVID15, which uses genomic data alongside timing of infection data to evaluate the plausibility of transmission between individuals, showed that these sequences were consistent with a single introduction into the university (Fig. 3c).
National and university contact tracing data were used to identify the initial source of dispersion of this cluster. Ten students from the first two weeks of term reported visiting the same nightclub (venue A). Nine individuals either had an isolate from cluster 1 or (in the event that their sample did not yield a high-quality sequence) were household contacts of an individual with a sequenced cluster 1 isolate. No information was available for one student (Supplementary Fig. 5).
Transmission of cluster 1 was sustained from the first week of term until a national lockdown was enforced on 5 November. Students testing positive in the two weeks around lockdown reported common exposure events predominantly linked to nightclub venues (25/59 (42.4%) of exposures external to the university reported by 48 students). Venue A, identified above as the possible source of dispersion of this cluster at the start of term, was also the most common venue identified in the two weeks around lockdown (n = 16). Of these 16 cases, nine had sequences in cluster 1, and a further five individuals (where no sequence was available) were household contacts of sequenced cases in cluster 1 (Supplementary Fig. 6).
To determine the impact of lockdown and other control measures within the university, a birth-death skyline model16 was used to measure changes in the effective reproduction number (Re) within cluster 1. The model indicated an initial Re at the start of term that was slightly larger than 1, albeit with wide uncertainty (median 1.14; 95% HPD: 0.27–2.21 on 5 October). Over the next 2 weeks Re continued to rise (median 1.52; 95% HPD 0.94–2.22 on 15 October) followed by a subsequent gradual decline over the next 2 weeks (Fig. 5a). There was a rise immediately prior to the start of lockdown (median 1.55; 95% HPD 1.25–1.86 on 5 November), followed by a steep decrease thereafter (median 0.23; 95% HPD 0.07–0.41 on 19 November) (Fig. 5a), consistent with declining absolute numbers of SARS-CoV-2 infections seen during this time (Fig. 2c). The model estimated the median effective infectious period for individuals in the cluster at 3.03 days (95% HPD: 2.44–3.59 days) (Fig. 5b). As the model does not explicitly incorporate an incubation period and assumes that individuals cannot transmit after being sampled, the effective infectious period represents the mean time from infection until testing positive and assumes perfect infection control measures thereafter. Estimates of Re and the effective infectious period are robust to model parameterisations (Supplementary Figs. 8–10). Sampling proportion estimates largely overlap with empirical estimates based on the number of positive cases that were sequenced during each week (Fig. 5c). Although sampling proportion estimates are sensitive to the prior specifications, Re estimates are unaffected (Supplementary Fig. 11).
Transmission within university households
There was evidence of transmission of SARS-CoV-2 in student accommodation in 18/34 university clusters. In cluster 1, 169/337 (50.1%) students had a virus genome sequence identical to at least one other student living in the same or neighbouring household (sub-clusters within 0 SNPs ranging between 2 and 11 students).
The largest cluster associated with transmission in accommodation was cluster 2 (lineage B.1.36). By term week 3, this cluster involved 30 students, of whom 24 (80%) lived in the same accommodation block in College A and 4 lived in two separate households in the same college (Supplementary Fig. 12). Interventions from the university, supported by local public health authorities, included isolation of all households in the main accommodation block and individual screening offered to all students. Half of all cases in this cluster were diagnosed by asymptomatic screening. No further genomically related isolates were identified after term week 3, indicating a successful intervention and cessation of transmission.
To quantify the importance of household transmission, a Reed-Frost Chain Binomial Model was employed to estimate the household attack rate. Using A2B-COVID15, we identified 265 households in which the data were consistent with only 1 introduction of SARS-CoV-2. The per household contact probability that an infected person passed on the virus to an uninfected individual within the same household was estimated at 7.8% (95% C.I. 6.9–8.7%).
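To make the household model concrete, the following is a minimal sketch of a Reed-Frost chain binomial final-size likelihood with a single introduction per household. It is not the authors' implementation (their model details are in the Supplementary Methods): the function names, the grid-search estimator and the toy household data are assumptions for illustration only.

```python
from math import comb, log


def final_size_probs(m: int, p: float, a: int = 1) -> list:
    """P(j of m susceptibles are ever infected), j = 0..m, under Reed-Frost.

    p is the per-contact transmission probability; a is the number of
    initial cases in the household (1 for a single introduction).
    """
    q = 1.0 - p  # probability of escaping infection from one infective
    full = []    # full[k] = P(all k of k susceptibles are infected)
    for k in range(m + 1):
        partial = [
            comb(k, j) * full[j] * q ** ((a + j) * (k - j)) for j in range(k)
        ]
        full.append(1.0 - sum(partial))
    probs = [comb(m, j) * full[j] * q ** ((a + j) * (m - j)) for j in range(m)]
    probs.append(full[m])
    return probs


def log_likelihood(p: float, households: list) -> float:
    """households: list of (susceptibles, secondary_cases) pairs."""
    total = 0.0
    for m, j in households:
        # Guard against log(0) from floating-point cancellation.
        total += log(max(final_size_probs(m, p)[j], 1e-300))
    return total


def estimate_p(households: list, steps: int = 1000) -> float:
    """Maximum-likelihood estimate of p by a simple grid search."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda p: log_likelihood(p, households))


# Hypothetical toy data (not study data): (susceptibles, secondary cases).
toy_households = (
    [(1, 0)] * 5 + [(1, 1)]
    + [(2, 0)] * 6 + [(2, 1)] * 2
    + [(3, 0)] * 4 + [(3, 1)]
)
print(estimate_p(toy_households))
```

A grid search is used here only because it is transparent; a confidence interval such as the one reported above would additionally require a profile likelihood or bootstrap step, which this sketch omits.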
Further genomic clusters where transmission between household members was implicated are outlined in Supplementary Table 1. They follow similar patterns, with groups of cases confined to a single college not leading to sustained transmission.
Other transmission routes among university members
In addition to household transmission, there was evidence of viral spread between students in the same course and year of study in 14/34 genomic clusters, with the highest proportion being students in their first year of study. In cluster 1, 203/337 (60.2%) students had an identical isolate to at least one other student studying the same course in the same year (cluster size range 2–14 students). Statistical modelling using data from cluster 1 across the term showed a bias towards infections being observed in first year students (p-value = 0.002) (Supplementary Fig. 13, model details in Supplementary Methods). Two further small clusters comprised postgraduate students working in the same university department. However, we were not able to determine the probable location of transmission in most cases: there is considerable overlap between course and household clusters, and complex social and study networks exist between students (illustrated in Supplementary Table 1, for example in clusters 3, 4 and 10). Of note, 23/34 clusters with 2 or more genomically linked cases in the dataset contained at least one university member who could not be epidemiologically linked with any other case in their cluster.
The number of SARS-CoV-2 sequences from university staff members was limited in comparison to students (n = 30). There was evidence of transmission between staff members working in the same department, college or ancillary role in four genomic clusters. Two clusters contained staff members who shared the same household. There were eight clusters involving both university staff and students. However, epidemiological associations between these two groups could only be identified in one cluster: a shared household between a student and a staff member working in separate university departments.
Transmission between the university and local community
We next sought to address the degree of transmission between the university and the local community. Two distinct phylogenetic approaches, shown in Fig. 2, demonstrate segregation of the majority of community and university cases into separate clusters and therefore a lack of substantial cross-transmission. 29/198 (14.6%) transmission clusters contained both university and community cases. Only six clusters contained five or more university cases and included three or more community cases.
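The cluster tallies reported here follow from counting university and community cases within each transmission cluster. A minimal sketch of that bookkeeping is shown below; the input layout, function name and toy assignments are hypothetical, and the thresholds simply mirror those quoted in the text (five or more university cases, three or more community cases).

```python
from collections import Counter


def summarise_clusters(assignments, min_university=5, min_community=3):
    """Summarise clusters from (cluster_id, setting) pairs.

    setting is assumed to be either "university" or "community".
    """
    counts = {}
    for cluster_id, setting in assignments:
        counts.setdefault(cluster_id, Counter())[setting] += 1

    # Clusters containing at least one case from each setting.
    mixed = sorted(
        c for c, n in counts.items()
        if n["university"] > 0 and n["community"] > 0
    )
    # Mixed clusters that also meet both size thresholds.
    large_mixed = [
        c for c in mixed
        if counts[c]["university"] >= min_university
        and counts[c]["community"] >= min_community
    ]
    return {
        "clusters": len(counts),
        "mixed": mixed,
        "large_mixed": large_mixed,
        "mixed_fraction": len(mixed) / len(counts) if counts else 0.0,
    }


# Hypothetical toy assignments (not study data).
toy_assignments = (
    [("c1", "university")] * 6 + [("c1", "community")] * 3
    + [("c2", "university")] * 2 + [("c2", "community")]
    + [("c3", "community")] * 4
    + [("c4", "university")] * 7
)
print(summarise_clusters(toy_assignments))
```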
To identify transmission clusters involving university and hospital (patient and healthcare worker) cases, we ran CIVET (https://github.com/artic-network/civet) separately with these cases for a focused phylogenetic analysis of this setting. Associations were identified between the university and hospital settings, with 17 clusters involving both university members and either patients or staff. Cluster 1 (69.6% of student cases) contained only 1 patient and 1 healthcare worker with no identifiable epidemiological link to students. The remaining 16 clusters comprised 133 individuals, including 26 patients, 55 hospital staff or their family members and 52 university members (including 18 staff and 15 clinical medical students). The second-largest cluster of university members (n = 21 university and hospital cases) included nine medical students, five healthcare workers and two patients. Phylogenetically, the medical students and one of the healthcare workers were closely linked (Supplementary Fig. 14) and analysis of these cases with A2B-COVID15 confirmed the plausibility of transmission. All nine medical students were on clinical rotations at the time of diagnosis of the index case; 7/9 lived in neighbouring households in the same college and the remaining two were named contacts of the index student. Plausible transmission events between this group and the other cluster members were refuted using A2B-COVID (Supplementary Fig. 14).
To further investigate epidemiological associations in clusters involving university members and the local community, 1243/1455 of the cases sequenced over the sampling period were linked to national contact tracing data (excluding hospital cases). 219 (17.6%) cases reported 127 common exposure events. Cluster 1, representing 69.6% of cases within the university, included only 17/976 (1.7%) community cases; only one community case had a common exposure with a university student, dining at the same restaurant. No other epidemiological links were identified in any other genomic cluster. Transmission suspected in 19 epidemiologically linked clusters defined by common exposures was refuted by phylogenetic variation (i.e. the cases were placed in separate transmission clusters as defined by CIVET).
We report the first comprehensive and integrated epidemiological and genomic analysis of SARS-CoV-2 transmission in a higher education setting. Following a limited number of introductions, the majority of cases were linked to a single genetic cluster that is likely to have dispersed across the university following multiple social gatherings at a nightclub. There was considerable transmission associated with student accommodation and student courses, but minimal evidence of transmission within departments, or between students and staff. We observe that the great majority of transmissions occur either within the university or within the local community. Finally, we present evidence demonstrating the efficacy of university measures and national lockdown in reducing COVID-19 cases.
Nearly 70% of all university cases belonged to one genetic cluster (cluster 1), introduced into the UoC by the arrival of students and likely forming a single transmission chain. A nightclub was implicated as an important transmission event at the start of term and again prior to lockdown. This corroborates previous studies identifying such venues as a risk factor for substantial SARS-CoV-2 transmission17,18. We urge a cautious approach to the access of such venues during a SARS-CoV-2 pandemic, particularly in the context of a young susceptible student population.
Our data suggest a substantial change in case numbers and the effective reproduction number over the course of the term. This likely reflects a combination of changes in student behaviour and effective interventions to reduce transmission. Overall, we note that incidence and the effective reproduction number within the university are lower than in other higher education settings and the general UK young adult population during the study period19. We highlight a limited number of introductions and low lineage diversity in the university compared to the surrounding community. While the natural extinction of lineages is relatively common20, multiple genetically diverse clusters may be expected given the congregation of students from across the globe (international students make up 35% of students in college accommodation)11. The lack of diversity may reflect the impact of robust and widely implemented university infection control measures maintained throughout the term. Full details are provided in the Supplementary Materials; the measures included social distancing, mask wearing and quarantine of international students at the beginning of term.
There was an initial rise in cases over the first two weeks, coinciding with the first week of term and university Freshers' week. This is known to be a period of more intense social mixing between students in venues both inside and outside university premises. Between term weeks three and five there was a fall in the effective reproduction number, which coincides with both a reduction in social mixing and the identification of, and subsequent university measures to control, transmission events identified in college residences. In multiple clusters, transmission in student households was successfully interrupted through a combination of measures provided by the university, including rapid case identification through asymptomatic screening, readily available symptomatic testing, contact tracing and comprehensive support provided by colleges for cases and their contacts while in isolation. Further details, including the elaboration of the specific measures to control cluster 2, an outbreak associated with a large accommodation block described above, are provided in the Supplementary Materials. Although we have demonstrated that transmission between students in the same accommodation block is an important factor in the spread of SARS-CoV-2, we report a lower secondary household attack rate (7.8%) than that identified in domestic households (16.6–21.1%) and a lower than expected effective infectious period (3.0 days)21.
University measures may have been less successful in controlling transmission in settings outside colleges. There was a rise in the effective reproduction number coinciding with the announcement of a national lockdown on 31 October, to begin on 5 November 2020. This announcement prior to implementation of a major socially restrictive public health measure, alongside existing Halloween festivities, may have led to increased levels of behaviour associated with a higher risk of transmission. This supports either reducing the time from announcement to implementation of socially restrictive measures, or the need for a targeted public health campaign to limit high-risk activities where this is not possible. In addition, having identified considerable transmission between students on the same course, we suggest that further mitigation of viral spread may be obtained by implementing shared student accommodation based on university courses.
The national lockdown dramatically reduced case numbers within the university, at a faster rate than in the local community, demonstrating high levels of compliance from our study population with an effective control strategy. Contemporary studies conducted elsewhere in the UK have demonstrated that adherence to COVID-19 prevention measures, such as national lockdown, is mixed22. Although young age is a risk factor for poor adherence, other associations are less common within the university population, such as having a dependent child in the household, financial hardship and working in a key sector. Although no direct incentives were provided to students, the expectation that individuals adhere to rules was communicated widely in both national and university media. We also believe that the key to the successful implementation of lockdown was the additional support provided by the collegiate university, ranging from the practical provision of food and drink through to the pastoral and community support provided by established networks of staff, tutors and student representatives.
Finally, we observed limited transmission between the university and the local community. The largest university cluster, accounting for the majority of student infections, was largely phylogenetically distinct from community cases. Further, epidemiological evidence describing common exposures for community and university cases was sparse. However, clinical medical students were disproportionately represented within community clusters. This is an important epidemiological link between secondary care and the university; we highlight this group as being at-risk for both acquisition and transmission of SARS-CoV-2 and medical students should therefore be prioritised for interventions such as vaccination.
A combination of contact tracing and genomics was instrumental to understanding transmission within the university and with its surrounding population; notably in refuting transmission within epidemiologically linked clusters. We advocate for a combined genomic epidemiological approach to inform outbreak investigations as used in other settings8,23.
This study has a number of limitations. Incomplete sampling and subsequent sequence filtering in both the university and community should be considered when interpreting transmission; the asymptomatic and active case ascertainment in this study should mitigate this discrepancy. The lower community case ascertainment may result in unobserved transmission chains (such as those when assessing the introduction of Pango lineage B.1.160.7 into the university). Further, epidemiological links are dependent on self-reporting and therefore some data will be missing; whilst a lack of epidemiological association between groups in clusters is important and reassuring (such as between staff and students), it does not confirm a lack of transmission. We highlight shared student courses as a risk factor for transmission; this does not take into account the setting of transmission, i.e., during educational or social activities. Finally, the UoC is distinct in its collegiate structure with limited integration with the community; any generalisation of conclusions should be tempered by the study setting.
We present the first comprehensive integrated epidemiological and genomic evaluation of transmission of SARS-CoV-2 within a university. The insights gained will inform public policy regarding infection control measures in higher education settings. We find containment of transmission in student accommodation necessary to mitigate onward propagation. We highlight the importance of targeted public health measures towards nightclub venues to limit transmission. Critically, these findings are likely to be informative for future pandemic preparedness.
The COG-UK study protocol was approved by the Public Health England Research Ethics Governance Group (reference: R&D NR0195). Public Health England affiliated authors had access to identifiable Cambridgeshire community case data. These data were processed under Regulation 3 of The Health Service (Control of Patient Information) Regulations 2002, which permits the processing of confidential patient information for communicable disease and other risks to public health; as such, individual patient consent is not required. Other authors only had access to anonymised or summarised data. Ethical approval for the UoC asymptomatic COVID-19 screening programme was granted by the UoC Human Biology Research Ethics Committee (HBREC.2020.35) with informed consent gained from participants.
The UoC has ~23,000 students and 12,600 staff. The university is divided into 31 colleges and 150 departments, faculties and other institutions. Students belong to a college community, as well as being members of the university and an academic faculty/department. Colleges provide residential accommodation for approximately two thirds of students, either on campuses or in off-site housing, and offer social and sports activities, pastoral and academic support for each individual24. All colleges have membership from students across multiple courses. The university is based in the City of Cambridge (which has an estimated population of 123,900; ref. 25), in the county of Cambridgeshire (estimated population 855,796 people in 2019; ref. 26) in the East of England.
Participants and samples
Samples were derived from university symptomatic testing and asymptomatic COVID-19 screening programmes between 5 October 2020 and 6 December 2020, covering the full term. Testing for all symptomatic students and staff was available on weekdays. The asymptomatic screening programme has been described in detail elsewhere11. In brief, screening was offered on a voluntary basis to all students residing in accommodation owned or managed by a college or the Cambridge Theological Federation. In total, 15,561 students were eligible to participate. To optimise testing efficiency, multiple swabs were pooled into the same tube of viral transport medium at the time of sample collection. Testing pools varied in size from 1 to 10 students, with each devised to include one or more student households as far as possible11. In this study, households are defined as individuals who share a kitchen, bathroom and/or lounge facilities. The members of any pool testing positive were re-tested using individual confirmatory PCR tests to confirm the result and identify the positive subject(s) (see Supplementary Methods for further details including infection prevention control measures). Only samples from individuals that were confirmed positive upon the re-testing were used for sequencing.
SARS-CoV-2 strains circulating in the local community were identified from the COG-UK dataset for Cambridgeshire. These data were derived from local community samples from non-hospitalised, symptomatic individuals, who requested a free diagnostic test via national community testing. Other samples were derived from patients treated at three Cambridgeshire hospital trusts: Cambridge University Hospitals NHS Foundation Trust (a teaching hospital providing secondary care services for Cambridge and the surrounding area as well as tertiary referral services for the East of England and surge capacity for COVID-19); Royal Papworth Hospital NHS Foundation Trust (specialist heart and lung hospital, also providing surge capacity for COVID-19); Cambridgeshire and Peterborough NHS Foundation Trust (provider of community, mental health and learning disability services in Cambridgeshire). Hospital samples were obtained from both asymptomatic screening and those exhibiting COVID-19 symptoms. Finally, samples were derived from the asymptomatic HCW programme at Cambridge University Hospitals27.
Positive samples from UoC testing with a PCR cycle threshold value ≤33 were selected and sequenced using the GridION platform (Oxford Nanopore). All Cambridgeshire samples sequenced between 24 September and 21 December 2020 were included to overlap with the university term. Samples from the local Cambridgeshire community and hospital cases (described above) were collected as part of national SARS-CoV-2 testing, and sequenced at one of seventeen COG-UK sequencing sites (further details in Supplementary Methods). The samples were prepared using either the ARTIC28 or veSeq29 protocols, and were sequenced using Illumina or Oxford Nanopore platforms. Genomic data were filtered to exclude sequences with >5% Ns and those of spuriously low file sizes (<29 KB). Genomes were aligned with minimap2 (ref. 30) to the Wuhan Hu-1 reference genome (MN908947.3), collected December 2019. All samples were processed through COVID-CLIMB pipelines31,32. Protocols are available at https://github.com/COG-UK.
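The ambiguous-base filter described above (excluding sequences with >5% Ns) can be sketched as follows. This is an illustrative stand-in rather than the COVID-CLIMB pipeline: the function names, the minimal FASTA parser and the toy records are assumptions, and the file-size criterion (<29 KB) is not reproduced here.

```python
def read_fasta(text: str) -> dict:
    """Parse FASTA-formatted text into {record_name: sequence}."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # Use the first whitespace-delimited token as the record name.
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def n_fraction(seq: str) -> float:
    """Fraction of bases called as N; empty sequences count as all-N."""
    return seq.upper().count("N") / len(seq) if seq else 1.0


def filter_by_n(records: dict, max_n_fraction: float = 0.05) -> list:
    """Keep records whose N fraction does not exceed the threshold."""
    return [
        name for name, seq in records.items()
        if n_fraction(seq) <= max_n_fraction
    ]


# Hypothetical toy records of 20 bases each (not study data).
toy_fasta = """>sample_a
ACGTACGTACGTACGTACGT
>sample_b
ACGTNCGTACGTACGTACGT
>sample_c
ACGTNNGTACGTACGTACGT
"""
print(filter_by_n(read_fasta(toy_fasta)))
```

In this toy input, sample_b sits exactly at 5% Ns and is retained because the stated rule excludes only sequences with more than 5% Ns.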
Maximum likelihood phylogenetic trees were estimated using IQ-TREE (version 2.1.2 COVID-edition)33 and rooted using Wuhan Hu-1 (MN908947.3) as an outgroup. Trees were constructed using the GTR + Γ substitution model34, as determined by ModelFinder35. Branch support statistics were generated using the ultrafast bootstrap method36. TempEst37 was used to explore the temporal signal in the data. Trees were visualised, explored, and labelled with associated metadata using Microreact38 to identify epidemiological links supported by the genomic data. Specified mutations were identified using type_variants (https://github.com/cov-ert/type_variants). Possible transmission clusters were defined by extracting phylogenetic neighbourhoods identified using the CIVET tool (version 2.1.0) on 11 January 2021 (https://github.com/artic-network/civet). In selected clusters, further evaluation was conducted using A2B-COVID15. A2B-COVID evaluates data from individuals in a pairwise manner. Using viral genome sequences from two individuals, alongside data describing the timing of infection, it evaluates whether or not these data are consistent with a hypothesis that SARS-CoV-2 was transmitted directly from one individual to the other; data from each pair are described as being either consistent, borderline, or unlikely to have been observed given this hypothesis. Where indicated, collapsed nodes from trees generated from CIVET were inspected to visualise data in the context of the COG-UK national database (https://www.cogconsortium.uk/). For further evaluation of transmission in the largest cluster identified by CIVET, pairwise SNP differences between sequences were determined using snp-dists (https://github.com/tseemann/snp-dists/releases/tag/v0.7.0).
Global Pango Lineages39 were assigned to each genome using Pangolin (https://github.com/cov-lineages/pangolin/releases/tag/v2.1.6) with analyses performed on COVID-CLIMB32 (further details in Supplementary Methods).
Molecular clock and phylodynamic analyses
BEAST v1.10.440 was used to perform a time-scaled phylogenetic analysis using an exponential growth coalescent tree prior and a GTR + Γ substitution model including all university and community high-quality genomes from the study period. As there was a lack of clear temporal signal in our dataset due to the relatively short time period analysed, the substitution rate was fixed to 8 × 10⁻⁴ substitutions per site per year (s/s/y) under a strict clock model in line with previous SARS-CoV-2 analyses13,41,42,43,44. Two chains of 100 million iterations were run independently to ensure convergence to the correct posterior distribution. Convergence was assessed using Tracer45, and 10% of states were removed to account for burn-in. Finally, a maximum clade credibility (MCC) tree was generated using TreeAnnotator.
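As a rough illustration of why the temporal signal is weak over so short a window, the fixed rate can be converted to an expected number of substitutions per genome. The calculation below assumes a genome length of 29,903 nucleotides (the length of the MN908947.3 reference) and the 62-day university sampling window (5 October to 6 December 2020); both figures are assumptions of this worked example rather than parameters of the BEAST analysis.

```latex
\begin{align*}
\text{rate per genome} &= 8 \times 10^{-4}\ \text{s/s/y} \times 29{,}903\ \text{sites}
  \approx 23.9\ \text{substitutions per genome per year},\\
\text{expected over 62 days} &\approx 23.9 \times \frac{62}{365}
  \approx 4.1\ \text{substitutions per lineage}.
\end{align*}
```

On these assumptions, a lineage would be expected to accumulate only around four substitutions across the whole study period.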
To estimate the effective reproduction number (Re) and infectious period of SARS-CoV-2 over the term, a dominant clade (representing 69.6% of all university genomes) was selected and all community genome sequences that cluster with it incorporated, resulting in a total of 354 genomes. A Bayesian birth-death skyline (BDSKY) model16 was employed using BEAST v2.646. A GTR + Γ substitution model was used along with a strict clock model, placing a lognormal prior with mean 8 × 10⁻⁴ s/s/y (in real space) and standard deviation 0.1 on the clock rate. A lognormal prior with mean 0 and standard deviation 1 was placed on Re and a Beta prior with α = 5 and β = 5 was placed on the sampling proportion. Re was parameterised into 20 epochs, equidistantly spaced between the origin time and the most recent sequence collection date. The sampling proportion was fixed to 0 before the first week of term and estimated for each week thereafter. The rate at which infected patients become non-infectious was assumed to be constant and a lognormal prior with mean 48.7 years⁻¹ (in real space) and standard deviation 0.25 was placed on it, resulting in a prior mean effective infectious period between ~5 and ~15 days. To test the robustness of the posterior estimates, different parameterisations were used for Re and the sampling proportion, and the sampling proportion prior was varied. Further details are provided in the Supplementary Methods. To test the robustness of posterior estimates to the clock rate prior, all analyses were repeated using a lognormal prior with mean 1 × 10⁻³ s/s/y (in real space) and standard deviation 0.1 on the clock rate. Finally, to test the assumption of a strict clock model, analyses were repeated using an uncorrelated lognormally distributed relaxed clock model47. In these analyses the 95% HPD interval of the coefficient of variation of the clock rate did not exclude 0, indicating poor support for a relaxed clock model in this dataset.
Furthermore, estimates of the BDSKY model parameters did not differ significantly from estimates under a strict clock model. Therefore, we only show results under a strict clock model. For all models three chains of 200 million iterations were run independently. Convergence was assessed using the R-package coda48, and 10% of states were removed to account for burn-in. MCC trees were generated using TreeAnnotator.
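The relationship between the prior on the becoming-non-infectious rate and the implied infectious period can be explored with a short sketch. It assumes that "mean in real space" denotes the mean of the lognormal distribution itself and that the standard deviation is given in log space; the function names and the 365.25-day year are choices made for this illustration and are not taken from the BEAST configuration. The epoch helper simply spaces 20 Re intervals evenly between two times supplied by the user.

```python
import math
from typing import List, Tuple

Z_975 = 1.959964  # standard normal 97.5% quantile


def lognormal_log_params(real_mean: float, sigma: float) -> Tuple[float, float]:
    """Log-space (mu, sigma) of a lognormal whose real-space mean is real_mean."""
    mu = math.log(real_mean) - 0.5 * sigma ** 2
    return mu, sigma


def lognormal_interval(real_mean: float, sigma: float) -> Tuple[float, float]:
    """Central 95% interval of the lognormal, in real space."""
    mu, s = lognormal_log_params(real_mean, sigma)
    return math.exp(mu - Z_975 * s), math.exp(mu + Z_975 * s)


def infectious_period_days(rate_per_year: float,
                           days_per_year: float = 365.25) -> float:
    """Mean infectious period implied by a constant removal rate."""
    return days_per_year / rate_per_year


def epoch_boundaries(origin: float, last_sample: float,
                     n_epochs: int = 20) -> List[float]:
    """Equidistant epoch boundaries between origin and last sample time."""
    width = (last_sample - origin) / n_epochs
    return [origin + i * width for i in range(n_epochs + 1)]


# Prior from the text: mean 48.7 per year, standard deviation 0.25.
low_rate, high_rate = lognormal_interval(48.7, 0.25)
longest = infectious_period_days(low_rate)    # slow removal, long period
shortest = infectious_period_days(high_rate)  # fast removal, short period
```

A rate of 48.7 per year corresponds to an infectious period of about 7.5 days; `shortest` and `longest` give the range implied by the central 95% of the prior under the stated assumptions.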
Household attack rates
A2B-COVID15 was used to exclude households for which the sequence and epidemiological data were inconsistent with a single viral introduction to the household. A chain binomial model was then used to estimate the probability that an infected person transmitted the virus to an uninfected person within the same household (further details in Supplementary Methods).
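The general form of a chain binomial estimate can be sketched with a Reed-Frost final-size model. This is an illustrative formulation only: the exact model used in the study is described in the Supplementary Methods and may differ. The sketch assumes a single index case per household, no infection from outside the household, and a grid search for the maximum-likelihood transmission probability; all function names and the example data are hypothetical.

```python
from math import comb, log
from typing import List, Sequence, Tuple


def final_size_probs(p: float, s: int) -> List[float]:
    """P(j of s susceptibles ever infected | one index case), Reed-Frost.

    p is the per-pair probability that an infected household member
    transmits to a given susceptible member; q = 1 - p is the escape
    probability.
    """
    q = 1.0 - p
    # diag[k] = probability that all k susceptibles are infected
    diag = [1.0]
    for k in range(1, s + 1):
        partial = sum(
            comb(k, j) * diag[j] * q ** ((j + 1) * (k - j)) for j in range(k)
        )
        diag.append(1.0 - partial)
    # The s - j escapees each avoid infection from j + 1 infected members.
    probs = [comb(s, j) * diag[j] * q ** ((j + 1) * (s - j)) for j in range(s)]
    probs.append(diag[s])
    return probs


def log_likelihood(p: float, households: Sequence[Tuple[int, int]]) -> float:
    """Log-likelihood of (susceptibles, secondary cases) household records."""
    return sum(
        log(max(final_size_probs(p, s)[j], 1e-300)) for s, j in households
    )


def estimate_p(households: Sequence[Tuple[int, int]],
               grid_size: int = 999) -> float:
    """Grid-search maximum-likelihood estimate of p on the open interval (0, 1)."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return max(grid, key=lambda p: log_likelihood(p, households))


# Hypothetical records: (initial susceptibles, secondary cases observed).
example_households = [(3, 0), (3, 1), (5, 2), (2, 0), (4, 1)]
p_hat = estimate_p(example_households)
```

Each record pairs the number of initially uninfected household members with the number who later tested positive; households excluded by A2B-COVID would simply be omitted from the list.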
University student demographic data were derived from the UoC student electronic record system CamSIS, and household structure and membership data from the UoC asymptomatic screening programme. To identify university-affiliated cases (students and staff) and hospital staff accessing the national SARS-CoV-2 testing service, Second Generation Surveillance System (SGSS) data and contact-tracing data provided by NHS Test and Trace (T&T) were interrogated. Epidemiologically linked common exposures for students, university staff and the local community were identified through T&T data. Common exposures were defined by T&T as locations or events that two or more people testing positive for COVID-19 visited in the same two to seven day period before symptom onset or positive test. Additional contact tracing information was also provided by the UoC COVID helpdesk. These data were compared with observed phylogenetic clusters to identify potential sources of transmission and determine the extent of transmission between the university and community.
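The common-exposure definition can be expressed as a simple grouping rule. The sketch below is one possible reading of that definition, not the T&T implementation: it treats a visit as relevant if it fell two to seven days before the visitor's symptom onset (or positive test date), and flags a location when at least two distinct cases have a relevant visit. The record layout and the function name are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Set, Tuple

Visit = Tuple[str, str, date]  # (case_id, location, visit_date)


def common_exposures(
    visits: Iterable[Visit],
    reference_date: Dict[str, date],
    min_days: int = 2,
    max_days: int = 7,
    min_cases: int = 2,
) -> Dict[str, Set[str]]:
    """Return locations visited by at least min_cases cases in their window.

    reference_date maps each case to its symptom onset date, or to its
    positive test date where onset is unknown.
    """
    cases_at: Dict[str, Set[str]] = defaultdict(set)
    for case_id, location, visit_date in visits:
        ref = reference_date.get(case_id)
        if ref is None:
            continue  # no onset or test date recorded for this case
        days_before = (ref - visit_date).days
        if min_days <= days_before <= max_days:
            cases_at[location].add(case_id)
    return {loc: ids for loc, ids in cases_at.items() if len(ids) >= min_cases}
```

Locations flagged in this way could then be compared against the phylogenetic clusters, as described above.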
Epidemiological data from UoC were initially compiled in Microsoft Azure SQL and Excel 2013 (Microsoft) and analysed in STATA 14.2 (College Station, TX, USA). Further data manipulation, statistical analysis and figure generation were undertaken with RStudio (version 1.3.1093) using R (version 4.0.2). Network diagrams were produced with the R package igraph (v1.2.6).
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
The assembled/consensus genomes generated in this study have been deposited in the GISAID49 database and raw reads are available from the European Nucleotide Archive (ENA)50 under accession PRJEB37886. Pooled sample sequence raw reads and assembled sequences are deposited in the NCBI Sequence Read Archive Database (SRA; https://www.ncbi.nlm.nih.gov/sra) under the BioProject accession number PRJNA779279.
ENA and GenBank accession codes for individual sequences used in this study are available in supplementary materials (Supplementary Data 1 and 2). All genomes, phylogenetic trees and basic metadata are available from the COG-UK consortium website (https://www.cogconsortium.uk/data). Limited public metadata, analysis files, and processed genomic data for this work are available from GitHub at https://github.com/COG-UK/camb-uni-phylo/ (https://doi.org/10.5281/zenodo.5643354)51, which also contains a list of ENA and GenBank study sequence accession numbers for this study. Extended metadata52 are under restricted access for confidentiality reasons and in line with study ethics; requests for access should be directed to the corresponding authors and, specifically for Public Health England data, to the Public Health England Office for Data Release (https://www.gov.uk/government/publications/accessing-public-health-england-data/about-the-phe-odr-and-accessing-data), with an estimated turnaround time of 60 working days. Processed metadata generated for figures in this study are provided in the Source Data file. Source data are provided with this paper.
Custom code used in this analysis is available at https://github.com/COG-UK/camb-uni-phylo/. Please direct further queries to the corresponding authors.
Zhou, F. et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet 395, 1054–1062 (2020).
Yang, J. et al. Prevalence of comorbidities and its effects in patients infected with SARS-CoV-2: a systematic review and meta-analysis. Int J. Infect. Dis. 94, 91–95 (2020).
Vusirikala, A. et al. Seroprevalence of SARS-CoV-2 antibodies in university students: cross-sectional study, December 2020, England. (2021).
Yamey, G. & Walensky, R. P. Covid-19: re-opening universities is high risk. BMJ 370, m3365 (2020).
Children's Task and Finish Group. Risks associated with the reopening of education settings in September. Vol. 2021 (2020).
Sahu, P. Closure of universities due to coronavirus disease 2019 (COVID-19): impact on education and mental health of students and academic staff. Cureus 12, e7541 (2020).
Task and Finish Group on Higher Education/Further Education. Principles for managing SARS-CoV-2 transmission associated with higher education - 3 September 2020. Vol. 2021 (2020).
Meredith, L. W. et al. Rapid implementation of SARS-CoV-2 sequencing to investigate cases of health-care associated COVID-19: a prospective genomic surveillance study. Lancet Infect. Dis. 20, 1263–1271 (2020).
Moreno, G. K. et al. Severe acute respiratory syndrome coronavirus 2 transmission in intercollegiate athletics not fully mitigated with daily antigen testing. Clin Infect Dis. 73, S45–S53 (2021).
COVID-19 Genomics UK (COG-UK) Consortium. An integrated national scale SARS-CoV-2 genomic surveillance network. Lancet Microbe 1, e99–e100 (2020).
Warne, B. et al. Feasibility and efficacy of mass testing for SARS-CoV-2 in a UK university using swab pooling and PCR. Preprint at https://www.researchsquare.com/article/rs-520626/v1 (2021).
Rambaut, A. et al., on behalf of COVID-19 Genomics Consortium UK (CoG-UK). Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations. Vol. 2021 (2020).
Hodcroft, E. B. et al. Emergence and spread of a SARS-CoV-2 variant through Europe in the summer of 2020. Preprint at https://www.medrxiv.org/content/10.1101/2020.10.25.20219063v3 (2020).
Volz, E. et al. Assessing transmissibility of SARS-CoV-2 lineage B.1.1.7 in England. Nature 593, 266–269 (2021).
Illingworth, C. J. R. et al. A2B-COVID: a method for evaluating potential SARS-CoV-2 transmission events. Preprint at https://www.medrxiv.org/content/10.1101/2020.10.26.20219642v2 (2020).
Stadler, T., Kuhnert, D., Bonhoeffer, S. & Drummond, A. J. Birth-death skyline plot reveals temporal changes of epidemic spread in HIV and hepatitis C virus (HCV). Proc. Natl Acad. Sci. USA 110, 228–233 (2013).
Muller, N. et al. Severe acute respiratory syndrome coronavirus 2 outbreak related to a Nightclub, Germany, 2020. Emerg. Infect. Dis. 27, 645–648 (2020).
Choi, H., Cho, W., Kim, M. H. & Hur, J. Y. Public health emergency and crisis management: case study of SARS-CoV-2 outbreak. Int. J. Environ. Res. Public Health 17, 3984 (2020).
Children’s Task and Finish Group. Children’s Task and Finish Group: Paper on higher education settings. Vol. 2021 (2021).
du Plessis, L. et al. Establishment and lineage dynamics of the SARS-CoV-2 epidemic in the UK. Science 371, 708–712 (2021).
Li, R. et al. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2). Science 368, 489–493 (2020).
Smith, L. E. et al. Adherence to the test, trace, and isolate system in the UK: results from 37 nationally representative surveys. BMJ 372, n608 (2021).
Seemann, T. et al. Tracking the COVID-19 pandemic in Australia using genomics. Nat. Commun. 11, 4376 (2020).
University of Cambridge. How the University and Colleges work. https://www.cam.ac.uk/about-the-university/how-the-university-and-colleges-work (2013).
Office for National Statistics. 2011 ONS Census. https://www.cambridge.gov.uk/media/1170/census-2011-cambridge-data.pdf (2011).
Office for National Statistics. Cambridgeshire and Peterborough Population Overview Report. https://cambridgeshireinsight.org.uk/population/ (2019).
Rivett, L. et al. Screening of healthcare workers for SARS-CoV-2 highlights the role of asymptomatic carriage in COVID-19 transmission. Elife 9, e58728 (2020).
Quick, J. nCoV-2019 sequencing protocol v3 (LoCost) V.3. Vol. 2021 (2020).
Bonsall, D. et al. A comprehensive genomics solution for HIV surveillance and clinical monitoring in low-income settings. J. Clin. Microbiol. 58, e00382 (2020).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Nicholls, S. M. et al. CLIMB-COVID: continuous integration supporting decentralised sequencing for SARS-CoV-2 genomic surveillance. Genome Biol. 22, 196 (2021).
Connor, T. R. et al. CLIMB (the Cloud Infrastructure for Microbial Bioinformatics): an online resource for the medical microbiology community. Micro. Genom. 2, e000086 (2016).
Minh, B. Q. et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020).
Tavare, S. Some probabilistic and statistical problems in the analysis of DNA sequences. Lect. math. life sci. 17, 57–86 (1986).
Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 587–589 (2017).
Minh, B. Q., Nguyen, M. A. & von Haeseler, A. Ultrafast approximation for phylogenetic bootstrap. Mol. Biol. Evol. 30, 1188–1195 (2013).
Rambaut, A., Lam, T. T., Max Carvalho, L. & Pybus, O. G. Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). Virus Evol. 2, vew007 (2016).
Argimon, S. et al. Microreact: visualizing and sharing data for genomic epidemiology and phylogeography. Micro. Genom. 2, e000093 (2016).
Rambaut, A. et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat. Microbiol. 5, 1403–1407 (2020).
Suchard, M. A. et al. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 4, vey016 (2018).
Ghafari, M. et al. Purifying selection determines the short-term time dependency of evolutionary rates in SARS-CoV-2 and pH1N1 influenza. Mol. Biol. Evol. msac009. https://doi.org/10.1093/molbev/msac009 (2022).
Vaughan, T. G., Sciré, J., Nadeau, S. A. & Stadler, T. Estimates of outbreak-specific SARS-CoV-2 epidemiological parameters from genomic data. Preprint at https://www.medrxiv.org/content/10.1101/2020.09.12.20193284v1 (2020).
Nadeau, S. A., Vaughan, T. G., Scire, J., Huisman, J. S. & Stadler, T. The origin and early spread of SARS-CoV-2 in Europe. Proc. Natl. Acad. Sci. USA 118, e2012008118 (2021).
Geoghegan, J. L. et al. Genomic epidemiology reveals transmission patterns and dynamics of SARS-CoV-2 in Aotearoa New Zealand. Nat. Commun. 11, 6351 (2020).
Rambaut, A., Drummond, A. J., Xie, D., Baele, G. & Suchard, M. A. Posterior summarization in Bayesian phylogenetics using tracer 1.7. Syst. Biol. 67, 901–904 (2018).
Bouckaert, R. et al. BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLoS Comput. Biol. 15, e1006650 (2019).
Drummond, A. J., Ho, S. Y., Phillips, M. J. & Rambaut, A. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4, e88 (2006).
Plummer, M., Best, N., Cowles, K. & Vines, K. CODA: convergence diagnosis and output analysis for MCMC. R. N. 6, 7–11 (2006).
Shu, Y. & McCauley, J. GISAID: Global initiative on sharing all influenza data - from vision to reality. Euro Surveill. 22, 30494 (2017).
Toribio, A. L. et al. European Nucleotide Archive in 2016. Nucleic Acids Res. 45, D32–D36 (2017).
Aggarwal, D. et al. Genomic epidemiology of SARS-CoV-2 in a UK university identifies dynamics of transmission. Github https://doi.org/10.5281/zenodo.5643354 (2021).
Griffiths, E. J. T. et al. The PHA4GE SARS-CoV-2 contextual data specification for open genomic epidemiology. Preprint at https://www.preprints.org/manuscript/202008.0220/v1 (2020).
Authors A.S.J., W.H. and T.F., and authors L.d.P. and V.H. contributed equally. We thank members of the COVID-19 Genomics Consortium UK and NHS Test and Trace contact tracers for their contributions to generating data used in this study. We thank the Sanger Covid Team for assisting with Samples and Logistics. We are grateful to all students and staff at the University of Cambridge who have contributed to the COVID-19 response during Michaelmas Term. We are grateful to all staff members of the Cambridge COVID-19 Testing Centre for generating qPCR data. D.A. is a Wellcome Clinical PhD Fellow and gratefully supported by the Wellcome Trust (Grant number: 222903/Z/21/Z). B.W. receives funding from the University of Cambridge and the National Institute for Health Research (NIHR) Cambridge Biomedical Research Centre (BRC) at the Cambridge University Hospitals NHS Foundation Trust. I.G. is a Wellcome Senior Fellow and is supported by the Wellcome Trust (Grant number: 207498/Z/17/Z and 206298/B/17/Z). E.M.H. is supported by a UK Research and Innovation (UKRI) Fellowship: MR/S00291X/1. C.J.R.I. acknowledges Medical Research Council (MRC) funding (ref: MC_UU_00002/11). N.J.M. is supported by the MRC (CSF MR/P008801/1) and NHSBT (WPA15-02). A.J.P. gratefully acknowledges the support of the Biotechnology and Biological Sciences Research Council (BBSRC); their research was funded by the BBSRC Institute Strategic Programme Microbes in the Food Chain BB/R012504/1 and its constituent project BBS/E/F/000PR10352, also Quadram Institute Bioscience BBSRC funded Core Capability Grant (project number BB/CCG1860/1). L.d.P. and O.G.P. were supported by the Oxford Martin School. This research was supported by the NIHR Cambridge BRC. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, or the Department of Health and Social Care.
The COVID-19 Genomics UK Consortium is supported by funding from the MRC part of UK Research & Innovation (UKRI), the National Institute of Health Research and Genome Research Limited, operating as the Wellcome Sanger Institute. The Cambridge COVID-19 Testing Centre is funded by the Department of Health and Social Care, UK Government. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. For the purpose of Open Access, the author has applied a CC-BY public copyright licence to any Author Accepted Manuscript version arising from this submission.
R.H. is an employee of AstraZeneca AB. The remaining authors declare no competing interests.
Peer review information
Nature Communications thanks Sébastien Calvignac-Spencer, Joep de Ligt and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Aggarwal, D., Warne, B., Jahun, A.S. et al. Genomic epidemiology of SARS-CoV-2 in a UK university identifies dynamics of transmission. Nat Commun 13, 751 (2022). https://doi.org/10.1038/s41467-021-27942-w