Introduction

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of coronavirus disease 2019 (COVID-19), is persisting endemically in the UK, with long-term implications for public health planning and preparedness. As at 2nd June 2022, 73% of the Scottish population aged 12 and over had received three doses of COVID-19 vaccine1. However, lower population immunity among children, adolescents and younger adults2, the occurrence of infection despite vaccination3, and the emergence of vaccine escape variants4,5 threaten the long-term ability of the national vaccination programme to suppress the spread of infection and prevent disease.

With the exception of the first SARS-CoV-2 epidemic wave in 2020, cases among children, adolescents, and young adults have represented a high fraction of confirmed cases in Scotland. Younger age groups may contribute to long-term seasonal patterns of COVID-19, as annual outbreaks in educational settings are a well-recognised feature of acute respiratory infections in temperate climates6. We note, however, that large and frequent COVID-19 clusters are also associated with settings involving older adults, such as leisure venues, sporting events, and occupational settings7,8,9.

In higher education settings, the risk of respiratory infection outbreaks has been brought to the forefront by COVID-19, with social distancing and remote learning implemented to limit cases10. Universities present a unique epidemiological scenario for study: the movement of students to and from their domicile locations at the start and end of academic terms presents a risk of introducing viruses into student populations and of spillover into local communities11. Theoretical models suggest that SARS-CoV-2 transmits more easily in denser populations, depending on vaccination rates, adherence to social distancing, and the effectiveness of contact tracing and isolation measures12,13. With a potentially increased likelihood of asymptomatic infection14 and lower vaccination coverage1, student populations are at risk of seeding outbreaks among their social groups as well as exposing the local community and their non-term-time contacts. Reports have highlighted the potential for large numbers of cases in higher education settings, particularly in densely populated residential halls15,16.

Extensive SARS-CoV-2 genomic surveillance in the UK, with over 300,000 whole genome sequences generated in Scotland alone by June 2022, provides a means to characterise outbreak risks in a setting-specific manner. In combination with epidemiological data, genomic surveillance has enabled fine-grained characterisation of SARS-CoV-2 outbreaks in other semi-closed settings such as hospital wards17,18 and care homes19. To date, one UK study has taken such an approach to understand SARS-CoV-2 outbreaks in a tertiary education setting15. This study identified persistent transmission among students, though with little evidence of transmission to the local community. However, the social and community contexts of UK universities differ, such that patterns of spread within a collegiate university (a single institution organised into self-governing colleges with associated accommodation) may not be directly applicable to universities with a more centralised organisational structure.

Here we use genomic epidemiology to investigate a large cluster of cases among University of Glasgow (UoG) students in Scotland, UK, arising in autumn 2020 at the start of the 2020/21 academic year20. At this point, reported case numbers were relatively low, although the second epidemic wave in Scotland was already underway21. Our study provides genomically informed insight into epidemiological risks associated with SARS-CoV-2 outbreaks in a university accommodation setting in which students were resident in halls integrated with the local community in the city of Glasgow.

Our study contributes to a more comprehensive understanding of the role that residential accommodation can play in the spread of SARS-CoV-2, and highlights the success of rapidly implemented control measures in mitigating any subsequent impact on the wider university population and the local community. Given the currently lower uptake of third vaccine doses among young adults in Scotland (as at June 2022)1, and the possibility of a novel SARS-CoV-2 variant of concern, public health planning and policy formation need to consider the higher education setting, particularly in preparation for winter seasons. We propose that consideration be given to the varying accommodation, academic, and social contexts across student populations.

Results

Outbreak investigation

As of the 2020/21 academic year, the UoG hosts approximately 33,000 students from more than 140 countries, with 97% of undergraduate students aged 16–29 years, and with 58% of the 6100 new undergraduate students domiciled in Scotland22. The university provides multiple residential halls predominantly housing undergraduate students23.

A cluster of COVID-19 student cases associated with two geographically distinct residential halls was identified by the UoG and the NHS Greater Glasgow and Clyde (NHSGGC) Public Health Protection Unit at the start of the 2020/21 academic year, immediately following Freshers’ week (12th to 18th September) (Fig. 1a). The outbreak peaked and was curtailed within around seven days of being identified, following the implementation of non-pharmaceutical control measures commonly used against SARS-CoV-2 at the time, including infection control, contact tracing, increased availability of testing, and support for students to enable self-isolation.

Figure 1

(a) Numbers of University of Glasgow student cases identified from contact tracing information over time, and (b) for the subset with sequenced samples, according to university residential halls or private accommodation. (c) Frequency distributions of SARS-CoV-2 lineages and (d) unique sequences identified among the student cases.

A total of 1039 student SARS-CoV-2 positive cases were identified, with 616 (59.3%) resident at one of six university residential halls spanning 13 postcode locations. The remaining 423 (40.7%) students were deemed to reside in private accommodation (see “Methods”). The median case age was 18 years (IQR 18–19 years; range 16–51 years), consistent with a majority undergraduate population, and 57.2% (n = 591) were female consistent with a known gender skew among the wider student population24. Of the 616 student cases associated with residential halls, 519 (84.2%) had respiratory specimen dates from 19th September to 26th September 2020 spanning the residential halls cluster. Based on these data, the student cluster included around 9% of the new undergraduate population overall. SARS-CoV-2 whole genome sequences were available for 220 cases, with specimen dates spanning 5th September to 18th November 2020; 73.2% were from students residing in university residential halls with specimen dates from 19th September to 26th September 2020 (Fig. 1b). The median age among sequenced cases was 18 years (range 17–25 years, IQR 18–19 years) and 138 (62.7%) were female.

SARS-CoV-2 genetic diversity

Among the 220 student cases, multiple global lineages (Pango nomenclature)25,26 were detected (Fig. 1c), the most common being B.1 (n = 114, 52.8%), B.1.1.315 (n = 41, 19.0%), B.1.177 (n = 30, 13.9%) and B.1.177.11 (n = 18, 8.3%). Seven further Pango lineages were detected at low numbers (n = 13, 5.9%). A total of thirty-nine distinct SARS-CoV-2 sequences were identified (Fig. 1d).

Putative transmission clusters were identified from 214 high quality SARS-CoV-2 genome sequences (see “Methods”) using the A2B-COVID software package17. We use the term ‘transmission cluster’ to denote a set of potentially linked cases consistent with having arisen from a single introduction into the student population (see “Methods”; Supplementary Fig. S1, supplementary material). Of the 214 high quality sequences, 157 were obtained from students resident in halls with specimen dates spanning 19th to 26th September 2020, providing 30% coverage of the student halls COVID-19 cluster. Four large transmission clusters were identified, comprising 191 sequences; mapping these onto a phylogenetic tree showed that they corresponded to four clades (highlighted in Fig. 2). Among the remaining 23 sequences, three further putative transmission clusters were found, each comprising two or three cases, alongside 16 sequences that did not cluster closely with any transmission cluster.

Figure 2

Phylogenetic tree of high quality genome sequences collected from University of Glasgow students. The reference sequence Wuhan Hu-1 (GenBank accession MN90894) was included as an outgroup. Four notable clades of sequences were identified, corresponding to cases that were putatively linked via direct transmission, as identified by the A2B-COVID software package. Shades of grey indicate three other sets of putatively linked cases. The phylogenetic analysis was conducted using the IQ-Tree2 package38, and the figure was made with iTOL44. The scale bar denotes nucleotide substitutions per site.

While each putative transmission cluster is consistent with a distinct introduction into the student population, we made a more conservative estimate of at least 11 distinct introductions of SARS-CoV-2. To derive this, each transmission cluster and any sequences descended from it were counted as a single introduction, and the sequences intermediate to transmission clusters 2 and 3 were regarded as potentially arising from a single introduction (Fig. 2). Under either estimate, the majority of introductions of SARS-CoV-2 into the university failed to cause substantial onward transmission among students.

Genomic evidence is consistent with introductions from within and outside Scotland

Identifying the most recent common ancestor of the sequences in each of the four main transmission clusters placed them within larger UK-wide phylogenetic clades circulating at that time. These clades combined both student and community sequences and revealed potentially distinct origins for the introductions into the university that formed the four main transmission clusters. Examining, on a weekly basis, the proportion of genome sequences falling into each of the four clades by region, we found differences in the proportional representation of each clade in Scotland compared with other UK regions in the period before the UoG residential halls cluster (Fig. 3).

Figure 3

Potential origins of major SARS-CoV-2 clades identified in the University of Glasgow (UoG) student population. Numbers of UoG student sequences associated with each major clade (left column) are shown alongside data describing the associated UK-wide clades. Figures show the proportion of SARS-CoV-2 sequences collected each week in Scotland (middle column) and in the rest of the UK (right column) that were associated with each student clade, coloured by the regions in which the clade-associated sequences were collected. Total numbers of sequences informing this analysis were as follows: Clade 1 n = 412; Clade 2 n = 255; Clade 3 n = 541; Clade 4 n = 4006.

Recent information from the Higher Education Statistics Agency estimates that around 84% of Scottish university students from within the UK are domiciled within Scotland (Supplementary Table S1, supplementary material). This compares to between 35 and 63% of students in England domiciled within the region of their higher education provider, 62% in Wales, and 93% in Northern Ireland. For UoG students specifically, 81% were domiciled in Scotland, with the percentage domiciled in each of the remaining UK regions ranging from 0.4 to 3% (see Supplementary Table S2, supplementary material). Using a simple model that takes into account the high percentage of UoG students domiciled within Scotland (Supplementary Tables S1 and S2, supplementary material), we estimated a relatively high probability of Scottish relative to English origin for clades 1 and 3 (90:10 and 99:1, respectively), although these clades were not notably prevalent in Scotland in the period before the residential halls cluster (Fig. 3). Evidence was inconclusive for clades 2 and 4 (probability ratio of 60:40 in both cases) (see “Methods”).

Sequences from five residential halls (43 cases), and private accommodation (5 cases), were genetically indistinguishable from sequences identified from outbreaks associated with six other Scottish universities (Supplementary Fig. S4, supplementary material). These data may point towards acquisition of infection in Scottish domicile locations prior to the new academic term, as is the apparent case for at least two of the four clades.

The small number of UoG-outbreak-associated clades detected following the outbreak is consistent with this group of student cases having no long-term impact on the persistence of SARS-CoV-2 in the Scottish community.

Genetic diversity across halls suggests a high degree of student mixing

Sequences from each of the four major transmission clusters were found distributed across multiple residential halls and among students resident in private accommodation (Fig. 4). This picture is consistent with student-to-student transmission within residential halls and/or common sources of exposure. The observation of mixed viral populations within most residential halls is consistent with each hall having had multiple SARS-CoV-2 introductions. Viruses from different transmission clusters were observed in rapid succession within halls, making it difficult to identify the location of the first case in each hall and suggesting extensive mixing between students before the outbreaks. The rapid drop in cases suggests that transmission within these halls was quickly contained.

Figure 4

Dates of sequenced PCR-confirmed cases of SARS-CoV-2 infection identified in University of Glasgow students resident in halls or private accommodation, coloured by transmission clade. Data are shown for residential halls in which a minimum of four cases were observed. The identification of viruses from multiple transmission clusters in many of the residential halls suggests that there were multiple introductions of SARS-CoV-2 into each hall.

Cases infected with SARS-CoV-2 other than the four major clades were rarely associated with residential halls, with the vast majority resident in private accommodation (Fig. 4).

Evidence suggests a short-lived impact on local community case incidence

An analysis of student and community sequences showed considerable intermixing of cases, with multiple subclades of a constructed phylogenetic tree spanning both populations (Fig. 5). Sequences used for this analysis comprised the 112 student sequences from transmission cluster 1, and additionally the 104 sequences in the associated UK-wide phylogenetic clade that originated within NHSGGC. While our data are insufficient to derive accurate rates or directionality of transmission, the data are consistent with ongoing mixing of students with others in the local community, though with no long-term impact in terms of persisting chains of transmission.

Figure 5

Phylogenetic relationship between student (red) and community (blue) sequences associated with transmission clade 1. Shades indicate the week in which sequences were collected, the lightest colour indicating dates up to and including the week of 14th September, and the darkest colour indicating dates from 5th October onwards. Community sequences are those identified in NHS Greater Glasgow and Clyde sharing a common ancestor with the student sequences. Student and community cases were observed in multiple subclades of the tree, consistent with ongoing transmission between students and the local community. Phylogenetic analysis was conducted using the IQ-Tree2 package38, and the figure was made with iTOL44.

Age-stratified proportions of SARS-CoV-2 infections identified among PCR-tested individuals in the City of Glasgow (excluding UoG students) showed a rise in cases among adults aged 17–24 years coinciding with the start of the UoG residential halls cluster (Fig. 6). Furthermore, the age distribution of sequenced community cases identified in NHSGGC falling into UK-wide clades associated with the four major student transmission clusters showed a similar bias towards 17–24 year olds, consistent with mixing of students and the local community occurring predominantly among this age group (Fig. 7). The rise in the proportion of positive tests among 17–24 year olds was also seen on average across the rest of Scotland (Fig. 6), suggesting the burden of cases in this age group, with the potential impact of student populations at the start of term, was not unique to the City of Glasgow.

Figure 6

Age-stratified 7-day rolling proportions of PCR-confirmed cases of SARS-CoV-2 infection (test positivity) among all tested individuals in the City of Glasgow community [excluding University of Glasgow (UoG) students], and in comparison with the rest of Scotland, excluding the City of Glasgow. The vertical lines indicate the start of the UoG student halls cluster around 19th September 2020.

Figure 7

Age distribution of SARS-CoV-2 cases among residents of NHS Greater Glasgow and Clyde with sequences that phylogenetically clustered with each University of Glasgow (UoG) student transmission clade. UoG student cases were excluded. Numbers of sequences informing this analysis were as follows: Clade 1 n = 163, Clade 2 n = 27, Clade 3 n = 362, Clade 4 n = 106.

Discussion

We combined genome sequence data with epidemiological information to investigate a cluster of COVID-19 cases associated with UoG student residential halls. These cases occurred prior to rollout of the UK’s COVID-19 vaccination programme and the use of lateral flow device tests for regular asymptomatic self-testing by the students. We evaluated frequency and origins of SARS-CoV-2 introductions into the student population, the likelihood of transmission among the students, and impact on the local community. The integration of genomic, evolutionary, and epidemiological approaches provides a deep insight into the nature of viral spread within a university accommodation context.

Our study highlights several important features of public health significance. Firstly, the increase in SARS-CoV-2 infections among the student population primarily involved university accommodation. The sudden rise in student cases emerged shortly after social events linked to “Freshers’ week” at the start of the university term, providing a strong indication that the outbreaks spread via induction events and/or other social activity and gatherings27.

Secondly, while there was evidence of at least 11 introductions of SARS-CoV-2 into the university student population, our data were consistent with only four leading to substantial onward transmission among students, each consistent with a separate outbreak. Our data suggest large variation in the numbers of individuals infected by each introduction, potentially attributable to superspreading events whereby a small number of individuals cause a disproportionate number of cases28. Although the sequencing of UoG student cases was incomplete, with around 30% coverage of the residential halls cluster, we can be reasonably confident that all major transmission clusters were detected.

Thirdly, the data were strongly indicative of introductions into the university from within Scotland for at least two of the four clades. Our findings highlight that, although the potential remains for virus importation into Scotland through long-distance student movements, the risk of new variants being introduced to Scotland at the start of the academic year would appear low compared with other UK regions, given the high proportion of students at Scottish universities who are domiciled within Scotland29. The risk of importing SARS-CoV-2 from countries outside the UK was not directly addressed by our study owing to a lack of robust data at a global level.

Fourthly, our study highlights the rapid spread of SARS-CoV-2 within and between university accommodation. Halls of residence provide household-like conditions enabling direct and indirect transmission of SARS-CoV-2 via respiratory (aerosol and droplet) and fomite routes. UoG accommodation is predominantly off-campus, comprising a mixture of self-contained flats, shared kitchen and bathroom facilities, and common room/study areas23. The identified transmission clusters were consistent with student–student transmission linked to household-like conditions and/or shared facilities within residential halls. Social gatherings may also have been a conduit for the apparent transmission between households and likely contributed to the multiple viral populations within halls, mixing individuals returning to the university from various locations, with compliance with social distancing expected to be poor at such events.

Despite the scale and speed at which the outbreak grew, the rapid curtailment of the outbreaks demonstrates that the swift action of the university and local Public Health Protection Unit, together with an apparent high degree of student compliance, prevented any further rise in case numbers and limited any impact on the local community. We note that a decline in proportional sequencing coverage in Scotland in the period subsequent to the UoG outbreaks (from an average daily coverage of ~ 24% in September 2020 to ~ 4% from mid-October 2020) limits the ability to detect associated variants circulating at low prevalence, and precludes a direct comparison of observed clade-specific case incidence rates between Scotland and England. However, we can be reasonably confident that variants associated with any significant outbreaks or long-lasting chains of transmission would be detectable among the community sequences available, given that sequencing was conducted daily over an extended time frame. It is likely that self-isolation by students was the predominant factor leading to the rapid end of the student halls-associated outbreaks, although our study did not formally evaluate control measure effectiveness.

At the time of the UoG outbreaks, most educational institutions in the UK had already undergone emergency closure of in-person teaching in response to the COVID-19 epidemic, resulting in prolonged periods of remote learning. Nevertheless, significant outbreaks arose in the university setting, albeit during a period of relatively high susceptibility and before rollout of the national vaccination programme. In contrast to 2020/21, numbers of university student cases in Scotland were comparatively lower at the start of the 2021/22 academic year30, in the absence of university closures and restrictions on social activities. This period saw a decreasing case incidence nationally alongside the accumulation of immune protection via vaccination and naturally acquired immunity, and the introduction of self-testing by asymptomatic students using lateral flow device tests1,21,31. Opening educational settings during periods of low incidence may be an important component of risk mitigation32, alongside promotion of safer socialisation practices and enhanced social care and support structures33.

The results from our study contrast with those reported for an outbreak among students at the University of Cambridge (UoC)15. UoC is an atypical UK higher education institution in that, in addition to teaching, student accommodation and social life revolve around independent colleges. By contrast, UoG is far more centralised and integrated, although its accommodation is more geographically intertwined with the local community. In Cambridge, the outbreak was dominated by a single large cluster that persisted until the national lockdown in November 2020. That the UoG residential halls were geographically restricted likely enabled greater control and enforcement of isolation measures through the provision of student support. The UoG-associated outbreaks did, however, show greater interaction between students and the broader community than observed at UoC, with community and student cases in the largest clade being substantially mixed. The contrasting findings across studies provide valuable context for understanding SARS-CoV-2 spread within the university setting. Differences in university organisation and housing structure are likely to have a pronounced impact on the potential spread of respiratory viruses, and the physical location of students within a city can affect both the nature and impact of an outbreak.

Our study has some important limitations. Firstly, a comparison of infection risks associated with student social activities, or with the household conditions and/or shared facilities within residential halls, was not possible. Secondly, interpretation of direct student-to-student transmission events is limited by the lack of information on how sequenced cases were linked to individual households within a residential hall, and the low levels of variation in SARS-CoV-2 genomes make it difficult to conclusively define epidemiological links. Thirdly, the analysis of the origins of the university student-associated clades does not consider the potential for introduction of cases into the student population via international travel. Fourthly, it is possible that some student cases were misclassified as community cases. Fifthly, estimates of the presence, frequency and diversity of SARS-CoV-2 may be biased by the targeted sampling of outbreaks, particularly during periods of low sequencing coverage; some targeted sequencing of the UoG student outbreak had occurred to assist with initial investigations in near real-time. Finally, only 30% of identified student cases were sequenced. We have aimed to provide conservative estimates throughout.

These analyses have provided valuable insight into a large outbreak in an off-campus university accommodation setting, and an evidence base to inform future policy recommendations for students returning to universities at the start of terms. The higher education setting presents a risk of contributing to the winter burden of COVID-19. With rapid identification and implementation of non-pharmaceutical interventions, the impact of outbreaks on local communities may be limited. However, while high rates of vaccination have moderated the impact of SARS-CoV-2 on disease severity in the UK population, SARS-CoV-2 will continue to evolve in the human population, and student populations, as with all demographics, should be encouraged to stay up to date with vaccinations (currently three doses for those aged under 75 years, as of mid-2022).

Methods

Epidemiological investigation

A retrospective search was performed in the Case Management System, a database of NHS Scotland Test and Protect contact tracing interviews. A University of Glasgow (UoG) associated student case was defined by the text “Glasgow” together with “university” or “uni” in occupation-related fields, and RT-PCR confirmation of SARS-CoV-2 infection between 1st September 2020 and 30th November 2020. Free text notes were inspected, and cases were excluded if the individual was determined not to be a University of Glasgow student. Residential postcodes were then matched to UoG student halls postcodes obtained from the Scottish Government Advanced Learning and Skills Analysis (ALSA), identifying cases resident in university accommodation. Further cross-checking was performed: student status was corroborated by the individual’s age (with the majority of student cases ≤ 19 years) and/or additional interrogation of free text notes. Four cases had missing age information. We note that our approach is relatively conservative, and the number of student cases identified may differ from that based on other methods (such as reported in Ref.20).
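The case definition can be sketched as a text-and-date filter followed by postcode matching. This is a minimal illustration under stated assumptions: the record layout (`CaseRecord` and its fields) is hypothetical, since the Case Management System schema is not described here, and whole-word matching of “uni” is our own choice. The filter does not replace the manual review of free-text notes; for example, students of other Glasgow universities would also pass the text criteria.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class CaseRecord:
    """Hypothetical contact-tracing record; real Case Management System fields differ."""
    case_id: str
    occupation_text: str
    test_date: date
    postcode: str


STUDY_START, STUDY_END = date(2020, 9, 1), date(2020, 11, 30)
# Whole-word, case-insensitive matches, so "uni" does not match inside "community" or "unit".
GLASGOW = re.compile(r"\bglasgow\b", re.IGNORECASE)
UNIVERSITY = re.compile(r"\b(?:university|uni)\b", re.IGNORECASE)


def meets_case_definition(rec: CaseRecord) -> bool:
    """Apply the text and date criteria; matches still need manual free-text review."""
    in_window = STUDY_START <= rec.test_date <= STUDY_END
    text = rec.occupation_text
    return in_window and bool(GLASGOW.search(text)) and bool(UNIVERSITY.search(text))


def normalise_postcode(postcode: str) -> str:
    """Remove spaces and upper-case, so 'g99 9zz' and 'G999ZZ' compare equal."""
    return postcode.replace(" ", "").upper()


def residence_type(rec: CaseRecord, halls_postcodes: set[str]) -> str:
    """Return 'halls' if the residential postcode is a halls postcode, else 'private'."""
    halls = {normalise_postcode(p) for p in halls_postcodes}
    return "halls" if normalise_postcode(rec.postcode) in halls else "private"


# Hypothetical example records and postcodes (not real study data or halls locations).
HALLS = {"G99 9ZZ"}
r1 = CaseRecord("c1", "Student, University of Glasgow", date(2020, 9, 21), "g99 9zz")
r2 = CaseRecord("c2", "Glasgow community nurse", date(2020, 9, 22), "G98 8YY")
print(meets_case_definition(r1), residence_type(r1, HALLS))
print(meets_case_definition(r2), residence_type(r2, HALLS))
```

Postcodes are normalised before comparison because contact-tracing entries may differ in spacing and case from the reference list.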

Student SARS-CoV-2 whole genome sequences

In Scotland, SARS-CoV-2 genome sequencing is coordinated by the NHS Sequencing Service. Respiratory specimens collected through routine RT-PCR NHS diagnostic testing and UK Government community testing were submitted to The COVID-19 Genomics UK Consortium (COG-UK) partnership laboratories at the MRC-University of Glasgow Centre for Virus Research and the Sanger Institute for sequencing. The genomic sequence data were uploaded to CLIMB (Cloud Infrastructure for Microbial Bioinformatics)-COVID34 and amalgamated with all Scottish sequences. The UoG student case list was linked, using a unique identifier, to Scottish genomic sequence data derived from respiratory specimens collected between 1st September 2020 and 30th November 2020. Of the 220 identified sequences, 214 were considered high quality (> 80% nucleotide coverage across the viral genome) and retained for further analysis. Four sequenced cases had specimen dates preceding 19th September.

Inference of transmission clusters among UoG students

Putative transmission clusters were inferred using the A2B-COVID software package17. Given symptom onset dates and SARS-CoV-2 genome sequences from multiple individuals, the package carries out a series of hypothesis tests, calculating whether the data from each pair of individuals A and B are statistically consistent with a model in which A directly infected B; data are assessed as consistent with transmission, borderline, or unlikely to have arisen from a direct transmission event. Our model uses distributions for the infectivity profile and time to symptom onset described for SARS-CoV-2 in other publications35,36,37. In the absence of symptom onset dates, respiratory specimen collection dates were used as proxies. A sub-setting procedure was then used to generate putative transmission clusters, placing two individuals in the same cluster if transmission between them in either direction was not classed as unlikely.
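One way to read the sub-setting step is as single-linkage clustering on the pairwise verdicts. The sketch below assumes that reading (links are transitive), that verdicts arrive in a hypothetical dictionary keyed by ordered pairs `(A, B)` meaning “A infected B”, and that untested pairs count as unlikely. It is an illustration of the grouping logic only, not the A2B-COVID code. A union-find structure merges any two individuals whose link is not unlikely in at least one direction, so the resulting clusters are the connected components of that graph.

```python
from itertools import combinations


def transmission_clusters(individuals, verdicts):
    """Group individuals linked (in either direction) by a verdict other than 'unlikely'.

    verdicts: hypothetical mapping {(A, B): 'consistent' | 'borderline' | 'unlikely'},
    read as 'A directly infected B'. Missing pairs are treated as 'unlikely'.
    """
    parent = {i: i for i in individuals}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in combinations(individuals, 2):
        forward = verdicts.get((a, b), "unlikely")
        backward = verdicts.get((b, a), "unlikely")
        if forward != "unlikely" or backward != "unlikely":
            union(a, b)

    groups = {}
    for i in individuals:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


# Toy verdicts for five hypothetical individuals.
individuals = ["A", "B", "C", "D", "E"]
verdicts = {
    ("A", "B"): "consistent",
    ("C", "B"): "borderline",
    ("A", "C"): "unlikely",
    ("C", "A"): "unlikely",
    ("D", "E"): "unlikely",
}
clusters = transmission_clusters(individuals, verdicts)
print(clusters)
```

Because linkage is transitive under this assumption, A and C share a cluster through B even though their own pairwise verdict is unlikely; D and E remain singletons.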

Phylogenetic inference of UoG student sequences was then conducted using the IQ-Tree2 package38 in order to characterise the transmission clusters in terms of phylogenetic clades. Inferences used ModelFinder to identify the most appropriate model in each case39. The trees of Figs. 2 and 5 were thus inferred using a TIM2 model with empirical base frequencies and allowing for a proportion of invariable sites.
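To make the model notation concrete, the sketch below only builds IQ-TREE 2 command lines and does not run them. It assumes the executable is installed as `iqtree2`; the alignment and prefix names are hypothetical. The flags used (`-s` alignment, `-m` model, `-T` threads, `--prefix` output prefix) are standard IQ-TREE 2 options. `-m MFP` asks ModelFinder to select the best-fit model before tree search, while `TIM2+F+I` spells out the model reported above: TIM2 with empirical base frequencies (`+F`) and a proportion of invariable sites (`+I`).

```python
import shlex


def iqtree2_command(alignment: str, prefix: str, model: str = "MFP", threads: str = "AUTO") -> list[str]:
    """Build (but do not execute) an IQ-TREE 2 command line.

    model="MFP" runs ModelFinder model selection followed by tree inference;
    a fixed model string such as "TIM2+F+I" can be passed instead.
    """
    return ["iqtree2", "-s", alignment, "-m", model, "-T", threads, "--prefix", prefix]


# Hypothetical file names for illustration only.
select_model = iqtree2_command("uog_students.fasta", "uog_modelfinder")
fixed_model = iqtree2_command("uog_students.fasta", "uog_tim2", model="TIM2+F+I")
print(shlex.join(select_model))
print(shlex.join(fixed_model))
```

If IQ-TREE 2 is available, either list could be passed to `subprocess.run(cmd, check=True)`; keeping command construction separate makes the chosen model explicit and easy to record.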

Inference of transmission clade origins

The sampling locations of sequences in each clade were used to determine the likely origin of each transmission cluster. Viral genome sequence data were then used to estimate the probability that each of the four major clades we identified arose from an introduction into the student population either from England or Scotland. To derive an estimate of the probability that the introduction of a clade C came from England, we first estimated the fraction of genome sequences of clade C in England and Scotland in the two weeks prior to the observation of the first student case in that clade:

$${Q}_{S,C}=\frac{{g}_{S,C}}{{G}_{S}},$$

and

$${Q}_{E,C}=\frac{{g}_{E,C}}{{G}_{E}},$$

where $g_{S,C}$ and $g_{E,C}$ were the numbers of virus genome sequences in clade C collected in Scotland and England, respectively, during the two weeks prior to the first observation of the clade in students at the UoG, while $G_S$ and $G_E$ were the total numbers of virus genome sequences collected in Scotland and England during the same period.

Using these values, we estimated the prevalence of each clade in Scotland and England to inform the probabilities $P_{S,C}$ and $P_{E,C}$ that a given individual in Scotland or England, respectively, was infected with SARS-CoV-2 from clade C:

$${P}_{S,C}=\frac{{Q}_{S,C}\,{I}_{S}}{{N}_{S}},$$

and

$${P}_{E,C}=\frac{{Q}_{E,C}\,{I}_{E}}{{N}_{E}},$$

where $I_S$ and $I_E$ were the mean total numbers of positive tests reported per week in Scotland and England during the 2 weeks in question, as reported in UK Government statistics40, while $N_S$ and $N_E$ are ONS mid-2020 population estimates for Scotland and England41. Given these values, we estimated the probability that a student arriving in Glasgow infected with SARS-CoV-2 of clade C was domiciled in either Scotland ($X_{S,C}$) or England ($X_{E,C}$). For $X_{S,C}$ this is given by

$${X}_{S,C}=\frac{{P}_{S,C}{P}_{S,U}}{{P}_{S,C}{P}_{S,U}+{P}_{E,C}{P}_{E,U}},$$

where $P_{S,U}$ and $P_{E,U}$ are the probabilities that a given student at UoG was domiciled in Scotland or England, respectively. The equivalent probability that the student was domiciled in England is given by $X_{E,C} = 1 - X_{S,C}$.
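As a worked illustration, the function below chains the equations above for a single clade C. All inputs in the example are illustrative placeholders, not the study’s data, and the function name is our own. As in the text, introductions from outside England and Scotland are ignored, so the English share is simply $1 - X_{S,C}$.

```python
def scottish_origin_probability(g_S, G_S, g_E, G_E, I_S, I_E, N_S, N_E, P_SU, P_EU):
    """Return X_{S,C}: probability that a clade-C introduction came from a Scottish-domiciled student."""
    # Q: fraction of sequences in clade C during the two weeks before the first student case.
    Q_SC = g_S / G_S
    Q_EC = g_E / G_E
    # P_{., C}: per-capita probability of infection with clade C (weekly positives / population).
    P_SC = Q_SC * I_S / N_S
    P_EC = Q_EC * I_E / N_E
    # Weight by the fraction of UoG students domiciled in each nation, then normalise.
    scot = P_SC * P_SU
    eng = P_EC * P_EU
    return scot / (scot + eng)


# Illustrative placeholder inputs only (not the study's data).
x_s = scottish_origin_probability(
    g_S=10, G_S=1_000, g_E=20, G_E=4_000,
    I_S=5_000, I_E=50_000, N_S=5_000_000, N_E=50_000_000,
    P_SU=0.8, P_EU=0.1,
)
x_e = 1 - x_s
print(f"Scotland:England = {100 * x_s:.0f}:{100 * x_e:.0f}")
```

With these placeholder values the per-capita clade prevalence is twice as high in Scotland ($10^{-5}$ versus $5 \times 10^{-6}$) and eight times as many students come from Scotland, so the odds are $2 \times 8 = 16$ and $X_{S,C} = 16/17 \approx 0.94$, printed as roughly 94:6.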

In our calculation we neglected the possibility of introductions from outside of either England or Scotland due to the lack of data from other locations. Our measure is approximate, being prone to sampling bias and limited data, but was used to provide a broad indication of the geographical origins of clades.

Phylogenetic inference for NHSGGC sequences

The COG-UK phylogeny for SARS-CoV-2 was produced using CLIMB-COVID data available on 31st December 2020. The grapevine pipeline42 was then used to extract the smallest phylogenetic clade containing the sequences in each putative transmission cluster, placing the SARS-CoV-2 genome sequences from UoG students in the broader context of UK-wide sequences. The clades were pruned to include only sequences from UoG students, from NHSGGC, and from students at other Scottish universities identified through contact tracing. For visualisation of these clades, identical sequences were collapsed and the figure was produced in The Environment for Tree Exploration43.
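A generic sketch of the clade-extraction step follows: find the most recent common ancestor (MRCA) of a cluster’s sequences, collect every tip descending from it, then prune to the sequences of interest. It uses a toy tree stored as a hypothetical child-to-parent dictionary with made-up region labels; it illustrates the idea only and is not the grapevine pipeline’s implementation, which operates on the full COG-UK phylogeny.

```python
def path_to_root(node, parent):
    """List of nodes from `node` up to the root (inclusive)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def mrca(tips, parent):
    """Most recent common ancestor of a non-empty collection of tips."""
    tips = list(tips)
    if not tips:
        raise ValueError("need at least one tip")
    common = set(path_to_root(tips[0], parent))
    for tip in tips[1:]:
        common &= set(path_to_root(tip, parent))
    # The first shared node on the way up from any tip is the MRCA.
    for node in path_to_root(tips[0], parent):
        if node in common:
            return node


def clade_tips(node, parent):
    """All tips (nodes without children) descending from `node`, sorted."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    stack, tips = [node], []
    while stack:
        current = stack.pop()
        kids = children.get(current)
        if kids:
            stack.extend(kids)
        else:
            tips.append(current)
    return sorted(tips)


# Toy tree: s* = student, c* = NHSGGC community, x1 = elsewhere (all hypothetical).
parent = {"n1": "root", "n2": "root", "s1": "n1", "c1": "n1",
          "n3": "n2", "s3": "n2", "x1": "n2", "s2": "n3", "c2": "n3"}
region = {"s1": "UoG", "s2": "UoG", "s3": "UoG", "c1": "NHSGGC", "c2": "NHSGGC", "x1": "England"}

cluster = ["s2", "s3"]
clade = clade_tips(mrca(cluster, parent), parent)
pruned = [tip for tip in clade if region[tip] in {"UoG", "NHSGGC"}]
print(clade, pruned)
```

Taking the smallest clade that contains the cluster pulls in related non-student sequences (here `c2` and `x1`); pruning then keeps only those relevant to the local comparison.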

Local epidemiological context in the City of Glasgow

All individuals with respiratory specimens collected for diagnostic testing of SARS-CoV-2 infection by RT-PCR in Scotland between 1st September 2020 and 30th November 2020 were identified from the national Electronic Communication of Surveillance in Scotland (ECOSS) database. Age-stratified 7-day rolling proportions of SARS-CoV-2 positive cases, among all RT-PCR-tested individuals, were estimated for the City of Glasgow, excluding cases with residential postcodes associated with UoG residential halls, as a conservative approach to removing the influence of UoG students from the analysis. A comparison was made with the rest of Scotland, excluding individuals resident in the City of Glasgow. The 17–24 years group represents the age range of the UoG residential halls-associated cases.
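The age-stratified rolling proportion can be sketched as pooled test positivity over a trailing 7-day window. We assume the proportion is total positives divided by total individuals tested across the window (rather than an average of daily proportions), that halls-postcode exclusions have already been applied, and that the input layout is a hypothetical dictionary of daily counts; days with no records count as zero tests.

```python
from collections import defaultdict
from datetime import date, timedelta


def rolling_positivity(daily, window=7):
    """Pooled positivity over a trailing window, per age group.

    daily: hypothetical mapping {(age_group, day): (n_tested, n_positive)}.
    Returns {(age_group, day): positives / tested summed over the `window` days ending on `day`},
    starting from the first day with a full window of data.
    """
    by_group = defaultdict(dict)
    for (group, day), counts in daily.items():
        by_group[group][day] = counts

    out = {}
    for group, series in by_group.items():
        first, last = min(series), max(series)
        day = first + timedelta(days=window - 1)
        while day <= last:
            days = [day - timedelta(days=k) for k in range(window)]
            tested = sum(series.get(d, (0, 0))[0] for d in days)
            positive = sum(series.get(d, (0, 0))[1] for d in days)
            if tested > 0:
                out[(group, day)] = positive / tested
            day += timedelta(days=1)
    return out


# Toy counts: 100 tests per day, with positives rising from 1 to 7, then a jump to 14.
example = {("17-24", date(2020, 9, 13) + timedelta(days=i)): (100, i + 1) for i in range(7)}
example[("17-24", date(2020, 9, 20))] = (100, 14)
rates = rolling_positivity(example)
for key in sorted(rates):
    print(key, round(rates[key], 4))
```

Pooling counts weights each day by its testing volume, so a day with few tests cannot dominate the window as it would if daily proportions were averaged.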

Ethics declaration

This study was undertaken as part of public health surveillance activity within the COVID-19 programme of Public Health Scotland, in line with the necessary associated regulations and guidelines. The retention and processing of information on individuals is conducted by Public Health Scotland as part of COVID-19 surveillance in Scotland in the context of emergency data processing (https://www.informationgovernance.scot.nhs.uk/covid-19-privacy-statement/), including the Civil Contingencies Act 2004, the NHS (Scotland) Act 1978 and the Public Health (Scotland) Act 2008, and under Articles 6(1)(e), 9(2)(h), 9(2)(i), 9(2)(j) of the General Data Protection Regulation. Surveillance data were shared with NHS Scotland according to the Intra NHS Scotland Data Sharing Accord (https://www.informationgovernance.scot.nhs.uk/wp-content/uploads/2020/06/2020-06-17-Intra-NHS-Scotland-Sharing-Accord-v2.0.pdf). Ethics approval was not required for this work, which was based on pre-existing routine surveillance data for the Scottish population.