## Introduction

Human mobility and contact are significant drivers for the transmission of communicable diseases, such as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) that resulted in the COVID-19 pandemic1. While passively collected mobile phone and app-derived GPS trajectory data provide an indication of populations’ mobility and social mixing patterns2, only broad regional generalisations can be drawn. Transmission of SARS-CoV-2 occurs through close proximity between infectious and susceptible individuals due to either direct contact or respiratory aerosols in the shared space3. Therefore, insights into behaviours at a fine scale, such as within indoor environments, are also required to deepen our understanding of behaviours associated with the transmission of infections, and improve our ability to identify and prevent transmission events. This is particularly relevant for healthcare settings, where infection outbreaks present a significant risk to vulnerable patients through increased morbidity and mortality.

The concern in relation to infection transmission within hospital environments extends more widely than COVID-19, and includes other healthcare-associated infections (HAIs). The impact of HAIs on healthcare systems is considerable, resulting in staff illness, complications in patient outcomes and increasing healthcare costs. In England between 2016 and 2017, HAIs were estimated to have caused > 28,000 deaths, contributed to 21% of hospital bed days, resulted in > 79,000 days of absence in frontline healthcare workers (HCWs) and cost the NHS an estimated £2.7 billion4. The surveillance, prevention and control of HAIs is a challenge, as granular data are often limited and the transmission pathways are highly variable, depending on the epidemiology of the pathogen, be it bacterium, virus or fungus5,6.

Nosocomial infections in patients are well defined, as are frameworks for their prevention and control7. To manage HAIs in patients, practitioners responsible for infection prevention and control (IPC) frequently use passive data sources that are routinely collected, such as medical records. These data sources provide information on the patient’s location within the hospital and their contacts with staff, which can be used to support surveillance, mapping patient trajectories and contact tracing8,9,10. Historically these data sources have been handled manually, using time intensive frameworks that prevent their use in real-time. Hospitals that have moved to digital systems have seen an increase in the effectiveness and efficiency of patient-focused IPC, through improved availability of data resources and reduced burdens of manual data collection and processing11,12. However, while these data streams are well established for patient-focused activities, those for the management of staff infections are relatively underdeveloped. This is surprising given that, like patients, HCWs are at risk of both acquiring and facilitating the transmission of HAIs13.

The rapid spread of SARS-CoV-2 has emphasised the need to protect front-line HCWs. Early in the pandemic the prevalence of COVID-19 infection for HCWs was high, with one London hospital reporting infection in 44% of HCWs14 and a global estimation of 11% of HCWs infected with the virus15. Moreover, the risk of infection for HCWs varied between roles and spatially, with a higher risk of infection for those working in non-emergency wards and for nurses15. Nosocomial outbreaks of SARS-CoV-2 result from a small number of highly infectious individuals, and transmission chains may include HCWs among the likely super-spreaders16. Behavioural processes, such as contact and mobility patterns, generate heterogeneity in the transmission of communicable diseases1,17 and, similar to the management of HAIs in patients, passively collected data on the within-hospital behaviours of HCWs can contribute to a more informed and rapid response to outbreaks.

HCW behaviours have been investigated using surveys18, observations19,20 and tracking technologies21,22,23,24, but these data collection methods are often prohibitively time intensive, expensive, or only provide a snapshot view that is not hospital wide. Electronic medical records (EMRs) have also been previously used to investigate HCW space use and patient contacts25,26,27, but they are either optimised for reconstructing patient trajectories or suffer from high spatial uncertainty. Additional databases, such as door access logs could complement EMRs by supplementing spatiotemporal information on HCW mobility. These data sources are analogous in nature to the passively-collected spatial data from mobile phone records, which were used during the COVID-19 pandemic to demonstrate the effectiveness of movement restrictions in reducing contact rates, and subsequently lowering levels of community transmission28. Using the routinely collected hospital data as an indicator for HCW behaviour provides opportunities to enhance evidence-based IPC in a similar way; supporting contact tracing efforts, validating transmission pathways and helping to monitor the effectiveness of interventions in the hospital.

As with other communicable diseases, IPC interventions to prevent nosocomial outbreaks of COVID-19 include hand washing, the use of personal protective equipment (PPE), limiting the traffic of people in the hospital, and cohorting staff and patients29. The routinely collected data cannot identify or monitor all HCW behaviours that are epidemiologically relevant, but can indicate their level of space use within the hospital, their frequency of movement, the number of patients they contact and the frequency of patient contact. As behavioural markers these metrics provide quantitative measures for IPC interventions aimed specifically at reducing the spatial connectivity of spaces (e.g. by restricting staff access/flow to areas) and social connectivity of individuals in the hospital (e.g., through patient and staff cohorting). The data can therefore be used to assess the extent to which interventions targeted towards HCW mobility and patient contacts have been successful in achieving their aim, or in determining opportunities for improvement.

This paper outlines a framework for the use of routinely-collected hospital data in the measurement of HCW behaviour at an aggregate level. We (1) describe the integration of diverse digital data sources for the quantification of HCW mobility and patient contact within the hospital setting, and (2) demonstrate the use of these data sources in supporting IPC activities through a series of analyses. Specifically, we use data from a London hospital to investigate whether (i) HCW mobility, (ii) HCW patient contacts, (iii) spatial connectivity (flow between floors) and (iv) indirect contacts between patients (through shared HCW contacts) were reactive to the early stages of the COVID-19 pandemic. We find that fluctuations in HCW mobility and patient contacts were most prominent on floors handling the majority of COVID-19 patients, while the flow of HCWs between COVID-19 and non-COVID-19 wards continued throughout the pandemic, and daily rates of indirect contact between patients provided evidence for reactive staff cohorting.

## Methodology

### Study site and context

University College London Hospital NHS Trust (UCLH) is a large acute and tertiary referral academic hospital located in central London. The main UCLH building comprises a central Tower that has 19 floors (floors −2 to 16) and is linked to two other buildings: the Podium and the Elizabeth Garrett Anderson (EGA) Wing. In this analysis, we only considered data for the Tower building at UCLH. Here we describe each floor within the Tower by the ward/department that predominantly occupies it: the basement (floor −2), imaging (floor −1), emergency department (ED on floor 0), acute medicine unit (AMU on floor 1), day surgery (floor 2), critical care (floor 3), plant (floor 4), nuclear medicine (floor 5), short stay surgery (floor 6), hyper-acute stroke unit (HASU on floor 7), respiratory & infectious diseases (floor 8), general surgery (floor 9), care of the elderly (CoE on floor 10), paediatrics (floor 11), adolescents (floor 12), oncology (floor 13), head and neck (floor 14), private wards (floor 15) and haematology (floor 16).

During the pandemic the UCLH Tower became a key site for COVID-19 care in London, and the peaks in the number of COVID-19 patients in the Tower (Fig. 1c) closely followed those reported for all London hospitals (as downloaded from gov.uk; r = 0.97). To investigate changes in the daily number of events in different stages of the pandemic, we manually identified distinct time periods based on the number of COVID-19 patients in the Tower. The first stage of the pandemic was March 1st–June 30th 2020, when the ‘first wave’ of COVID-19 hospital admissions was experienced, during which the WHO declared a pandemic (March 11th 2020) and the first national lockdown in England was announced (March 23rd 2020). The second stage was between July 1st and August 31st 2020, which represented the ‘summer lull’, where the number of COVID-19 patients in London hospitals remained at a low level and community interventions were eased. The third stage was between November 1st 2020 and March 31st 2021, when the ‘second wave’ of COVID-19 hospital admissions occurred, the second national lockdown was announced (November 5th 2020) and the mass-vaccination programme began (December 8th 2020).

We also identified a pre-pandemic or ‘baseline’ period between January 1st and February 28th 2020. This time period was prior to the substantial rise in COVID-19 admissions in the hospital, and was considered to be when ‘normal’ working patterns (HCW movement and patient contacts) would be observed. We use this baseline period to compare the behaviour of HCWs during the pandemic to that pre-pandemic. Using data from corresponding weeks of the year may provide a more appropriate baseline when conducting such an analysis outside of the pandemic; as seasonal patterns in hospital admissions that likely influence HCW behaviour are known to occur30. However, the pandemic was associated with community interventions that disrupted seasonal patterns in hospital admissions concerning non-COVID-19 diseases31,32. Therefore, our baseline period provides an appropriate representation of ‘normal’ working patterns given the non-COVID-19 patient population was not comparable to that pre-pandemic, and so typical seasonal variations in staff behaviour were not expected.
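The stage definitions above can be expressed as a simple date lookup. The following is a minimal sketch in Python (the analysis itself was conducted in R); the stage labels and the function name `assign_stage` are illustrative assumptions, and the date ranges are taken literally from the text, so any date outside them (for example, September 2020) is left unassigned.

```python
from datetime import date

# Inclusive stage boundaries as stated in the text; labels are illustrative.
STAGES = [
    ("baseline", date(2020, 1, 1), date(2020, 2, 28)),
    ("first_wave", date(2020, 3, 1), date(2020, 6, 30)),
    ("summer_lull", date(2020, 7, 1), date(2020, 8, 31)),
    ("second_wave", date(2020, 11, 1), date(2021, 3, 31)),
]


def assign_stage(day):
    """Return the stage label for a date, or None if no stage covers it."""
    for label, start, end in STAGES:
        if start <= day <= end:
            return label
    return None
```

Each daily metric can then be tagged with its stage before averaging or modelling.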

### Data sources

The data sources used in this study were selected on the basis of providing spatial and temporal indicators of staff movement and patient contacts within the hospital. For clarity, Table 1 provides definitions of terms relating to behavioural processes that are investigated in this study. The study protocol was approved by the NHS Health Research Authority (South Central – Berkshire REC ref: 20/SC/0147; protocol number: 130861) and ethical oversight was provided by the UCLH research ethics committee (IRAS project ID: ref. 281836; UCL GOS ICH R&D approval number: 20PL06). Data were extracted and deidentified by data managers at UCLH, and informed consent was waived.

Patient contact events in EMRs were extracted from Epic, a privately owned hospital system used by UCLH for managing medical records. While Epic contains a large volume of data on patient diagnosis and treatment, we only use specific fields that provide information on the spatial and temporal attributes of within-hospital contacts between staff and patients. Data fields included the datetime of events, a description of the location (bed ID and floor), an indicator of the event type, anonymous identifiers for the patient, pseudonymous identifiers for the HCW and the COVID-19 status of the patient at the time of the event (0/1).

Door events were extracted from the database for security door access logs, known in this context as CCure. The security doors were located at entry/exit points of wards and access cards were also used to operate staff lifts. Data fields included the datetime of events, a description of the location (door ID and floor), direction of passage (in/out), status (accepted/rejected) and a pseudonymous staff identifier.

### Data cleaning

All data processing was conducted in R33. Data for events outside the UCLH Tower were discarded. Data for the month of October 2020, a weekend in July and another in November were also discarded, as records either could not be extracted or had an unusually low number of events (indicating an issue with extraction). Contact events in EMRs that did not require face-to-face contact (e.g., Telephone or Letter) were excluded. Door events with a rejected status were removed, along with duplicate events in the same direction that were within 60 seconds of each other. Two types of lift (or elevator) events were present in the door access logs: Lift Calls, where a card is used to request a lift, and Lift Commands, where a card is used before selecting which floor to go to. All Lift Call events were removed as they inflate the number of movement events for individuals using lifts (because some individuals may make multiple and repeated lift calls while waiting for a lift).
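The door-event cleaning rules can be sketched as a single pass over time-ordered records. This Python sketch is illustrative only (the analysis itself was conducted in R): the field names, the `"accepted"` and `"lift_call"` labels and the function name are hypothetical. It also assumes that a duplicate means the same staff member at the same door in the same direction, occurring no more than 60 seconds after the last retained event; the text does not specify these details.

```python
from datetime import datetime, timedelta


def clean_door_events(events, window_seconds=60):
    """Filter door events following the cleaning rules described in the text.

    events: iterable of dicts with the (hypothetical) keys
            staff_id, door_id, direction, status, event_type, time.
    """
    window = timedelta(seconds=window_seconds)
    last_kept = {}  # (staff_id, door_id, direction) -> time of last kept event
    kept = []
    for ev in sorted(events, key=lambda e: e["time"]):
        if ev["status"] != "accepted":      # drop rejected card swipes
            continue
        if ev["event_type"] == "lift_call":  # drop all Lift Call events
            continue
        key = (ev["staff_id"], ev["door_id"], ev["direction"])
        previous = last_kept.get(key)
        if previous is not None and ev["time"] - previous <= window:
            continue                         # assumed duplicate: same key within the window
        last_kept[key] = ev["time"]
        kept.append(ev)
    return kept
```

Comparing each event with the last retained event, rather than the last observed one, means a long run of rapid swipes is reduced to one event per window; the alternative choice would collapse the whole run into a single event.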

### Aggregate measures

Staffing levels, $$\left|H\right|$$, were determined by summing the number of distinct HCWs, h, in the set of HCW IDs identifiable in the routinely collected data, H. Staffing levels were calculated for each day, t, stage of the pandemic, Stage, and for each floor, f, and the entire building. For each stage of the pandemic the mean daily staffing levels, $$\bar{H}$$, were calculated for the entire building and each floor. To account for the weekly pattern in staffing levels (see Fig. 1 of the results), we calculated the means separately for weekdays and weekends.

Door events, m, were used as an indicator of HCW behaviour in terms of their mobility, where M is the full set of door events. The number of door events, $$\left|M\right|$$, was used as an absolute measure of HCW mobility and was calculated for the entire building and each floor on each day. There was a strong correlation between daily HCW mobility and daily staffing levels (r = 0.94) and therefore, to control for changes in staffing levels, the rate of mobility, Mr, was calculated as a function of staffing levels, where:

$${{Mr}}_{t,f}=\frac{|{M}_{t,f}|}{|{H}_{t,f}|}$$
(1)
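Equation (1) can be illustrated with a short Python sketch (the analysis itself was conducted in R). The tuple layout and the function name `mobility_rate` are assumptions made for illustration. Staffing is taken as the distinct HCW IDs seen in either data source on a given day and floor, following the definition of H above.

```python
from collections import defaultdict


def mobility_rate(door_events, contact_events):
    """Return {(day, floor): door events per distinct HCW}, as in Eq. (1).

    door_events:    iterable of (day, floor, hcw_id), one tuple per door event.
    contact_events: iterable of (day, floor, hcw_id), one tuple per patient contact.
    """
    door_counts = defaultdict(int)  # |M_{t,f}|
    staff = defaultdict(set)        # H_{t,f}
    for day, floor, hcw_id in door_events:
        door_counts[(day, floor)] += 1
        staff[(day, floor)].add(hcw_id)
    for day, floor, hcw_id in contact_events:
        staff[(day, floor)].add(hcw_id)
    # Every key in door_counts has at least one HCW, so the division is safe.
    return {key: n / len(staff[key]) for key, n in door_counts.items()}
```

For example, four door events on a floor where three distinct HCWs were identifiable give a rate of 4/3 events per HCW for that day.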

To compare mobility levels between stages of the pandemic, the mean daily mobility, $$\bar{M}$$, and the mean daily rate of mobility, $$\overline{{Mr}}$$, were calculated for the entire building and for each floor during the different stages of the pandemic. Again, due to the weekly temporal pattern, these means were calculated separately for weekdays and weekends.

Patient levels, $$\left|P\right|$$, were calculated by summing the number of distinct patients, p, in the set of patient IDs identifiable in the data, P, for each day, and the mean daily patient levels, $$\bar{P}$$, calculated for the entire building, and for each floor, during each stage of the pandemic. The same metrics were also calculated using the subset of patients known to be positive for COVID-19, $${P}^{{Positive}}$$.

Patient contact events, c, were used as an indicator of HCW behaviour in terms of patient engagement where C is the full set of patient contacts. The number of patient contacts, $$\left|C\right|$$, was used as an absolute measure and was calculated for the entire building and each floor on each day. There was a correlation between daily patient contacts and daily patient levels (r = 0.65), and therefore we also calculated the daily rate of patient contacts, Cr, as a function of patient levels, where:

$${{Cr}}_{t,f}=\frac{|{C}_{t,f}|}{|{P}_{t,f}|}$$
(2)

To compare levels of patient engagement between stages of the pandemic, the mean daily number of patient contacts, $$\bar{C}$$, and the mean daily rate of patient contacts, $$\overline{{Cr}}$$, were calculated for the entire building and for each floor during the different stages of the pandemic.

To investigate the weekly and hourly patterns of mobility and patient engagement, a count for the number of door events and patient contacts was made for each hour, hr, of each day of the week, w, and separately for the different stages of the pandemic. These counts were then weighted by dividing them by the number of days each day of the week appeared in the dataset.
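The weighting step can be sketched as follows in Python (the analysis itself was conducted in R). The function name is hypothetical, and the sketch assumes that the number of days on which each day of the week appeared is the number of distinct calendar dates with at least one event on that weekday, so dates with no events at all would not be counted.

```python
from collections import Counter, defaultdict
from datetime import datetime


def weekly_hourly_profile(timestamps):
    """Return {(weekday, hour): mean events per occurrence of that weekday}.

    timestamps: iterable of datetime objects, one per event.
    weekday follows datetime.weekday(): Monday is 0 and Sunday is 6.
    """
    counts = Counter()
    dates_seen = defaultdict(set)  # weekday -> distinct calendar dates observed
    for ts in timestamps:
        counts[(ts.weekday(), ts.hour)] += 1
        dates_seen[ts.weekday()].add(ts.date())
    return {
        (weekday, hour): n / len(dates_seen[weekday])
        for (weekday, hour), n in counts.items()
    }
```

Running the function separately on the events from each stage yields one profile per stage.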

### Changes in time and space

To investigate how the measures of daily HCW behaviour, staffing levels and patient levels differed from those observed pre-pandemic (baseline), on each floor and within the entire building, the normalised difference to baseline, N, was calculated for each day. For example, the normalised difference to baseline for HCW mobility across the entire building, $${N}_{t}^{M}$$, is given by:

$${N}_{t}^{M}=\frac{(\left|{M}_{t}\right|\,-\,{\bar{M}}_{{Baseline}})}{{\bar{M}}_{{Baseline}}}$$
(3)

The normalised difference to baseline was also calculated for the averaged values for HCW behaviour during each stage of the pandemic. N can be interpreted as proportional change, but is presented as percentage change in the results. For metrics with a strong weekly pattern (H, Mr and M), N for weekdays was calculated using the weekday average, and the weekend average was used for weekends.
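A minimal Python sketch of Eq. (3), including the separate weekday and weekend baselines, is given below (the analysis itself was conducted in R). The function and argument names are illustrative assumptions. The returned values are proportions, which are multiplied by 100 to give percentages.

```python
from datetime import date
from statistics import mean


def normalised_difference(daily, is_baseline, split_weekends=True):
    """Return {day: (value - baseline mean) / baseline mean}, as in Eq. (3).

    daily:          dict mapping datetime.date -> daily metric value.
    is_baseline:    function taking a date and returning True for baseline days.
    split_weekends: if True, weekdays and weekends use separate baseline means.
    """
    def group(day):
        # Saturday (5) and Sunday (6) form the weekend group.
        return day.weekday() >= 5 if split_weekends else None

    baseline_values = {}
    for day, value in daily.items():
        if is_baseline(day):
            baseline_values.setdefault(group(day), []).append(value)
    baseline_mean = {g: mean(values) for g, values in baseline_values.items()}

    return {
        day: (value - baseline_mean[group(day)]) / baseline_mean[group(day)]
        for day, value in daily.items()
    }
```

A `KeyError` is raised if a group has no baseline days, which signals that the baseline period needs to contain both weekdays and weekends when `split_weekends` is used.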

### Spatial connectivity

A dyadic analysis was conducted for each time period to assess the relationship between floors in terms of the flow of HCWs between them. For each spatial dyad (e.g. floors 1 & 2, floors 1 & 3 etc.) and using both door events and patient contacts, the number of HCWs that were active on both floors in any single day was extracted where:

$${{dyad}}_{t,f,i}=|{H}_{t,f}\cap {H}_{t,i}|$$
(4)

The index of the second floor in the dyad is denoted i. The resulting matrix was then treated as a weighted network with the diagonal set to zero. Lift events were excluded from this analysis as it was not possible to identify the floor on which they occurred. The Louvain clustering algorithm was used to identify floors that had stronger links. Louvain clustering uses a deterministic algorithm with a hierarchical greedy modularity maximization-based approach34. To check the robustness of clusters we also identified clusters using the leading eigenvector and walktrap algorithms. All clustering analyses were implemented using the R package igraph v1.2.735.
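The dyad counts in Eq. (4) can be sketched in Python as set intersections (the analysis itself was conducted in R, with clustering performed in igraph). The tuple layout and function name are assumptions. The sketch covers only the daily counts: how the daily values are combined within each time period before clustering is not specified here and is left open.

```python
from collections import defaultdict
from itertools import combinations


def dyad_counts(activity):
    """Return {(day, f, i): number of HCWs active on both floors f and i that day}.

    activity: iterable of (day, floor, hcw_id) drawn from door events and
              patient contacts, with lift events already excluded.
    Only pairs with f < i are returned, so the diagonal is omitted.
    """
    present = defaultdict(set)        # (day, floor) -> H_{t,f}
    floors_by_day = defaultdict(set)  # day -> floors with any activity
    for day, floor, hcw_id in activity:
        present[(day, floor)].add(hcw_id)
        floors_by_day[day].add(floor)

    dyads = {}
    for day, floors in floors_by_day.items():
        for f, i in combinations(sorted(floors), 2):
            dyads[(day, f, i)] = len(present[(day, f)] & present[(day, i)])
    return dyads
```

The resulting counts form the upper triangle of a symmetric weighted adjacency matrix, which is the form of input that community detection routines typically accept.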

### Patient connectivity

Patient connectivity, S, was determined by identifying the number of COVID-19 negative patients each patient was indirectly in contact with through shared contacts with the same HCWs on the same day. This was achieved by first identifying the set of HCWs that had contact with the jth patient on each day, where:

$${H}_{j,t}=\{h\,:\,{C}_{t}({p}_{j},h)=1\}$$
(5)

Next the set of patients not known to be positive for COVID-19, $${P}^{{Negative}}$$, and that had also been in contact with any of the HCWs in $${H}_{j,t}$$ were identified, where:

$${P}_{j,t}^{{Negative}}=\big\{p\,:\,p\in {P}_{t}^{{Negative}},\,p\ne {p}_{j},\,\exists h\in {H}_{j,t}\mid {C}_{t}\left(p,h\right)=1\big\}$$
(6)

S was then calculated for each patient as a proportion of all patients not known to be positive for COVID-19, and expressed as a percentage in the results where:

$${S}_{j,t}=\frac{|{P}_{j,t}^{{Negative}}|}{|{P}_{t}^{{Negative}}|}$$
(7)
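Equations (5) to (7) can be combined into a short Python sketch for a single day (the analysis itself was conducted in R). The function and argument names are illustrative assumptions. Following Eq. (7), the denominator is the full set of patients not known to be positive on that day.

```python
from collections import defaultdict


def patient_connectivity(contacts, negative_patients):
    """Return {patient_id: S} for one day, following Eqs. (5)-(7).

    contacts:          iterable of (patient_id, hcw_id) contact events for the day.
    negative_patients: set of patient IDs not known to be COVID-19 positive that day.
    """
    if not negative_patients:
        raise ValueError("S is undefined when there are no negative patients")

    hcws_of = defaultdict(set)      # patient -> H_{j,t}
    patients_of = defaultdict(set)  # HCW -> patients contacted that day
    for patient_id, hcw_id in contacts:
        hcws_of[patient_id].add(hcw_id)
        patients_of[hcw_id].add(patient_id)

    connectivity = {}
    for patient_id, hcws in hcws_of.items():
        linked = set()
        for hcw_id in hcws:
            linked |= patients_of[hcw_id]
        # Keep only negative patients and exclude the focal patient p_j.
        linked = (linked & negative_patients) - {patient_id}
        connectivity[patient_id] = len(linked) / len(negative_patients)
    return connectivity
```

Averaging the returned values separately over negative and positive patients gives the daily means for each group.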

For each day and stage of the pandemic, we made separate calculations for the average patient connectivity of patients not known to be positive for COVID-19, $${\bar{S}}^{{Negative}}$$, and COVID-19 positive patients, $${\bar{S}}^{{Positive}}$$.

### Statistics and reproducibility

Linear models were used to statistically determine if daily metrics for staffing levels, patient levels, HCW mobility and patient contacts during each stage of the pandemic were different to baseline. Separate models were run for each metric, with the daily values as the response variable and the stage of the pandemic as the only fixed effect.

To investigate whether N for daily rates of HCW mobility and patient contacts differed between COVID-19 floors and non-COVID-19 floors, a mixed effects linear model was run with N as the response variable. An interaction term was included between the stage of the pandemic and a binary flag for whether the floor handled the majority of COVID-19 patients. Floor ID was included as a random effect. Data during the baseline were not included in the model.

Linear models were used to identify differences in $${N}_{t,f}^{{Mr}}$$ and $${N}_{t,f}^{{Cr}}$$ compared to those at baseline, and these had an interaction term between the stage of the pandemic and floor ID. To investigate the relationship between the daily number of COVID-19 patients in hospital and N (for daily rates of HCW mobility and patient contacts), linear models were run with N as the response variable and an interaction term between the stage of the pandemic, floor ID and the logged (base 2) daily number of COVID-19 patients. For COVID-19 floors, the number of COVID-19 patients was taken as the number of patients on the floor, while for non-COVID-19 floors the number of COVID-19 patients in the entire hospital was used.

To assess the relationship between the daily connectivity of patients and the number of COVID-19 patients in the hospital during each stage of the pandemic, we used a linear model with an interaction term between the logged (base 2) number of COVID-19 patients in the hospital and the stage of the pandemic.
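The interpretation of a coefficient on a base-2 logarithm can be made explicit with a short worked equation. As a sketch for a single stage of the pandemic, let $${x}_{t}$$ be the daily number of COVID-19 patients, and let $${\beta }_{0}$$ and $${\beta }_{1}$$ be the intercept and slope for that stage (illustrative notation assumed here, not taken from the fitted models):

$$E\left[{S}_{t}\right]={\beta }_{0}+{\beta }_{1}{\log }_{2}\left({x}_{t}\right)$$

$$\left({\beta }_{0}+{\beta }_{1}{\log }_{2}\left(2{x}_{t}\right)\right)-\left({\beta }_{0}+{\beta }_{1}{\log }_{2}\left({x}_{t}\right)\right)={\beta }_{1}{\log }_{2}\left(2\right)={\beta }_{1}$$

The slope can therefore be read as the expected change in patient connectivity for each doubling of the number of COVID-19 patients, and the interaction with the stage of the pandemic allows this slope to differ between stages.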

The emmeans package (v1.3.3) was used to extract specific post-hoc comparisons of interest (e.g. baseline vs first wave and baseline vs summer lull), and the package lme4 (v1.1-25) was used to run mixed effects models. The DHARMa package (v0.4.1) was used for model diagnostics.

## Results

Data were analysed for 8,042 HCWs that had logged door events and/or patient contacts in the UCLH Tower building between January 2020 and March 2021. This included both medical staff and non-medical staff (cleaners, porters and admin). During the entire observation period 5,510,359 door events were recorded. In total, 21,801 patients were detected in the routinely collected data, of which 1,707 (8%) were positive for COVID-19. Of the 6,931,878 patient contacts recorded, 1,643,113 (24%) were with COVID-19 patients. Table 2 provides a summary for the different stages of the pandemic.

In the following sections we describe the temporal and spatial patterns in the behaviour of HCWs, and how these changed throughout the pandemic. We also describe epidemiologically relevant changes in the patterns of spatial connectivity and indirect contacts between patients.

### Temporal dynamics

During the baseline period, the daily number of door events showed clear temporal regularity, whereby the number of events was highest during weekdays (Fig. 1a). These peaks were in line with the daily staffing levels (Fig. 1b). The hourly number of door events was highest on weekdays between 7 am and 5 pm, but this peak was less prominent at weekends (Fig. 2a). The daily number of patient contacts did not exhibit an obvious weekly pattern, but the daily number of patients was highest during weekdays. Regardless of the day, the hourly number of patient contacts peaked once at 10 am and again at 6 pm (Fig. 2e). These temporal patterns demonstrate the utility of the routinely collected data in depicting staff and patient levels, in addition to the global activity of HCWs, which reflects the nature of staff working patterns within the hospital. Below we describe how staffing levels, patient numbers, HCW mobility (Fig. 2b–d) and patient contacts (Fig. 2f–h) changed during the pandemic compared to pre-pandemic levels.

The first wave of COVID-19 patients was associated with a drop in weekday staffing levels (model estimate ($$\hat{\beta }$$) = −205; 95% confidence interval (CI) = −273, −136; t301 = −7.965; p < 0.001; $${N}_{{Firstwave}}^{H,{Weekday}}$$ = −10%; Fig. 1b; Table 3), while no difference was observed during weekends ($$\hat{\beta }$$ = −9; CI = −55, 38; t113 = −0.495; p = 1.000). The daily number of door events was higher during weekends ($$\hat{\beta }$$ = 517; CI = 6, 1029; t113 = 2.074; p = 0.046; $${N}_{{Firstwave}}^{M,{Weekend}}$$ = 6%), but remained consistent during weekdays ($$\hat{\beta }$$ = −354; CI = −903, 198; t301 = −1.701; p = 0.054), which is surprising given the reduced staff levels. This is partly explained by an increase in the per HCW rate of daily door events during weekdays ($$\hat{\beta }$$ = 1.1; CI = 0.7, 1.4; t301 = 8.072; p < 0.001; $${N}_{{Firstwave}}^{{Mr},{Weekday}}$$ = 11%) and weekends ($$\hat{\beta }$$ = 0.8; CI = 0.3, 1.4; t113 = 3.936; p = 0.001; $${N}_{{Firstwave}}^{{Mr},{Weekend}}$$ = 7%), suggesting that HCWs working during the first wave had higher levels of mobility than those pre-pandemic. The daily number of patients in the hospital reduced ($$\hat{\beta }$$ = −265; CI = −293, −236; t418 = −25.849; p < 0.001; $${N}_{{Firstwave}}^{P}$$ = −44%), and this coincided with a decrease in the daily number of patient contacts logged by HCWs ($$\hat{\beta }$$ = −4296; CI = −4991, −3601; t418 = −16.387; p < 0.001; $${N}_{{Firstwave}}^{C}$$ = −24%), which was associated with a less prominent pattern in the hourly counts of contacts (Fig. 2f). However, the per patient rate of daily contact events was higher than at baseline ($$\hat{\beta }$$ = 12.8; CI = 10.5, 15.0; t418 = 15.106; p < 0.001; $${N}_{{Firstwave}}^{{Cr}}$$ = 42%), suggesting HCWs had more contact events per patient than were logged pre-pandemic.

In the summer lull, when there were fewer COVID-19 patients in the hospital, daily patient numbers remained lower than baseline levels ($$\hat{\beta }$$ = −191; CI = −221, −161; t418 = −16.967; p < 0.001; $${N}_{{Summerlull}}^{P}$$ = −32%), as did staffing levels during weekdays ($$\hat{\beta }$$ = −234; CI = −306, −162; t301 = −8.671; p < 0.001; $${N}_{{Summerlull}}^{H,{Weekday}}$$ = −14%) and weekends ($$\hat{\beta }$$ = −54; CI = −104, −4; t113 = −2.916; p = 0.026; $${N}_{{Summerlull}}^{H,{Weekend}}$$ = −7%). While the daily and hourly pattern of patient contacts began to return towards that seen pre-pandemic, the daily count of events remained lower than during the baseline ($$\hat{\beta }$$ = −2512; CI = −3246, −1777; t418 = −9.067; p < 0.001; $${N}_{{Summerlull}}^{C}$$ = −14%), and the daily rate of contact was maintained above baseline levels ($$\hat{\beta }$$ = 7.8; CI = 5.4, 10.2; t418 = 8.727; p < 0.001; $${N}_{{Summerlull}}^{{Cr}}$$ = 26%). During weekdays, the daily number of door events was lower than baseline levels ($$\hat{\beta }$$ = −1471; CI = −2049, −893; t301 = −6.760; p < 0.001; $${N}_{{Summerlull}}^{M,{Weekday}}$$ = −10%) and the rate of mobility was higher on average ($$\hat{\beta }$$ = 0.4; CI = 0.1, 0.8; t301 = 3.003; p = 0.017; $${N}_{{Summerlull}}^{{Mr},{Weekday}}$$ = 4%), while no difference in either was observed during weekends (M: $$\hat{\beta }$$ = −424; CI = −974, 125; t113 = −2.074; p = 0.242; Mr: $$\hat{\beta }$$ = 0.3; CI = −0.3, 0.9; t113 = 1.134; p = 1.000).

During the second wave, the daily number of patients in the hospital remained lower than that at baseline ($$\hat{\beta }$$ = −159; CI = −186, −131; t418 = −8.671; p < 0.001; $${N}_{{Secondwave}}^{P}$$ = −27%), but an increase was observed in the daily number ($$\hat{\beta }$$ = 1225; CI = 549, 1901; t418 = 4.803; p < 0.001; $${N}_{{Secondwave}}^{C}$$ = 7%) and rate of patient contacts ($$\hat{\beta }$$ = 13.7; CI = 11.5, 15.9; t418 = 16.676; p < 0.001; $${N}_{{Secondwave}}^{{Cr}}$$ = 46%). Weekday staff numbers remained lower than baseline levels ($$\hat{\beta }$$ = −154; CI = −220, −87; t301 = −6.162; p < 0.001; $${N}_{{Secondwave}}^{H,{Weekday}}$$ = −9%), but had increased during weekends ($$\hat{\beta }$$ = 88; CI = 43, 133; t113 = 5.195; p < 0.001; $${N}_{{Secondwave}}^{H,{Weekend}}$$ = 11%). It is worth noting how daily staffing levels, after an initial drop during the Christmas break, followed the rise and fall of COVID-19 patients, indicating a different strategy by the hospital to that in the first wave, when staff numbers were reduced. The daily number of door events only increased during weekends ($$\hat{\beta }$$ = 1383; CI = 883, 1883; t113 = 7.425; p < 0.001; $${N}_{{Secondwave}}^{M,{Weekend}}$$ = 16%), while the rate of mobility increased only on weekdays ($$\hat{\beta }$$ = 1.1; CI = 0.8, 1.5; t301 = 8.939; p < 0.001; $${N}_{{Secondwave}}^{{Mr},{Weekday}}$$ = 12%).

### Spatial-temporal dynamics

Overall, during the first and second waves, changes in the daily rate of HCW mobility ($${N}_{t}^{{Mr}}$$) and patient contacts ($${N}_{t}^{{Cr}}$$) were significantly higher on COVID-19 wards than non-COVID-19 wards, with no significant difference during the summer lull (Supplementary Table S1). However, considerable spatial variation was observed in the response of HCW mobility and patient contacts to the different stages of the pandemic. Here we focus on the behaviour of HCWs on the six floors that handled the majority (≥ 15%) of COVID-19 patients: AMU (floor 1), critical care (floor 3), HASU (floor 7), respiratory diseases (floor 8), general surgery (floor 9) and CoE (floor 10). We report the results for non-COVID-19 floors in Supplementary Tables S2 and S3. It is worth noting that the emergency department (ground floor) experienced a large number of COVID-19 patients (Supplementary Fig. S1) but was not considered a COVID-19 floor, because patients were triaged and then moved to a relevant ward.

With the exception of AMU, $${N}_{t}^{{Mr}}$$ during the first wave was higher on all COVID-19 floors compared to baseline levels (Table 4; Fig. 3a–f). Despite this, on floors with HASU, general surgery and CoE, there was a negative relationship between $${N}_{t}^{{Mr}}$$ and the number of COVID-19 patients on these floors (Table 5). In contrast, $${N}_{t}^{{Mr}}$$ increased with the number of COVID-19 patients on AMU, critical care and the respiratory ward. During the summer lull, $${N}_{t}^{{Mr}}$$ was no different to baseline levels on the floor with general surgery, but was lower on AMU and higher on all other COVID-19 floors; the most notable increase was on HASU ($${N}_{{Floor}7}^{{Mr}}$$ = 101%). During the summer lull, a positive relationship was observed between $${N}_{t}^{{Mr}}$$ and the number of COVID-19 patients on floors with HASU and CoE. In response to the second wave of COVID-19 patients, $${N}_{t}^{{Mr}}$$ increased above baseline levels on all COVID-19 floors, with the exception of AMU and critical care. $${N}_{t}^{{Mr}}$$ had a positive association with the number of COVID-19 patients on all COVID-19 floors, with the exception of HASU where a negative relationship was observed.

During the first wave and compared to pre-pandemic levels, $${N}_{t}^{{Cr}}$$ increased on AMU, critical care and HASU. There was a positive relationship between $${N}_{t}^{{Cr}}$$ and the number of COVID-19 patients on floors with HASU, general surgery and CoE. During the summer lull, there was only an increase in $${N}_{t}^{{Cr}}$$ on floors with AMU and CoE. $${N}_{t}^{{Cr}}$$ was not correlated with the number of COVID-19 patients on any COVID-19 floor during the summer lull. With the exception of critical care, $${N}_{t}^{{Cr}}$$ was higher on all COVID-19 floors during the second wave compared to pre-pandemic levels; with the most notable increase on HASU ($${N}_{{Floor}7}^{{Cr}}$$ = 155%). A positive association was observed between $${N}_{t}^{{Cr}}$$ and the number of COVID-19 patients on floors with AMU and HASU.

### Spatial connectivity

The connectivity between floors (based on the number of HCWs that had activity on any two floors in the same day) revealed that some were more connected than others, and that the resulting clustering of floors varied throughout the pandemic (Fig. 4). Below we describe spatial clusters determined by the Louvain algorithm. Clusters identified using the leading eigenvector and random walk algorithms were similar and are reported in Supplementary Fig. S2.

During the baseline period, three clusters were identified; one large cluster (B1) containing Imaging through to Plant (floors −1 to 4), Short stay surgery (floor 6), HASU (floor 7) and General surgery (floor 9); a smaller cluster (B2) comprising the Basement (floor −2), Nuclear medicine (floor 5), Paediatrics (floor 11) and Adolescents (floor 12); and a third (B3) consisting of Respiratory disease (floor 8), CoE (floor 10) and Oncology through to Haematology (floors 13–16).

During the first wave, the connectivity between floors changed such that four clusters were identifiable, and floors adjacent to each other were generally in the same cluster. The Basement through to Nuclear medicine (floors −2 to 5) formed one cluster (FW1), which included two COVID-19 floors (floors 1 & 3). Short stay surgery through to CoE (floors 6 to 10) formed a second cluster (FW2); all of these floors had COVID-19 patients, although floor 6 was considered a non-COVID-19 floor. Paediatrics and Adolescents (floors 11 and 12) made up a third cluster (FW3) and the fourth cluster (FW4) consisted of Oncology through to Haematology (floors 13 to 16); neither cluster contained floors that handled the majority of COVID-19 patients.

During the summer lull only three clusters were identified. Imaging through to CoE (floors −1 to 10) formed the largest cluster (SL1), and this included all floors identified as COVID-19 floors. The basement, Paediatrics and Adolescents (floors −2, 11 and 12) were in a second cluster (SL2), and the third cluster (SL3) consisted of Oncology through to Haematology (floors 13 to 16). During the second wave the connectivity of floors and the clusters they formed were unchanged from that in the summer lull, suggesting that the spatial activity of HCWs had stabilised.

### Indirect contacts between patients

The average daily connectivity between COVID-19 negative patients (due to shared contacts with HCWs on the same day; $${S}_{t}^{{Negative}}$$; Fig. 5) remained stable throughout the pandemic ($${\bar{S}}_{{Firstwave}}^{{Negative}}$$ = 5%; $${\bar{S}}_{{Summerlull}}^{{Negative}}$$ = 5%; $${\bar{S}}_{{Secondwave}}^{{Negative}}$$ = 5%). However, $${S}_{t}^{{Negative}}$$ increased by 0.30% (CI = 0.09%, 0.51%; $${t}_{355}$$ = 2.786; p = 0.006) for every two-fold increase in the number of COVID-19 patients in hospital during the first wave. No significant effect was found during the summer lull ($$\hat{\beta }$$ = −0.21%; CI = −0.56%, 0.15%; $${t}_{355}$$ = −1.154; p = 0.249) or during the second wave ($$\hat{\beta }$$ = 0.20%; CI = 0.00%, 0.41%; $${t}_{355}$$ = 1.922; p = 0.055).

In contrast, the hospital reduced the daily connectivity between COVID-19 negative and positive patients ($${S}_{t}^{{Positive}}$$) during the first and second waves to a low of < 1% ($${\bar{S}}_{{Firstwave}}^{{Positive}}$$ = 2%; $${\bar{S}}_{{Secondwave}}^{{Positive}}$$ = 2%). However, this was not maintained during the summer lull ($${\bar{S}}_{{Summerlull}}^{{Positive}}$$ = 4%), suggesting a relaxation in staff cohorting procedures. These patterns indicate a reactive response to the rise in COVID-19 patients, which is further supported by the regression results: for every doubling in the number of COVID-19 patients in the hospital, $${S}_{t}^{{Positive}}$$ decreased by 0.81% (CI = 0.61%, 1.00%; $${t}_{355}$$ = −8.188; p < 0.001) during the first wave and by 0.56% (CI = 0.37%, 0.75%; $${t}_{355}$$ = −5.767; p < 0.001) during the second wave; no significant relationship was found during the summer lull ($$\hat{\beta }$$ = 0.25%; CI = −0.07%, 0.58%; $${t}_{355}$$ = 1.529; p = 0.127).
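The "per doubling" effects reported for $${S}_{t}^{{Negative}}$$ and $${S}_{t}^{{Positive}}$$ can be read as slopes on a log2-transformed patient count. As a sketch only, assume a linear model in which $${C}_{t}$$ (a symbol introduced here for the number of COVID-19 patients in hospital on day $$t$$) enters as $${\log }_{2}{C}_{t}$$, and ignore any other terms that the fitted model may contain:

$$
{S}_{t}={\beta }_{0}+\beta \,{\log }_{2}{C}_{t}+{\varepsilon }_{t}
$$

$$
E\left[{S}_{t}\mid {C}_{t}=2c\right]-E\left[{S}_{t}\mid {C}_{t}=c\right]=\beta \left({\log }_{2}2c-{\log }_{2}c\right)=\beta
$$

Under this assumption, the first-wave estimate for $${S}_{t}^{{Positive}}$$ corresponds to $$\hat{\beta }$$ = −0.81%, and the same change is expected for each doubling regardless of the starting count (for example, from 50 to 100 patients or from 100 to 200 patients).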

During the first wave a noteworthy spike occurred in $${S}_{t}^{{Positive}}$$ to 16% on 5 May 2020. This was due to one HCW who had contact with 87 patients, 32 of whom were positive for COVID-19. Applied in real time, such insights could help practitioners quickly identify and address weaknesses in IPC activities that could compromise patient and staff safety.
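The daily indirect-contact measure and the kind of spike investigation described above can be sketched as follows. This is one plausible reading rather than the definition used in the analysis: connectivity is taken here as the share of negative-positive patient pairs that share at least one HCW on the day, and only patients with at least one logged contact are counted. The `(hcw_id, patient_id)` schema, the function names and the `min_patients` threshold are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def cross_status_connectivity(contacts, status):
    """Share of negative-positive patient pairs linked by a shared HCW on one day.

    contacts: iterable of (hcw_id, patient_id) for a single day (hypothetical schema).
    status: dict mapping patient_id to "positive" or "negative".
    """
    hcws_by_patient = defaultdict(set)
    for hcw_id, patient_id in contacts:
        hcws_by_patient[patient_id].add(hcw_id)

    negatives = [p for p in hcws_by_patient if status[p] == "negative"]
    positives = [p for p in hcws_by_patient if status[p] == "positive"]
    possible = len(negatives) * len(positives)
    if possible == 0:
        return 0.0
    # A pair is linked when the two patients have at least one HCW in common.
    linked = sum(
        1 for n, p in product(negatives, positives)
        if hcws_by_patient[n] & hcws_by_patient[p]
    )
    return linked / possible


def mixed_contact_hcws(contacts, status, min_patients):
    """List HCWs who saw at least `min_patients` patients of both statuses."""
    patients_by_hcw = defaultdict(set)
    for hcw_id, patient_id in contacts:
        patients_by_hcw[hcw_id].add(patient_id)

    flagged = []
    for hcw_id, patients in patients_by_hcw.items():
        n_pos = sum(1 for p in patients if status[p] == "positive")
        n_neg = len(patients) - n_pos
        if len(patients) >= min_patients and n_pos > 0 and n_neg > 0:
            flagged.append((hcw_id, len(patients), n_pos))
    # Largest patient counts first, so the likeliest drivers of a spike lead.
    return sorted(flagged, key=lambda row: row[1], reverse=True)


# Illustrative data only; these are not values from the study.
contacts = [("h1", "a"), ("h1", "b"), ("h1", "c"),
            ("h2", "c"), ("h2", "d"), ("h3", "e")]
status = {"a": "negative", "b": "positive", "c": "negative",
          "d": "negative", "e": "positive"}

connectivity = cross_status_connectivity(contacts, status)
flagged = mixed_contact_hcws(contacts, status, min_patients=3)
print(f"connectivity: {connectivity:.2f}", flagged)
```

Reporting the per-HCW breakdown alongside the daily value makes it possible to see whether a high value reflects many staff crossing between patient groups or, as in the 5 May 2020 example, a single HCW with an unusually large number of contacts.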

## Discussion

Mobility and contact rates are fundamental to the transmission of communicable diseases, and data on these behaviours are extremely valuable for epidemiological investigations. It has long been established that HCWs can be part of transmission clusters within healthcare settings13,16; however, data on the behaviour of HCWs are often scarce. In this paper, we demonstrate how behavioural markers for HCW mobility and patient contacts within the hospital can be derived from EMRs and door access logs at an aggregate level. Using data collected at a London teaching hospital during the COVID-19 pandemic, we provide a framework to further support IPC practitioners in assessing patterns of staff behaviour, identifying behavioural change and conducting evidence-based infection control.

The temporal trends in workforce and HCW behaviour are in line with those reported in other studies24,30,36,37,38. Staff and patient levels determined daily patterns in the aggregate measures of HCW behaviour, and this was evident when the hospital reduced staff and patient numbers during the first wave of COVID-19 patients, which resulted in a notable drop in logged patient contacts. However, the rate of patient contact (number of contact events per patient) was maintained above baseline levels throughout the pandemic, as was the rate of HCW mobility (number of door events per HCW) during weekdays. Our framework illustrates the utility of the featured data sources in representing the working practices of HCWs, and their potential to passively monitor behaviour change and activity patterns of the HCW population. From an operational point of view, this framework provides a means to quickly generate evidence of changing working practices and to identify undesirable work pressures and the associated risks of workforce fatigue, illness and staff shortages.

Patterns of HCW behaviour showed considerable spatio-temporal variation in response to the pandemic. Increases in the rate of mobility and rate of patient contact were most notable on floors handling the majority of COVID-19 patients during the first and second waves, and we find evidence to suggest that on some floors the observed changes in behaviour were associated with shifts in the COVID-19 patient population. Exact causes for the observed changes in HCW behaviour are hard to ascertain, and are likely the product of a combination of factors, including shifting working practices (e.g. through IPC activities), perceptions of risk (e.g. before/after vaccination and changes in the availability of PPE) and hospital pressures (e.g. needs of the patient population). Furthermore, the degree of change in these behavioural markers was not equal across floors and, despite few (or no) COVID-19 patients, non-COVID-19 floors also experienced changes in staff behaviour. Differences in the trends of HCW behaviour on different floors will depend on the functions of the wards occupying them, how these functions evolved during the course of the pandemic, and on IPC interventions. Even where data on causative factors for observed behaviour change are not available, the framework provides insights to generate hypotheses and a means for further investigation.

One strategy to prevent nosocomial transmission is to cohort patients and staff, whereby patients positive for the disease of concern and/or the staff responsible for their care are kept separate from the rest of the patient population29. At UCLH this was achieved by establishing COVID-19 wards that handled the majority of COVID-19 patients. Using the routinely collected data we were able to identify the main COVID-19 wards and monitor the daily indirect contacts between patients (as determined through shared contacts with HCWs on the same day). Successful staff cohorting would have resulted in no indirect contact between COVID-19 negative and positive patients. However, this was not consistently achieved and, while the indirect contacts between these groups of patients were substantially reduced during the first and second waves, the response was not maintained during the summer lull, and appears reactive to increases in the number of COVID-19 patients. Staff cohorting can be prevented by numerous practical limitations, and the pandemic presented many challenges including staff shortages. Using EMRs to investigate indirect contacts between patients has been explored before26, but this is not necessarily routine practice, and we provide such an analysis here to illustrate the diversity in applications for data on staff behaviour. It is evident that the routinely collected data provide a tool for IPC practitioners to monitor (in near real time) the success of interventions such as cohorting, and offer a means to quickly identify, investigate and react to undesirable spikes in indirect patient contacts facilitated by HCWs that could compromise patient and staff safety.

Another strategy available to IPC practitioners is to limit the traffic within the hospital29. At UCLH a number of interventions were adopted to accomplish this, including reducing patient and staff numbers and installing COVID doors that created barriers to disrupt the flow of people between spaces. Our analyses identified the reduced staff and patient numbers, along with substantial changes in the connectivity of floors during the first wave, which then stabilised during the summer lull and second wave, indicating a new normal in working patterns. We were also particularly interested in the spatial connectivity of floors that handled the majority of COVID-19 patients with non-COVID-19 floors, as this could present opportunities for outbreaks. The flow of staff between floors throughout the pandemic was such that COVID-19 floors were closely associated with non-COVID-19 floors; the flow of HCWs between areas with low and high burdens of the disease may therefore have presented a risk of outbreaks if sufficient IPC measures were not in place. Had a framework like the one presented here been operationalised, decision-makers would have had a means to rapidly assess the spatial connectivity of spaces and use these data to justify further interventions/investigations to mitigate the associated risks. While our analysis on spatial connectivity provides an example of how data on HCW movements can support IPC, caution should be taken in the interpretation of results, as we were unable to assess the effect of COVID doors on the mobility of staff due to missing information on the dates of their installation and use. A higher resolution analysis that takes into account the partitions within floors may reveal the true flow of staff between COVID-19 and non-COVID-19 areas.

In this investigation we used a minimal number of data fields and metrics aggregated at the level of the HCW population. However, further insights into the variations of HCW behaviour could be uncovered if the data were paired with other data fields and aggregated by individual or HCW group. For example, combining these data with specific details on IPC activity would allow investigations into HCW mobility and patient contacts before and after interventions. Combining these data sources with data from staff screening programmes could help identify HCW groups and individuals more at risk of acquiring HAIs, along with the behaviours or working conditions associated with higher risk. If shown to be epidemiologically relevant, the markers of HCW behaviour may also provide a dynamic tool to identify staff more at risk of involvement in chains of transmission, which could inform the targeted screening of staff. The data analysed here were anonymised, and aggregated metrics were used so that individuals' privacy was protected. That said, studies on HCW perceptions and acceptance of the use of passively collected data for routine surveillance are required to help address ethical concerns, and data should only be used as a means of better protecting both staff and patients, e.g. for informing IPC.

The framework outlined here provides a system-wide perspective on staff behaviour that can enable exploration of specific and spatially discrete contexts. The measures facilitate comparisons between different occupational contexts, to generate and test hypotheses on behaviour change, and contribute to a better understanding of the spatial and temporal heterogeneity in the infection risk for HCWs, as was seen during the pandemic15. For this to be realised in routine practice, platforms such as data dashboards are required to enable exploitation of these measures. We envision the resulting tools would be of use to practitioners to facilitate rapid investigations, provide early warning systems and support decisions on policies and interventions, in addition to monitoring the effectiveness of such actions. Provided the digital infrastructure is in place, the framework can be adapted for use outside of the healthcare environment, such as contexts involving contact with wildlife or livestock where there is a risk of disease emergence. The development of digital systems for real-time behavioural monitoring related to disease transmission will contribute towards improved pandemic preparedness.

While the data sources featured here have potential to be used operationally by IPC practitioners in real time, there are several challenges that hospitals may have to overcome for this to be realised. Firstly, it is worth noting that the framework presented here relies on electronic records and, while UCLH is a digital hospital, many healthcare facilities in the UK and across the world are not. Hospitals, particularly those within the NHS, often outsource services such as systems for security door logs and EMRs, and in this study the various datasets from outsourced companies had to be consolidated, which required the creation of a master staff index to establish links between the databases. Mapping the data flows and creating a user-friendly platform (such as a data dashboard) will be challenging, requiring the collaboration of researchers, IT professionals and IPC staff. There are also challenges in relation to the validation of these data, as we lack assurances on the exact nature of the processes underlying their generation. For instance, the use of staff cards to open security doors may be biased in time and space by HCWs following each other through doors (e.g. during ward rounds), or by doors being left open. Likewise, there has been little systematic analysis to date in relation to the accuracy of the spatial or temporal markers from EMRs, or the HCWs involved in events. While these remain important validation challenges, the principles underlying aggregate patterns produced using these data appear sound.
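The role of a master staff index in consolidating separately sourced datasets can be illustrated with a small sketch. The field names (`staff_key`, `door_card_id`, `emr_user_id`), the event tuples and both functions are hypothetical; the structure of the index actually built for this study is not described here.

```python
def build_master_index(staff_records):
    """Map each source-system identifier to one shared staff key.

    staff_records: iterable of dicts with the hypothetical fields
    'staff_key', 'door_card_id' and 'emr_user_id'.
    """
    by_card, by_emr = {}, {}
    for rec in staff_records:
        if rec.get("door_card_id"):
            by_card[rec["door_card_id"]] = rec["staff_key"]
        if rec.get("emr_user_id"):
            by_emr[rec["emr_user_id"]] = rec["staff_key"]
    return by_card, by_emr


def link_events(door_events, emr_events, by_card, by_emr):
    """Re-key both event streams to the shared staff key.

    Events whose identifier is missing from the index are dropped.
    """
    linked = []
    for card_id, timestamp in door_events:
        key = by_card.get(card_id)
        if key is not None:
            linked.append((key, timestamp, "door"))
    for user_id, timestamp in emr_events:
        key = by_emr.get(user_id)
        if key is not None:
            linked.append((key, timestamp, "emr"))
    # ISO 8601 timestamps in one format sort chronologically as strings.
    return sorted(linked, key=lambda event: (event[1], event[0]))


# Illustrative records only; these are not values from the study.
staff = [
    {"staff_key": "S1", "door_card_id": "C100", "emr_user_id": "U9"},
    {"staff_key": "S2", "door_card_id": "C200", "emr_user_id": None},
]
door_events = [("C100", "2020-04-01T08:00"), ("C999", "2020-04-01T08:05")]
emr_events = [("U9", "2020-04-01T08:10")]

by_card, by_emr = build_master_index(staff)
linked = link_events(door_events, emr_events, by_card, by_emr)
print(linked)
```

Keeping the shared key separate from every source-system identifier also means the linked events can be released for analysis without exposing card numbers or EMR usernames.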

In conclusion, this paper has described a framework to produce simple markers for the behaviour of staff in the healthcare environment from routinely collected data. Data on HCW behaviours are often scarce but, as hospitals embrace the digital age, data are becoming more readily available. Our framework provides a means to rapidly assess working patterns, investigate behaviour change and support evidence-based IPC activities in near real time. The integration of such frameworks into routine practice will be pivotal in building more resilient healthcare systems to better protect HCWs and patients, and to improve pandemic preparedness.

### Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.