Analytical Modelling of the Spread of Disease in Confined and Crowded Spaces

Since 1927 and until recently, most models describing the spread of disease have been of compartmental type, based on the assumption that populations are homogeneous and well-mixed. Recent models have utilised agent-based models and complex networks to explicitly study heterogeneous interaction patterns, but this leads to an increasing computational complexity. Compartmental models are appealing because of their simplicity, but their parameters, especially the transmission rate, are complex and depend on a number of factors, which makes it hard to predict how a change of a single environmental, demographic, or epidemiological factor will affect the population. Therefore, in this contribution we propose a middle ground, utilising crowd-behaviour research to improve compartmental models in crowded situations. We show how both the rate of infection as well as the walking speed depend on the local crowd density around an infected individual. The combined effect is that the rate of infection at a population scale has an analytically tractable non-linear dependency on crowd density. We model the spread of a hypothetical disease in a corridor and compare our new model with a typical compartmental model, which highlights the regime in which current models may not produce credible results.

D uring its history, humanity has repeatedly encountered major epidemics such as the Plague (Black Death) in the 14 th century, the Spanish Flu in 1918 or more recently the Swine Flu (2009), to mention only a few. Some of these epidemics severely affect the global population, such as the Plague that in only six years caused the death of almost one third of the European population.
One of the first analytical approaches to study and model the systemic nature of the spread of an infectious disease, was Kermack and McKendrick 7 who proposed what would later be modified and defined as an SIR model, that divided a population into three compartments; (i) susceptible, S; (ii) infectious, I; and (iii) recovered/ removed, R, and devised a set of differential equations to model the transition rates from each of these states to the others. This seminal work would lead to the onset of a new field of Mathematical Epidemiology. By introducing also an (iv) exposed class, E, the model becomes what is called an SEIR model 4,5 : where S 1 E 1 I 1 R 5 N (population size).
To make this model more practical for different population sizes, we will now normalise the model according to where l is the per-capita transmission rate, a is the constant recovery rate (thus, the mean infectious period is 1/a) and 1/k is the mean exposed period.
The SEIR model is based on the following four main assumptions: 1. The population size N is constant.
2. There is no heterogeneity, i.e. the impact of individual characteristics such as age, sex and behaviour are neglected. 3. The introduction of an exposed class means that individuals are not immediately able to infect others upon contracting the disease themselves 4. The population is well-mixed. This means that each individual has the same probability of contracting the disease.
Therefore, such an approach will neglect the importance of heterogeneity in the social interaction and mobility patterns of individuals. The validity of the 'well-mixed' assumption is particularly problematic. Recent Statistical Physics approaches however, have incorporated such elements into disease-spreading models 1,2 using transport networks as a proxy of human mobility, which has resulted in higher fidelity in predictions of disease outbreaks.
A growing body of literature on detailed studies of crowd behaviour in crowded places have revealed a number of insights into nonlinear, dynamic, adaptive and self-organised crowd behaviour that seriously question the validity of the well-mixed assumption in the original SIR model, even if network effects are taken into account to achieve some degree of heterogeneity.
Since the beginning of the 21 st century, mathematical epidemiology has found in the use of complex networks a new and more precise way to tackle the problem; where individuals are represented by nodes and their interactions by links. Moreover the nodes have a certain connectivity distribution i.e. the probability P(k) that a node is connected to k other nodes. This connectivity is typically either exponential (in case of exponential networks) or power-law (in case of scale-free networks). An important result was obtained in Ref. 14 where it as shown that exponential networks show an epidemic threshold that separates an infected phase from a healthy one, while the behaviour of scale-free networks depends on the power-law exponent and may or may not represent the epidemic threshold.
Extensive research has been focused on the study of epidemics through network theory. However, while contemporary network science approaches 9,10,15 to modelling the spread of disease has greatly improved the modelling accuracy at large scales, there is still limited understanding of the effect of crowd behaviour that come into play in densely crowded and confined spaces such as transport hubs, mass gatherings, and city centres.
In recent years the study of human interaction and mobility has provided some important insights that are needed to increase the realism of epidemic models, mostly at large scales 16 . In particular, sensors such as mobile phone tracking, WiFi/Bluetooth devices, and CCTV are used to acquire important information of how people move in their daily lives as well as during mass gatherings or other crowded events.
On an even smaller scale, interesting experiments are being carried on in small and crowded environments (conference halls, hospital wards, museums etc) by using wearable sensors (such as RFID devices) that are capable of sensing face-to-face interaction of individuals [17][18][19][20] . These studies have shown the importance of heterogeneity and dynamic contact patterns in shaping the dynamics of the infection, by the use of simulations, temporal networks and contact matrices. However, we still lack an analytical instrument that allows us to find a common design and evaluation strategy of the problem in general cases.
Therefore, in this contribution, we propose a way in which insights into crowd behaviour can be used to improve prevailing compartmental models (well known for their simplicity). A microscopic foundation is particularly important to evaluate potential epidemio-logical implications of designs or operational plans for one-off events where we do not have the luxury of existing data that can be analysed and used as a basis for an epidemiological model. For recurring events, our study will provide an important analytical toolkit that can be used to study the epidemiological implications of changing one of the crowd-related design or operational parameters, and keeping everything else constant.
By analysing a tractable scenario-people moving in a corridorand building an SEI model, we focus our attention at one of the most elusive parameters of mathematical epidemiology: the contact rate. We will study how contact rate depends on crowd behaviour, and most importantly, the crowd density.

Results
We recognise that human mobility in crowded places exhibit a rich set of complex, adaptive and self-organised behaviours 3 . Therefore, we propose a bottom-up modelling approach to the spread of disease in crowded places, starting with a simple but analytically tractable special case, that can later be extended to take a richer set of crowd behaviours and more complex environments into account.
Traditional SEIR model in a corridor. Before we introduce our new methodology taking crowd behaviour into account, let us start by implementing the traditional SEIR model in a corridor with unidirectional crowd flow. The length of the corridor is L metres, the width is w metres and the corridor has periodic boundary conditions (see Fig. 1). In this environment, 1% of the population is infected at time 0. Adding to the previous SEIR model assumptions, we also have the following corridor-specific assumptions: 1. Periodic boundary conditions (and thus, a constant population size N) is assumed in the corridor. 2. The time scale (i.e. maximum simulated time) is smaller than the recovery time and exposed time (i.e. only the exposed and susceptible classes will be affected within simulated time). This also means that we will not take secondary infections into account.
Now, since we aim to improve rather than to replace current models, we will put our model into prevailing terminology, and thus, each of the individuals in our model can take any of the states (i) susceptible, s; (ii) infectious, i; and (iii) exposed, e, which represents individuals that have become infected after contact with an infected person but who are not yet infectious due to the relatively short duration of our simulation, for this same reason the compartment r is not present. We will thus obtain the normalised SEI model described by Eq. (3).
where i 0 5 0.01 (and s 1 e 1 i 5 1 since we do not have the 'r' compartment anymore).
Improved density-dependent SEIR model. Since we want to take crowd behaviour explicitly into account, we will now propose an improved model where l will depend on the spatio-temporal distribution of the crowd density.
Earlier work [11][12][13] has studied distances that large droplets are carried; (i) 6 m away by exhaled air at a velocity of 50 m/s (sneezing); (ii) 2 m away at a velocity of 10 m/s (coughing); and (iii) 1 m away at a velocity of 1 m/s (breathing).
We will build on these findings and therefore replace the earlier assumption 4 (i.e. 'The population is well mixed') with the following: Infections can only happen within an R-metre radius of an infected individual. The final new assumption that the new model will be based on is: 1. The mixing rate (i.e. the arrival rate of new pedestrians) around infected individuals due to variable walking speeds is much higher than the rate of infection, which means that we do not need to take local saturation effects into account. In previous work 4, 6 , the per-capita transmission rate l has been defined as a constant, reflecting the average number of contacts within a population. Such an approach can lead to accurate outbreak predictions only if l is calibrated to specific cases, but it does not incorporate the effect of crowd behaviour which can lead to characteristically different contact rates in different scenarios. We also assume that the mixing rate (i.e. the rate of arrival of new pedestrians) around infected individuals due to variable walking speeds is much higher than the rate of infection (which means that we do not need to take local saturation effects into account).
In our case, each infected individual can only reach n other individuals within a 1-metre radius (see Fig. 1).
In reality, this number n(t) is time dependent, n(t) 5 r(t)pR 2 , since it depends on the local density r(t) (1/m 2 ) of the crowd surrounding each infected individual, multiplied by the area pR 2 where an infection can take place.
Taking time-dependent crowd behaviour into account would make our model (Eq. (3)) too complicated for practical purposes. Therefore, we will re-formulate crowd-dynamic theory into a statistical description that we can later incorporate into our existing model (Eq. (3)). To do so, we will start with a previously discovered Eulerian description 8 of the crowd density distribution p s , and re-formulate this into a Lagrangian description p t that will better capture the effect that pedestrians spend longer time in crowded areas due to their lower walking speed in such areas.
We know from Ref. 8 that if a random location is picked somewhere within an open space with area A (m 2 ) and the average density % (of the whole space) is %~N=A m {2 À Á , the local density r (m 22 ) can be described by a Gamma distribution, according to Eq. (4).
Moreover, if a random pedestrian i (in our case an infected individual) is picked at a random time t, the local density r around that individual i will follow the distribution described by Eqs. (5) and (6). where: is the average velocity (1.34 m/s is the average value of the free speed when the corridor is empty 8 ) and r max is the maximum value of the density.
The probability density functions of the crowd density over space (p s ) and space-and-time (p t ) are illustrated in Fig. 2. The reason why p t is skewed towards larger crowd density values compared to p s is because pedestrians walk slower when they are in dense regions compared to less dense regions, which means that they will spend more time in crowded areas compared to less crowded areas.
Because walking speed is a decreasing function of crowd density, individuals spend relatively longer periods of time in crowded areas compared to less crowded areas. Therefore, from an individual perspective, the average value of the local density around an infected individual is given by Eq. (7).
which means that, substituting this value in the definition of the contact rate we obtain: where c is the transmissibility i.e. the fraction of infected-susceptible contacts that actually lead to an infection.
In this way, we have obtained a new description of the transmission rate between pedestrians in a crowded location, a value that depends on the crowd density. With the previous model definition, the value of the transmission rate did not depend on the crowd density in the corridor which is the case for the new model.
Numerical results. Let us now compare our new density-dependent model with the previous model to investigate what effect a crowddensity-dependent per-capita infection rate l density-dependent will have on the population scale. As can be seen in Fig. 3, the rate of infection per unit time is a non-linear function of crowd density. For practical purposes, the infection rate per distance walked is a better metric than the infection rate per unit time. For example in a busy transport hub, Figure 2 | Probability-density of the spatial distribution p s of the local crowd density r in a space with an average crowd density %~N=A~2:5 m {2 as an example, compared with the (spatio-)temporal distribution p t of local density across the same space. The difference is due to a lower walking speed in high density areas, which will lead to individuals spending longer time in dense areas compared to lower density areas. people have to walk a fixed distance from one platform to another regardless of the level of crowdedness and will therefore spend longer time altogether in that transport hub on crowded days (thus, being even more affected by prevalent diseases in that space).
To show the population-scale impact of our proposed model, we simulate a hypothetical disease that spreads between two individuals when they are at close proximity. We set the constant per-capita transmission rate to l constant 5 0.01. If we take AErae 5 1.0 m 22 as an arbitrary crossover point between l constant and l density-dependent , we get c 5 0.01/(pR 2 ) which will define l density-dependent according to Eq. (8).
We now run our old and new density-dependent SEI models next to each other for different values of average density in the interval We then plot the model results for the fraction of exposed individuals in the population, for a population size N 5 1000, as a function of both time t, and average crowd density % (see Fig. 4 for results). It is clear that even though the old and the newly proposed model give comparable results within a narrow range of crowd densities, for large portions of the density spectrum, the old model either severely under-or over-estimates the rate of infection of the disease.
Model validation. A comprehensive quantitative model validation is not within the scope of this paper. However, to be able to validate the main characteristics of our model, we will run an agent-based crowd model of unidirectional crowd flow in a corridor. Our crowd model is based on the social-force model 21 where the motion of pedestrian i is described by where m is the mass of the pedestrian (in kg),ṽ is the walking velocity, andf i is a force described bỹ wheref ij is a repulsive force from pedestrian j acting on pedestrian i specified asf where d ij is the distance from pedestrian i to pedestrian j,ẽ ij is the normalised vector pointing from the centre of mass of pedestrian j to the centre of mass of pedestrian i.
The forcef ik from boundary k onto pedestrian i is also specified by Eq. (11), but the position of pedestrian j is replaced by the closest point on boundary k.
The model parameters used are the same as in Ref. 21. Note that our model does not use the second force component (D 1 5d i j) k used to accurately model turbulent and very dense crowds, since that is not the focus of our paper. The free speeds v 0 are Gaussian distributed with mean 1.34 m/s and standard deviation 0.26 m/s when pedestrians are unobstructed. However, speeds are also bounded by The mass m of pedestrians are Gaussian distributed with mean 60 kg and standard deviation 10 kg; the maximum pair-wise repulsive force is set to F 5 160 N; the relaxation time is set to t 5 0.5 s; and the remaining parameter D 0 that determines the typical length scale of the repulsive forces is set to D 0 5 0.31 m.
The results of our agent-based model (see Fig. 5) matches the main characteristics of our model results as shown in Fig. 3. Notably, the number of infections per metres walked peak at an average crowd density above 6 m 22 . This highly non-linear behaviour which is the result of many interactive bodies would normally need a microscopic treatment. However, as we have shown here, the outcome of this non-linear behaviour can be accurately accounted for by our macroscopic equations where the only input needed is the average crowd density. This opens up new application areas for compartmental models where they have been less suitable in the past, such as environments containing large and dense crowds. A more comprehensive model validation will be carried out in future work, to empirically test the accuracy of our model predictions in dense and confined spaces, when such data becomes available.

Discussion and future work
Our study shows a way in which detailed insights gained from datadriven crowd research can be utilised to improve current disease spreading models in an analytically tractable way. Thus, the spread of disease depends strongly on the behaviour of the crowd, and in particular the contact rate is a key parameter in the study of the evolution of a disease and it varies considerably depending on the density of the population in the studied environment.   We have found that three different but related effects of crowd behaviour affect the rate of infection in a confined space. All three effects contribute to an increased infection rate for increasing crowd density; (i) increasing crowd density leads to decreased proximity, but this leads to a higher number of individuals within the range of infection around an infected individual; (ii) density-dependent walking speeds lead to individuals spending longer periods of their time in crowded locations compared to less crowded locations within an area with locally varying crowd densities (thus, the mean crowd density underestimates the typical proximities between individuals); and (iii) for the same reason, individuals also spend longer time altogether in environments such as transport hubs on crowded days compared to less crowded days, which leads to an increasing rate of infection per distance walked.
Comparing our new model with existing models, we have shown how existing models may give significant over-or under-estimations of the spread of disease in certain settings, and we have also shown how current models can be improved by a modification of the l parameter.
Moreover, the use of a generic definition of the transmission rate that depends on the transmissibility of the disease, the area considered and the density of the population, is a powerful tool that can be used in many different crowd scenarios, and will decrease the dependency on a-posteriori empirical transmission rate data every time one of the environmental or crowd-related parameters have changed.
Having a more precise prediction of what may, or may not, happen in crowded environments in terms of contagion between individuals could be highly useful to be able to devise adaptive control strategies during mass gatherings, in busy transport hubs, or in busy city centres. Note that the average walking speed AEv(r)ae is truncated at an upper boundary at 1.34 m/s (i.e. the 'free' unobstructed walking speed), and by a lower boundary at 0.01 (m/s). All intermediate walking speeds are calculated using the non-linear function specified by Eq. (6).  Figure 4 was created by solving Eq. (3) for a population size N 5 1000, using our proposed density-dependent specification of l obtained from Eq. (8) and comparing the results to the number of exposed individuals produced using a constant value l constant . The range of crowd density is exactly the same as the one used in Fig. 3. The time axis uses a time step of 60 seconds and the time range used is [0, 6 3 3600] seconds (i.e. six hours). pedestrians in a corridor with periodic boundary conditions. The model ran for 10 simulated seconds and data from t , 2 s was excluded because of the time needed for pedestrians to accelerate from their initial zero speeds at the start of the simulation. The model equations (9)-(11) were solved numerically using the 1st order Euler method with a time step of 0.05 s.