Predicting regional COVID-19 hospital admissions in Sweden using mobility data

The transmission of COVID-19 is dependent on social mixing, the basic rate of which varies with sociodemographic, cultural, and geographic factors. Alterations in social mixing and subsequent changes in transmission dynamics eventually affect hospital admissions. We employ these observations to model and predict regional hospital admissions in Sweden during the COVID-19 pandemic. We use an SEIR-model for each region in Sweden in which the social mixing is assumed to depend on mobility data from public transport utilisation and locations for mobile phone usage. The results show that the model could capture the timing of the first and beginning of the second wave of the pandemic 3 weeks in advance without any additional assumptions about seasonality. Further, we show that for two major regions of Sweden, models with public transport data outperform models using mobile phone usage. We conclude that a model based on routinely collected mobility data makes it possible to predict future hospital admissions for COVID-19 3 weeks in advance.


I. INTRODUCTION
Infectious diseases spread through the transmission of infectious agents during physical meetings (social contacts) between individuals. These meetings occur at home or at other locations such as workplaces or schools, which are reached by some means of transportation, e.g. by car, public transport or on foot. The meetings tend to follow regular patterns and variations, which can be used for different analytic purposes [1,2].
The COVID-19 pandemic has affected society in numerous ways. One striking feature is the reduction in individual mobility, which has been enforced either by strict legal lockdowns or, as in the case of Sweden, by recommendations to the general public. This reduction in mobility has had the intended effect of "flattening the curve" during the first, second and possibly future waves of the pandemic.
Understanding the effect of mobility on the transmission of COVID-19 requires the ability to measure and quantify these changes. This has been achieved by geographically tracking mobile phone usage, either directly by mobile phone operators [3] or via usage of Google services [4] that are readily available for all regions. Mobility has also been measured through the utilisation of public transport [5]. This type of information has been used in a number of studies to model and understand the pandemic. Linka et al. used mobility data to obtain a correlation between the reproduction number and public health interventions [6], while Zhou et al. investigated the delay of outbreaks caused by mobility restrictions [7]. Another application of mobility data is to make model-informed choices between different reopening strategies [8].
Given the time delay between initial infection and potential hospital treatment, mobility data, such as records of daily commuters, also offer an opportunity to predict the coming number of cases [9]. Such models can be useful for hospital administration since they allow for planning and a higher degree of preparedness for coming surges in the need for hospital beds. The aim of this study was to investigate whether variations in data reflecting weekly commuting rates were associated with later COVID-19 hospitalisation rates, and to compare the ability of different data sources to capture this association. The underlying assumption is that decreased levels of local commuting reflect a corresponding decrease in COVID-19 transmission.

II. METHODS
A retrospective design was used for data collection and analysis. We developed an SEIR-model of disease transmission which outputs the expected number of hospital admissions.
Here we describe the hospital admission and mobility data, the epidemiological model that we have used, and the method for fitting the model to data. The code for the model and the data used are available at: http://www.math.chalmers.se/~gerlee/SEIR.html

A. Data
Endpoint data: We consider hospital admission data from Sweden at the regional level, aggregated by the National Board of Health and Welfare [10]. The data contains the total number of newly admitted patients diagnosed with COVID-19 per week, starting with week 10. The data is reported separately for each of the 21 regions in Sweden. Missing data points were replaced by zeroes for all regions.
Syndromic data: To account for changes in behaviour due to governmental recommendations we have made use of mobility data from two sources: public transport data from the public transport authorities in Region Västra Götaland and Region Skåne, called Västtrafik (VT) and Skånetrafiken (ST), and Google mobility reports (GMR). The VT- and ST-data describe the total number of journeys made by public transport in the region and are reported on a weekly basis. Data are given in terms of a percent change compared to travel during week 9. The GMR-data also describes the change in mobility compared to a baseline, which is the median value from the 5-week period Jan 3 - Feb 6, 2020. Mobility is split into place categories and we have used values from the category 'transit stations'. The GMR-data is reported on a daily basis and, to make it compatible with the model, we calculate weekly averages. Figure 6 in the Appendix shows the above mobility measures as a function of time.
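The weekly averaging of the daily GMR-data can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name and the sample values are hypothetical, and the sketch assumes that the daily series starts on the first day of a week and that an incomplete trailing week is dropped.

```python
from statistics import mean


def weekly_averages(daily_values):
    """Average consecutive blocks of 7 daily values.

    Assumption: the series starts on the first day of a week; an incomplete
    trailing week is dropped rather than averaged.
    """
    n_weeks = len(daily_values) // 7
    return [mean(daily_values[7 * w:7 * w + 7]) for w in range(n_weeks)]


# Hypothetical daily changes relative to baseline (fractions, not real GMR values).
daily = [-0.10, -0.12, -0.15, -0.20, -0.18, -0.30, -0.35,
         -0.40, -0.42, -0.45, -0.50, -0.48, -0.55, -0.60]
weekly = weekly_averages(daily)
```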

B. Epidemiological model
To model the weekly time series of COVID-19 related hospital admissions we have used an SEIR-model with time-dependent infectivity β(t) which is informed by mobility measures.
Infectivity is assumed to vary with mobility such that the number of new social contacts for each infected individual increases with travel.
We assume that mobility measured by public transport utilisation and mobile phone usage reflects the general level of mobility in each region, which is then assumed to impact the contact rate and consequently the infectivity. Note that we do not assume that disease transmission occurs exclusively during travel, but rather that the above mobility measures serve as a useful proxy for the rate of social contacts.
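The model is defined in terms of the following set of coupled ordinary differential equations:
\[
\begin{aligned}
\frac{dS}{dt} &= -\beta(t)\,\frac{S I}{N}, \\
\frac{dE}{dt} &= \beta(t)\,\frac{S I}{N} - \rho E, \\
\frac{dI}{dt} &= \rho E - \gamma I, \\
\frac{dR}{dt} &= \gamma I.
\end{aligned}
\tag{1}
\]
Here S, E, I and R denote the number of susceptible, exposed, infectious and recovered individuals, ρ is the rate at which people leave the exposed compartment, γ is the rate of recovery and N is the population size of the region. In order to solve the system we also need to specify an initial condition and when in time it occurs. We assume that all individuals are susceptible except an initial number I_0 of infectious individuals at t_0 weeks prior to the first data point in the admission data (week 10).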
To connect the dynamics of the SEIR-model with hospital admissions we assume that individuals in the infectious compartment give rise to future hospital admissions. To model this we assume that the number of hospital admissions t_a weeks into the future is given by a fraction p of the present number of infectious individuals.
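As an illustration, the model and its link to admissions can be sketched in a few lines of Python. The sketch assumes the standard SEIR form implied by the parameter definitions above, integrates it with a simple forward Euler scheme (chosen here for brevity, not necessarily the solver used in the study), and holds β constant within each week. The function names and the example values of β and N are hypothetical.

```python
def simulate_seir(beta_by_week, N, rho=1.37, gamma=1.4, I0=1.0, steps_per_week=200):
    """Integrate the SEIR equations with forward Euler.

    beta_by_week holds one infectivity value per week (assumed constant within
    the week). Returns a list of (S, E, I, R) tuples, one per week boundary,
    starting with the initial condition.
    """
    S, E, I, R = N - I0, 0.0, I0, 0.0
    dt = 1.0 / steps_per_week
    states = [(S, E, I, R)]
    for beta in beta_by_week:
        for _ in range(steps_per_week):
            new_infections = beta * S * I / N   # flow from S to E
            dS = -new_infections
            dE = new_infections - rho * E       # E to I at rate rho
            dI = rho * E - gamma * I            # I to R at rate gamma
            dR = gamma * I
            S, E, I, R = S + dt * dS, E + dt * dE, I + dt * dI, R + dt * dR
        states.append((S, E, I, R))
    return states


def admissions_forecast(states, p=0.023, t_a=3):
    """Map infectious counts to admissions t_a weeks later (week -> p * I)."""
    return {week + t_a: p * I for week, (_, _, I, _) in enumerate(states)}


# Hypothetical example: constant infectivity for five weeks, one million inhabitants.
states = simulate_seir([5.0] * 5, N=1_000_000)
forecast = admissions_forecast(states)
```

Because the forecast for week t + t_a only uses the infectious count at week t, the sketch makes explicit why admissions can be predicted t_a weeks ahead of the latest mobility data.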

C. Model parametrisation and fitting
The parameters of the SEIR-model were taken from previously published studies and we have used ρ = 1.37 week⁻¹ (corresponding to a latency period of on average 5.1 days) and recovery rate γ = 1.4 week⁻¹ (corresponding to an infectious period of on average 5 days) [11].
Since testing was limited during the early stages of the pandemic in Sweden it is difficult to estimate the initial condition for our model. For simplicity we assume a single infected individual in a population of susceptibles appearing t_0 = 4 weeks prior to the first data point.
Adjusting the initial condition for each region could possibly yield more accurate predictions, but here we have chosen a robust initial condition which gives sensible predictions for all regions.
The scaling that relates the number of infected to hospital admissions was set to p = 0.023 in accordance with a previous study [11]. The time lag from infection to hospital admission was set to t_a = 3 weeks. This value is related to the time from infection to hospital admission, which has been reported to be 17 days (5 days latency [11] plus 12 days from symptom onset to admission [12]). However, it should not be interpreted as a parameter describing the fate of an individual patient, but rather as the time it takes for changes in disease transmission to propagate (sometimes via secondary cases) to hospital admissions.
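The correspondence between the weekly rates and the average durations quoted above follows from the mean of an exponential waiting time, 1/rate. A minimal check, with a hypothetical helper name:

```python
DAYS_PER_WEEK = 7


def mean_duration_days(rate_per_week):
    """Mean time in a compartment with a constant exit rate (exponential waiting time)."""
    return DAYS_PER_WEEK / rate_per_week


latency_days = mean_duration_days(1.37)     # roughly 5.1 days
infectious_days = mean_duration_days(1.4)   # roughly 5 days
```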
A previous study using mobility data has shown a time delay in admissions due to mobility restrictions in the range of 9-25 days [13], which covers our assumed value of 21 days.
Given the uncertainty in many of the above parameter values we have carried out a sensitivity analysis by varying one parameter at a time within a reasonable range. The results of this analysis are presented in the Appendix.
The infectivity β(t) is informed by the mobility data in the following way: For Västra Götaland and Skåne we use the public transport data and assume a linear relationship
\[ \beta(t) = a + b\,V(t), \]
where a, b are parameters that are fitted to the admission data (see below for details) and V(t) is the change in travel during week t. For all other regions we use the GMR-data in a similar way and assume that
\[ \beta_i(t) = a + b\,G_i(t), \]
where G_i(t) is the GMR-data (place category 'transit stations') for region i, and a, b are parameters that are estimated.
In order to account for the fact that not only mobility changed at the onset of the pandemic, but also other circumstances such as physical distancing and increased hand hygiene, we adjust the baseline values for V(t) and G_i(t) from 0 to 0.2.
The infectivity parameters a, b are estimated by minimising the root mean squared error (RMSE)
\[ E(\theta) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(pI(t_i + t_a, \theta) - A(t_i)\right)^2} \]
with respect to θ = (a, b). Here pI(t_i + t_a, θ) is the predicted number of hospital admissions, A(t_i) is the actual number of admissions, and the sum runs over all n time points t_i. To find the minimum RMSE we use the grid search method with 80 linearly spaced values in the range 1-12 for both a and b [14]. For each region i we thus obtain a set θ̂_i = (â_i, b̂_i) of estimated parameters. When comparing the model error between different regions we normalise the RMSE by dividing by the maximum number of weekly admissions for each region.
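The grid search can be illustrated with a small self-contained sketch. Several details are assumptions made for the sake of a runnable example and are not taken from the study's code: the linear form β = a + b·V, the clamp of β at zero, the forward Euler solver, and the index alignment between model weeks and data weeks (offset = t_0 - t_a = 1). The data are synthetic, generated from known parameters, and a 12-point grid is used instead of 80 points to keep the run short.

```python
import math


def linspace(lo, hi, n):
    """n linearly spaced values from lo to hi (inclusive)."""
    return [lo + (hi - lo) * k / (n - 1) for k in range(n)]


def infectious_by_week(a, b, mobility, N, rho=1.37, gamma=1.4, steps=50):
    """Forward-Euler SEIR run; returns I at each week boundary (model week 0, 1, ...)."""
    S, E, I = N - 1.0, 0.0, 1.0          # one infectious individual at model week 0
    dt = 1.0 / steps
    out = [I]
    for V in mobility:
        beta = max(0.0, a + b * V)       # assumed linear form; clamp at zero is a guard
        for _ in range(steps):
            inf = beta * S * I / N
            S, E, I = S - dt * inf, E + dt * (inf - rho * E), I + dt * (rho * E - gamma * I)
        out.append(I)
    return out


def rmse(predicted, observed):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(predicted, observed)) / len(observed))


def predict_admissions(a, b, mobility, N, n_obs, p=0.023, offset=1):
    """Predicted admissions for the n_obs data weeks.

    Assumed alignment: the admissions in data week i equal p * I at model
    week i + offset, with offset = t_0 - t_a = 4 - 3 = 1.
    """
    I = infectious_by_week(a, b, mobility, N)
    return [p * I[i + offset] for i in range(n_obs)]


def grid_search(observed, mobility, N, n_grid=80, lo=1.0, hi=12.0):
    """Return (rmse, a, b) for the grid point with the smallest RMSE."""
    best = None
    for a in linspace(lo, hi, n_grid):
        for b in linspace(lo, hi, n_grid):
            err = rmse(predict_admissions(a, b, mobility, N, len(observed)), observed)
            if best is None or err < best[0]:
                best = (err, a, b)
    return best


# Synthetic check with hypothetical data: generate admissions from known
# parameters and confirm that the search recovers them.
mobility = [0.2] * 4 + [-0.3] * 8            # 12 model weeks of hypothetical V(t)
observed = predict_admissions(4.0, 6.0, mobility, N=1_000_000, n_obs=11)
best_err, best_a, best_b = grid_search(observed, mobility, N=1_000_000, n_grid=12)
```

An exhaustive grid is affordable here because only two parameters are free; it also yields the full error surface, which is reused when quantifying parameter uncertainty.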
In order to quantify the uncertainty in our parameter estimates we select all parameter sets (a, b) that achieve an RMSE within 20% of E(θ̂). We solve the SEIR-model for all those parameter combinations and remove the lower and upper 5th percentile to obtain a 95% credible interval. This procedure corresponds to sampling from the posterior in an Approximate Bayesian Computation framework with E(θ) as our summary statistic [15].
For Region Västra Götaland we fit the mobility-driven SEIR-model (1) using increasing amounts of reported hospital admissions. We start by including data up until week 20 and test the model's predictive ability in terms of the mean average predictive error (MAPE) on the coming three weeks. This procedure is repeated for increasing amounts of training data.
To illustrate the robustness of the model we also plot how the estimated model parameters â and b̂ change as we include more weekly data.
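The selection and trimming steps can be sketched as below. The function names are hypothetical, the input is assumed to be a list of (RMSE, trajectory) pairs from the grid search, and the trimmed fraction is left as a parameter so that the width of the band can be chosen explicitly. The example data are made up.

```python
def near_optimal(results, tolerance=0.20):
    """Keep trajectories whose RMSE is within `tolerance` of the minimum RMSE.

    results is a list of (rmse, trajectory) pairs, one per grid point.
    """
    best = min(err for err, _ in results)
    return [traj for err, traj in results if err <= (1.0 + tolerance) * best]


def trimmed_band(trajectories, tail_percent=5):
    """Pointwise band after dropping tail_percent of the values at each end."""
    n = len(trajectories)
    k = n * tail_percent // 100              # number of values dropped per tail
    if 2 * k >= n:
        raise ValueError("too few trajectories for the requested trimming")
    lower, upper = [], []
    for values in zip(*trajectories):        # iterate over time points
        kept = sorted(values)[k:n - k]
        lower.append(kept[0])
        upper.append(kept[-1])
    return lower, upper


# Hypothetical grid-search output: 20 near-optimal fits and one poor fit.
results = [(1.0 + 0.005 * j, [float(j), 10.0 + j]) for j in range(20)]
results.append((2.0, [100.0, 100.0]))
kept = near_optimal(results)
lower, upper = trimmed_band(kept)
```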
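This evaluation scheme amounts to an expanding-window split of the weekly admission series. The sketch below only generates the training and test windows; the model fit and the error metric are left out, and the function name and the week labels are illustrative.

```python
def expanding_window_splits(series, first_train_size, horizon=3):
    """Return (train, test) pairs with a growing training window.

    The first training window holds first_train_size points; each later one
    adds one more point. The test window is the next `horizon` points.
    """
    splits = []
    for end in range(first_train_size, len(series) - horizon + 1):
        splits.append((series[:end], series[end:end + horizon]))
    return splits


# Week labels 10..45 stand in for the admission series (hypothetical placeholder).
weeks = list(range(10, 46))
splits = expanding_window_splits(weeks, first_train_size=11)  # weeks 10..20 first
first_train, first_test = splits[0]
last_train, last_test = splits[-1]
```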

III. RESULTS

A. Predicting hospital admissions using public transport utilisation
For Region Västra Götaland the resulting model error in terms of MAPE can be seen in fig. 1A. By successively increasing the training data, we see in fig. 1B that the model remains largely unchanged beyond week 30, which corresponds in time to the end of the first wave of the pandemic.
When using all available data we find that â = 4.16 and b̂ = 5.74 (fig. 1C), and we note that the model captures the dynamics of admissions during both the first and beginning of the second wave, although the rate of decline during the first wave is overestimated.

B. Using Google mobility data to predict hospital admissions
For all other regions we make use of Google mobility data (see Methods for details).
Figure 2 shows model fits for Östergötland and Stockholm (see fig. 5 in the Appendix for model fits to admissions in all Swedish regions and table I for normalised RMSE and estimated parameters). Again, we note that the model correctly describes the timing of the first and second wave. Visual inspection of the model fits for all regions suggests that the model performs better for regions with a larger population.

C. Public transport data improves model fit compared to Google mobility reports
For Skåne Region we have both public transport data and GMR-data, the latter being used for all other regions. Figure 3 shows the best model fits using the mobility data from the public transport agency Skånetrafiken compared to GMR-data. We note that although the model

IV. DISCUSSION
We set out to investigate whether variations in data reflecting weekly commuting rates were associated with later COVID-19 hospitalisation rates. It was found that COVID-19 hospital admissions can be modelled using time-dependent mobility data and that an SEIR-model can be fitted using two free parameters to regional data from Sweden.
Our approach is similar to a recent study by Chang et al. [14] who used spatially resolved mobility data in order to model disease transmission in metropolitan areas in the US. They compared their model output to COVID-19 incidence, whereas we have focused on hospital admissions. The reasons for this are twofold: firstly, the data on incidence in Sweden is unreliable due to limited testing and secondly hospital admissions is a more interesting
Despite these simplifications, the model was able to capture the general shape and timing of both the first and the beginning of the second wave for most regions. There appears to be a link between the population size of the region and the goodness of fit. The model fits the admission data for larger regions better, and a possible explanation for this is the large degree of randomness seen in the smaller regions. A recurring feature seen across most regions is the inability of the model to accurately describe the width of the first peak. The model tends to underestimate the actual width, and this is likely due to the lack of detail
Here we present model fits for all Swedish regions except Gotland, for which no data was available from the National Board of Health and Welfare.
FIG. 5. Optimal model fit for all Swedish regions except Gotland. Estimated parameter values can be found in table I.

FIG. 1. Model fit to admission data from Region Västra Götaland. A The model error in terms of the MAPE on 3-week predictions as a function of the number of weeks of data used in the fitting. B The estimated model parameters (â, b̂) as a function of the number of weeks of data used in the fitting. C The optimal fit when all data points are used (until week 45). The dashed lines show the 95% credible interval for the model fit (see Methods).

FIG. 3. Optimal model fit for Skåne Region using mobility data from public transport (red line) and Google Mobility Report (black line).

Figure 4 shows how the model error (RMSE) of the best fit for Region Västra Götaland changes when the parameters p, t_a, I_0 and V(t = 0) are varied. We note that it is possible to achieve a slightly better model fit when the probability of hospitalisation is lowered to p = 0.1, but the improvement in model fit is minor. For the delay we see that our value of t_a = 3 weeks lies close to a local minimum, but little would be gained (in terms of RMSE) by increasing the delay. The number of infected individuals at t = 0 has a more complicated impact on the error. A smaller RMSE could be achieved by increasing I_0 from its default value of 1, but the improvement is again minor. Lastly, the initial infectivity has a minor impact on the model error as long as it remains below 0.6.