Introduction

Pandemic management is a balancing act. When an outbreak of infections flares up, governments and authorities need to impose restrictions and recommendations on society that are carefully calibrated to the situation. On the one hand, during the COVID-19 pandemic, such non-pharmaceutical interventions have considerable benefits: via the incentives and information they provide, they reduce close contact between individuals, the dominant transmission route1,2. On the other hand, these interventions have considerable costs in the form of negative externalities relating to the economy and mental health3,4,5.

This balancing act puts authorities and governments in need of information to continuously calibrate the level of restrictions. It is not a matter of simply sending out a single set of instructions regarding restrictions and recommendations. Rather, authorities need to continuously receive information about the effectiveness of those restrictions and recommendations and adjust accordingly. An obvious source of information is directly related to the epidemic and includes the number of infection cases, hospitalizations, and deaths. Yet cases of infection are difficult to monitor and, for example, changes in the public’s motivation to participate in testing programs may create problems with respect to comparisons over time6. Furthermore, there is a significant lag between the onset of interventions and their effect on hospitalization and death counts, which implies that it is difficult to calibrate interventions on the basis of such information. Consequently, researchers, authorities and governments worldwide have complemented epidemiological information with information on the direct target of the interventions: Behaviour7,8.

In this manuscript, we assess the predictive performance of a particular source of information about behavior during lockdowns: Population-based surveys on social contacts, fielded daily to representative samples of the Danish population during the COVID-19 pandemic (see Methods for details on this dataset). This assessment aligns with recommendations about the use of surveys as epidemic monitoring tools on the basis of experiences during the SARS epidemic in Hong Kong9 and recommendations from the World Health Organization during the COVID-19 pandemic10. From a public health policy perspective, this particular dataset is a unique test case as it was, in fact, reported to the Danish government for this purpose on a twice-weekly basis during the second wave of COVID-19 infections in December 2020.

Furthermore, these data are unique in another respect: They constitute an open and ‘citizen science’11 alternative to the most widely used source of information on pandemic behavior: Mobility data. As we detail below, mobility data as a source of information may be problematic from both a methodological and a policy perspective. Mobility data provides a proxy for close contacts between people and has been heavily utilized by researchers and public health institutions8,12,13,14. Mobility data quantifies the population’s movement patterns and is unobtrusively obtained in a number of ways, for example, via people’s smartphones, and provided to researchers and governments by private companies such as Google15. This reliance, however, can raise and has raised concerns. First, in many cases, it implies that pandemic management and research rely on the willingness of private companies to share information during a critical crisis. Second, citizens themselves may be concerned about real or perceived privacy issues related to the sharing of data with authorities16,17. Given the importance of public trust for successful pandemic management18, such concerns—if widespread—can complicate pandemic control. Third, data from companies such as Google, Facebook and local phone companies may not be representative of the population of interest: The entire population of the country. Rather than being invited on the basis of traditional sampling methods, people opt in to the services of different companies and, hence, the data from any single company is likely a biased sample. In this sense, we argue that the known unknowns of survey data (e.g. we know we do not observe anyone under the age of 18) are preferable to the unknown unknowns of large-scale passive surveillance data. Fourth, the movements of people in society as captured by mobility data are only a proxy of the quantity of interest: Actual close encounters between individuals that drive the pandemic.

For these reasons, it is key to assess alternative sources of information about public behavior, such as nationally representative surveys of the adult population. In principle, surveys could alleviate the problems relating to the collection and validity of mobility data. Survey research is a centuries-old, low-cost methodology that can be utilized by public actors and that relies on well-established procedures for obtaining representative information on private behaviours in voluntary and anonymous ways19.

At the same time, data from surveys come with their own methodological complications. As documented by decades of research, people may not accurately report on their own behaviour20. Survey answers during the pandemic may be biased by, for example, self-presentational concerns and inaccurate memory. While research on survey reports of behaviour during the pandemic suggests that self-presentational concerns may not affect survey estimates21, memory biases may (although such biases are likely small for salient social behavior)22. Even with such biases, however, surveys may be fully capable of serving as an informative monitoring tool. The key quantity to monitor is change in aggregate behaviour over time. If reporting biases are randomly distributed within the population, aggregation will provide an unbiased estimate. Even if this is not the case, changes in the survey data will still accurately reflect changes in population behaviour as long as reporting biases are stable within the relevant time period.

On this basis, the purpose of the present manuscript is, first, to examine the degree to which survey data provide useful diagnostic information about the trajectory of behavior during a lockdown and, second, to compare their usefulness to information arising from mobility data. To this end, we focus on a narrow period around Denmark’s lockdown during the second wave of the COVID-19 epidemic in the Fall of 2020, i.e., prior to vaccine roll-out, when it was crucial for authorities to closely monitor public behavior. We illustrate the usefulness of survey data on a narrow window of time because the changing nature of factors such as seasonal effects, new variants, vaccines, changing masking efforts, etc., makes it difficult to model COVID-19 transmission across long periods without making a large number of assumptions6. See also Sec. 3 for a discussion of the limitations of our survey data. In spite of the limited scope, we believe that the study remains relevant for policy makers because it shows how public behaviour can be monitored at a crucial moment, so that policy makers are not forced to rely on proximity or mobility data from private companies in the absence of timely incidence data.

Specifically, we ask whether (a) daily representative surveys regarding the number of close social contacts and (b) mobility data allow us to track changes in the observed number of hospitalizations in response to the lockdown. In addition, to further probe the usefulness of survey data, we provide a fine-grained analysis of how different types of social contacts relate to hospitalizations. Our results shed new light on the usefulness of survey data. Previous studies during the COVID-19 pandemic have documented high degrees of overlap between self-reported survey data on social behavior and mobility data, but have not assessed whether these data sources contain useful information for predicting transmission dynamics23,24. One study did compare the predictive power of mobility data to survey data on the psychosocial antecedents of behavior25 and found that mobility data was more predictive of COVID-19 transmission dynamics than the survey data. Here, we provide a more balanced test by comparing the predictive value of mobility data and survey data that directly capture self-reported behavior rather than simply its psychosocial antecedents.

We find that, unlike mobility, self-reported contacts decreased significantly in all regions of Denmark before the nation-wide implementation of non-pharmaceutical interventions. This change in behaviour corresponds well to the inferred reproduction number, suggesting that self-reported survey data can be used to monitor compliance during lockdowns and improve short-term predictions of future hospitalizations. Further analyses of contact types show that contacts with friends and strangers outperform contacts with colleagues and family members (outside the household) as predictors of future hospitalizations.

Methods

Data

We use survey data from the HOPE (‘How Democracies Cope With COVID-19’) research project (www.hope-project.dk). Specifically, the HOPE project fielded daily nationally representative surveys in Denmark starting from mid-May 2020. Kantar Gallup, a private company, conducted the data collection until March 2022. Each day, a nationally representative sample (with a daily target of 500 complete interviews) reports on their protective behaviour and perceptions of the COVID-19 pandemic. Participants are Danish citizens aged 18 years or older. They are recruited using stratified random sampling—on age, sex and geographical location—based on the database of Danish civil registration numbers. The data collection fully complies with Aarhus University’s Code of Conduct and with the ethical standards set by the Danish Code of Conduct for Research Integrity. The legal aspects of the data collection were approved by Aarhus University’s Technology Transfer Office. As per section 14(2) of the Act on Research Ethics Review of Health Research Projects, “notification of questionnaire surveys … to the research ethics committee system is only required if the project involves human biological material.” All participants provided informed consent. The mobility data comes from Apple26, Google27 and major Danish mobile phone network operators28. For further details on the data, see Supplementary Note 1 and Supplementary Note 2.

Model description

We observe regional COVID-19 related hospitalizations, which derive from an initial number of infected individuals and the time-varying reproduction number. We parametrize the latter using behavioural survey data and mobility time series. Our approach is a variant of the semi-mechanistic hierarchical Bayesian model of Flaxman et al.29 and Unwin et al.30, with the key difference that we use daily COVID-19 related hospitalizations. In Denmark, hospitalizations are the most reliable proxy for pandemic activity available. Unlike death counts, hospitalizations are recorded with a significantly smaller delay and give a better signal-to-noise ratio for regions with little epidemic activity. The number of positive PCR cases, on the other hand, suffers from confounding through varying test intensity during the Christmas holidays. More importantly, for hospitalizations we can rely on a well-studied infection-to-hospitalization delay distribution, which is less sensitive to country-specific testing protocols.

The code is written in the Julia programming language31 using the Turing.jl package32 for Bayesian inference. The source code is fully accessible on GitHub33 and we summarize sampling details in Supplementary Note 3. In the following, we provide the mathematical details of the epidemiological model.

Observation model

As observations, we take the daily number of hospitalizations Ht,r at time t in region r and assume these are drawn from a Negative Binomial distribution with mean ht,r and over-dispersion factor ϕ:

$${H}_{t,r} \sim \,{{\mbox{NegBinom}}}\,\left({h}_{t,r},{h}_{t,r}+\frac{{h}_{t,r}^{2}}{\phi }\right)$$
(1)
$$\phi \sim \,{{\mbox{Gamma}}}\, ({{\mbox{mean}}}=50,{{\mbox{std}}}\,=20)$$
(2)

From the expected number of hospitalizations ht,r, we derive the latent, i.e., unobserved number of new infections it,r. Two factors link infections to hospitalizations: (a) The conditional probability α of hospitalization following an infection and (b) the corresponding delay distribution π:

$${h}_{t,r}=\alpha \mathop{\sum }\limits_{\tau =0}^{t-1}{i}_{\tau ,r}{\pi }_{t-\tau }$$
(3)
$$\alpha \sim {{{\mbox{Normal}}}}^{+}(0.028,0.002)$$
(4)

We estimate the infection hospitalization rate α in Eq. (4) from a sero-prevalence study34. The results are, however, not sensitive to this value as we do not account for the depletion of susceptibles. The delay π is a sum of two independent random variables, i.e. the incubation period and the time from symptom onset to hospitalization35. We take the corresponding distributions from previous studies and parametrize the incubation period by a Gamma distribution with a mean of 5.1 days and a coefficient of variation of 0.8636, and the onset-to-hospitalization delay by a Weibull distribution with shape parameter 0.845 and scale 5.506 days35; the resulting total delay π has a standard deviation of 8.4 days:

$$\pi \sim \,{{\mbox{Gamma}}}\, ({{\mbox{mean}}}=5.1,{{\mbox{CV}}}\,=0.86)\\ +\,{{\mbox{Weibull}}}\, ({{\mbox{shape}}}=0.845,{{\mbox{scale}}}\,=5.506)$$
(5)

We then discretize the continuous distribution π by \({\pi }_{i}=\int\nolimits_{i-0.5}^{i+0.5}\pi (\tau )d\tau\) for i = 2, 3, … and \({\pi }_{1}=\int\nolimits_{0}^{1.5}\pi (\tau )d\tau\) for application in Eq. (3).
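To make this concrete, the following sketch discretizes π by binning Monte Carlo samples (the Gamma-Weibull sum in Eq. (5) has no closed-form density) and writes the observation model of Eqs. (1)-(4) in Turing.jl. It is purely illustrative rather than our implementation33: the latent infections enter as a precomputed input here, whereas the full model infers them jointly, and the variable names and the 100-day support of the discretized delay are arbitrary choices.

```julia
using Turing, Distributions, Statistics

# Discretization of the delay distribution π (Eq. (5) and the binning above).
# Gamma(mean = 5.1, CV = 0.86)  =>  shape = 1/CV², scale = mean·CV².
incubation    = Gamma(1 / 0.86^2, 5.1 * 0.86^2)
onset_to_hosp = Weibull(0.845, 5.506)                       # shape, scale
delays   = rand(incubation, 10^6) .+ rand(onset_to_hosp, 10^6)
pi_delay = [i == 1 ? mean(delays .< 1.5) :
                     mean((i - 0.5) .<= delays .< (i + 0.5)) for i in 1:100]
# (100 days of support; must cover the observation window.)

# Observation model (Eqs. (1)-(4)): H holds daily hospitalization counts and
# `infections` the (here precomputed) latent infections on the same time axis.
@model function hospitalizations(H, infections, pi_delay)
    ϕ ~ Gamma(6.25, 8.0)                                    # shape/scale giving mean 50, SD 20, Eq. (2)
    α ~ truncated(Normal(0.028, 0.002), 0, Inf)             # Eq. (4)
    for t in 2:length(H)
        h = α * sum(infections[τ] * pi_delay[t - τ] for τ in 1:(t - 1))   # Eq. (3)
        H[t] ~ NegativeBinomial(ϕ, ϕ / (ϕ + h))             # mean h, variance h + h²/ϕ, Eq. (1)
    end
end

# chain = sample(hospitalizations(H, infections, pi_delay), NUTS(), 1000)
```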

Infection model

The (unobserved) number of new infections, it,r, evolves according to a discrete renewal process. This approach has been widely used in epidemic modelling29,37,38,39, is related to the classical susceptible-infected model40 and has a theoretical foundation in age-dependent branching processes37,41. New infections in region r at time t are a product of the time-varying reproduction number Rt,r and the number of individuals that are infectious at time t. The latter is a convolution of past infections and the generation interval gτ:

$${i}_{t,r}={R}_{t,r}\mathop{\sum }\limits_{\tau =0}^{t-1}{i}_{\tau ,r}{g}_{t-\tau }$$
(6)

The generation interval g translates past infections to the present number of infectious individuals and following previous studies, we assume a Gamma distribution density g(τ) with mean 5.06 and SD 2.1142:

$$g \sim \,{{\mbox{Gamma}}}\, ({{\mbox{mean}}}=5.06,{{\mbox{SD}}}\,=2.11)$$
(7)

Again, we discretize the continuous distribution by \({g}_{i}=\int\nolimits_{i-0.5}^{i+0.5}g(\tau )d\tau\) for i = 2, 3, … and \({g}_{1}=\int\nolimits_{0}^{1.5}g(\tau )d\tau\) to be used in Eq. (6). The convolution in Eq. (6) requires a history of infectious individuals for initialization, which we estimate prior to the analysis as described below.
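A minimal, illustrative sketch of this recursion (not the implementation in ref. 33), assuming a given trajectory of reproduction numbers and an initial history of infections; because the generation interval is a plain Gamma distribution, its discretization can use the closed-form CDF directly:

```julia
using Distributions

# Discretized generation interval g (Eq. (7)): Gamma with mean 5.06, SD 2.11,
# i.e. shape = (mean/SD)² and scale = SD²/mean.
gen = Gamma((5.06 / 2.11)^2, 2.11^2 / 5.06)
g   = [i == 1 ? cdf(gen, 1.5) : cdf(gen, i + 0.5) - cdf(gen, i - 0.5) for i in 1:60]

# Discrete renewal process (Eq. (6)): each new value is R_t times the
# convolution of all past infections with the generation interval.
function renewal(R, i_history, g)
    i = copy(i_history)                      # infections before the first modelled day
    for t in eachindex(R)
        infectious = sum(i[end - τ + 1] * g[τ] for τ in 1:min(length(i), length(g)))
        push!(i, R[t] * infectious)
    end
    return i[(length(i_history) + 1):end]    # only the newly generated infections
end
```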

Transmission model

At the heart of the analysis is the instantaneous reproduction number Rt,r for region r at time t. It determines the number of secondary transmissions from the current number of infectious individuals. We implement a parametric and a non-parametric variant of the model, akin to ref. 43.

The non-parametric model implements a latent random walk, i.e., an AR(1) process that allows us to track daily changes of the reproduction number:

$${R}_{t,r}={R}_{0,r}\exp ({\rho }_{t,r})$$
(8)
$${\rho }_{t,r} \sim \,{{\mbox{Normal}}}\,({\rho }_{t-1,r},\sigma )$$
(9)
$$\sigma \sim {{{\mbox{Normal}}}}^{+}(0.3,0.02)$$
(10)

Here, the latent variable ρt,r performs a random walk with a typical step size of σ. Hence, the number of inferable parameters ρt,r equals the number of observation days for each region r. The step size σ determines the smoothness of the resulting reproduction number and we choose the same prior distribution as in ref. 30. The non-parametric model allows us to infer the “ground truth” that we use for visual comparison.
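For intuition, the prior of Eqs. (8)-(10) can be simulated forward as in the following illustrative sketch (prior-predictive draws with arbitrary horizon, not the inference itself):

```julia
using Distributions

# Prior-predictive simulation of the latent random walk (Eqs. (8)-(10), (16)).
σ  = rand(truncated(Normal(0.3, 0.02), 0, Inf))    # typical step size, Eq. (10)
R0 = rand(truncated(Normal(1.0, 0.1), 0, Inf))     # initial reproduction number, Eq. (16)
T  = 80                                            # number of observation days (illustrative)
ρ  = zeros(T)
for t in 2:T
    ρ[t] = rand(Normal(ρ[t - 1], σ))               # Eq. (9)
end
R = R0 .* exp.(ρ)                                  # Eq. (8)
```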

The parametric model, on the other hand, takes a data stream Xt,r for every region r as a parametrization of the reproduction number:

$${R}_{t,r}={R}_{0,r}\exp ({e}_{r}{X}_{t,r})$$
(11)
$${e}_{r} \sim \,{{\mbox{Normal}}}\,(e,s)$$
(12)
$$e \sim \,{{\mbox{SkewedLaplace}}}\,(\mu =0,\sigma =0.7,\alpha =0.2)$$
(13)
$$s \sim \,{{\mbox{Gamma}}}\, ({{\mbox{mean}}}=0.07,{{\mbox{SD}}}\,=0.05)$$
(14)

The predictors are normalized such that Xt,r gives the change in behaviour at time t relative to the first day, i.e. t0 = 2020-12-01, in region r. Thus, the effect size er in Eq. (11) translates a relative change in the predictor Xt,r to a change in the regional reproduction number Rt,r. We pool information in order to reduce regional biases and to give a robust country-level effect estimate e, akin to multi-level models44.

With more contacts or a higher mobility level, we expect increased disease transmissibility and therefore choose a skewed Laplace distribution as a prior for the pooled effect parameter e45. Furthermore, we choose a shrinking prior on the dispersion parameter s to limit regional differences and thus reduce potential overfitting given the limited data. Note, however, that substantial effect differences are still inferable if the data provide sufficient evidence.
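The following sketch illustrates this structure with draws from the priors. Two caveats: Laplace(0, 0.7) is used as a stand-in for the skewed Laplace prior of Eq. (13), which is not available in Distributions.jl, and the predictor normalization (relative change since 2020-12-01) as well as all names are illustrative simplifications of the implementation33.

```julia
using Distributions

# Partially pooled regional effect sizes (prior draws for Eqs. (12)-(14)).
n_regions = 5
e   = rand(Laplace(0.0, 0.7))                       # pooled national effect (stand-in for Eq. (13))
s   = rand(Gamma((0.07 / 0.05)^2, 0.05^2 / 0.07))   # mean 0.07, SD 0.05, Eq. (14)
e_r = rand(Normal(e, s), n_regions)                 # regional effects, Eq. (12)

# `raw` is a hypothetical T × regions matrix of predictor values (e.g. the daily
# fraction of risk-taking individuals); X gives the change relative to the first day.
normalize(raw) = raw ./ raw[1:1, :] .- 1
# Regional reproduction numbers, Eq. (11): R0 is a vector of initial values per region.
reproduction(R0, X, e_r) = R0' .* exp.(X .* e_r')
```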

Initialization of the non-parametric model

Observations start on 01-August-2020, i.e., well before the second wave of COVID-19 infections (see Fig. 2). In order to initialize the discrete renewal process, we can therefore reasonably assume that the number of latent infections prior to 01-August-2020 is constant, i.e., it,r ≡ i0,r for t ≤ 0. We infer i0,r from the number of PCR-positive cases I0,r on 01-August-2020 and roughly assume an underestimation factor of three:

$${i}_{0,r} \sim \,{{\mbox{Exponential}}}\,(3{I}_{0,r})$$
(15)

The exponential prior implies a broad uncertainty and thus sufficient flexibility of the inference model. Note that we choose PCR-positive cases to initialize the number of infected because hospitalizations were very low and noisy at the start of the second wave, making incidence data in this case a stronger choice for initializing the model. Moreover, we choose the initial reproduction number to be around one, which reflects our prior belief that the epidemic was under control well before the second wave of infections:

$${R}_{0,r} \sim {{{\mbox{Normal}}}}^{+}(1.0,0.1)$$
(16)

Initialization of the parametric model

Observations start on 01-December-2020, i.e., about 1 week prior to the lockdown’s announcement and well within the second wave of COVID-19 infections. Here, the assumptions of constant it,r ≡ i0,r for t ≤ 0 and of R0,r ≈ 1 are not suitable. Instead, we take posterior samples from the non-parametric model, marked with an asterisk, for initialization: In particular, we take the mean over the posterior samples of the latent infections \({\langle i\rangle }_{t,r}^{* }\) and scale the time series by a factor ν that corresponds roughly to the posterior uncertainty of \({i}_{t,r}^{* }\). Hence, we obtain the initial number of latent infections according to:

$${i}_{t,r}=\nu \cdot {\langle i\rangle }_{t,r}^{* }\quad \,{{\mbox{for all}}}\,\,t\le 0$$
(17)
$$\nu \sim {{{\mbox{Normal}}}}^{+}(1,0.1)$$
(18)

Similarly, we initialize the effective reproduction number R0,r by fitting a Normal distribution to the posterior samples \({R}_{0,r}^{* }\) from the non-parametric model at the initial observation, i.e. 01-December-2020:

$${R}_{0,r} \sim {{{\mbox{Normal}}}}^{+}({\mu }_{R},{\sigma }_{R})$$
(19)
$${\mu }_{R}=\,{{\mbox{mean}}}\,({R}_{0,r}^{* })$$
(20)
$${\sigma }_{R}=\,{{\mbox{std}}}\,({R}_{0,r}^{* })$$
(21)
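A sketch of this initialization, assuming the posterior draws from the non-parametric model are available as plain arrays (the array layout and names are illustrative, not those of the implementation33):

```julia
using Distributions, Statistics

# i_post[s, t, r]: posterior draws of latent infections from the non-parametric model;
# R0_post[s, r]:   draws of the reproduction number on 01-December-2020.
function initialize_parametric(i_post, R0_post)
    i_mean = dropdims(mean(i_post; dims = 1); dims = 1)      # ⟨i⟩*_{t,r}
    ν      = rand(truncated(Normal(1.0, 0.1), 0, Inf))       # scaling factor, Eq. (18)
    i_init = ν .* i_mean                                      # Eq. (17)

    μ_R = vec(mean(R0_post; dims = 1))                        # Eq. (20)
    σ_R = vec(std(R0_post; dims = 1))                         # Eq. (21)
    R0  = rand.(truncated.(Normal.(μ_R, σ_R), 0, Inf))        # Eq. (19), one draw per region
    return i_init, R0
end
```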

Parametric model with multiple predictors

For the analysis in Supplementary Fig. 1 and Supplementary Table 1, we implement a parametric model with multiple predictors c. To this end, we modify Eqs. (11) to (14) as follows:

$${R}_{t,r}={R}_{0,r}\exp \left(\mathop{\sum}\limits_{c}{e}_{r}^{c}{X}_{t,r}^{c}\right)$$
(22)
$${e}_{r}^{c} \sim \,{{\mbox{Normal}}}\,({e}^{c},s)$$
(23)
$${e}^{c} \sim \,{{\mbox{SkewedLaplace}}}\,(\mu =0,\sigma =0.7,\alpha =0.2)$$
(24)
$$s \sim \,{{\mbox{Gamma}}}\, ({{\mbox{mean}}}=0.07,{{\mbox{SD}}}\,=0.05)$$
(25)

The reproduction number in region r at time t is a linear combination of multiple data streams \({X}_{t,r}^{c}\) with an exponential link function to ensure positivity. Each predictor is normalized such that \({X}_{t,r}^{c}\) gives the change in behaviour or mobility at time t relative to the first day, i.e. 2020-12-01, in region r. Thus, the effect sizes \({e}_{r}^{c}\) translate a relative change in the predictor c to a change in the reproduction number Rt,r. We pool effect sizes \({e}_{r}^{c}\) to reduce regional biases and obtain a national-level effect size ec for each predictor c.
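In code, the only change relative to the single-predictor model is that the scaled predictors are summed before the exponential link, as in this illustrative one-liner (Xs denotes a vector of time-by-region predictor matrices and es the matching vector of regional effect vectors; both names are ours):

```julia
# Multi-predictor reproduction number, Eq. (22).
reproduction_multi(R0, Xs, es) = R0' .* exp.(sum(X .* e' for (X, e) in zip(Xs, es)))
```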

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Results

We establish the link between survey data, mobility data, and hospitalizations via epidemic modeling, which uses the behavioural survey and mobility data as an input to capture underlying infectious activity30,43. Specifically, we extend the semi-mechanistic Bayesian model of Flaxman et al.29 and Unwin et al.30 to jointly model the epidemic spreading within the five regions of Denmark. Where possible, we use partial pooling of parameters to share information across regions and thus reduce region-specific biases. For each realization of the model, we parametrize the regional reproduction number Rt with a single predictor Xt taken from our survey or from the mobility data, respectively:

$$\log ({R}_{t})=\log ({R}_{0})+e{X}_{t}$$
(26)

The regional reproduction number at time t derives from the initial value R0 and the scaled predictor eXt with a logarithmic link-function (see Methods for full details on the model).

We compare the predictive performance of each data stream using leave-one-out cross-validation (LOO). LOO works by fitting the model to the observed hospitalizations excluding a single observation and comparing the prediction for the unseen observation against the observed real-world data. Repeating this process over all observations allows one to estimate the model performance on out-of-sample data with a theoretically principled method that accounts for uncertainties44. In practice, this would result in an immense computational effort and, therefore, we use an efficient estimation of LOO based on Pareto-smoothed importance sampling46. In order to compare the predictive performance of, say, self-reported survey data against mobility data, we calculate the LOO score for each model parametrization and consider the difference significant if the mean difference exceeds its 95% CI.
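The comparison rule can be illustrated with a small helper that takes the pointwise PSIS-LOO scores of two model parametrizations (one value per left-out observation, as produced by a PSIS-LOO implementation; the names are illustrative) and flags the difference as significant when its mean exceeds the approximate 95% interval:

```julia
using Statistics

# Compare two models via their pointwise expected log predictive densities (elpd).
function loo_difference(elpd_a, elpd_b)
    d  = elpd_a .- elpd_b                   # pointwise differences
    Δ  = sum(d)                             # difference in total LOO score
    se = sqrt(length(d) * var(d))           # standard error of the difference
    return (difference = Δ, se = se, significant = abs(Δ) > 1.96 * se)
end
```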

Because we are interested in the use of behavioural data as a guide for decision-making, our inference focuses on the key period of the second wave from 01-December-2020, i.e., about 1 week before Denmark’s lockdown announcement, to 20-February-2021, when vaccinations accelerated across the country. The period captures a sharp increase and eventual decline in hospitalizations during the second wave of Denmark’s COVID-19 epidemic (see Supplementary Fig. 2). We stress that this narrow focus makes ours a proof-of-concept study. To fully understand the efficacy of survey data, it will be important to extend models and analyses to longer periods of time, which makes it necessary to account for factors such as vaccination, new variants of concern, the opening of schools, etc.

Defining risk-taking behaviour

As a monitoring tool, we first consider self-reported survey data on the daily number of contacts, defined as close encounters within 2 meters for at least 15 minutes47. The reported numbers are highly skewed: 15.7% of all counts are zero, while some participants report over 1000 contacts (see Supplementary Fig. 3). As a result, the mean over daily reported numbers is highly sensitive to outliers, while quantile-based measures obscure most of the variation.

Instead, we define the following robust measure of risk-taking behaviour: We label a participant in the survey as risk-taking if they report contacts above a fixed threshold and propose the daily fraction of risk-taking individuals as a predictor of the effective reproduction number. The intuition is that infections tend to be linked to large clusters via super-spreading events48. Therefore, we base our analysis on the fraction of the population that reports an above-average number of contacts.

That choice raises the question: What is a reasonable threshold that defines risk-taking behaviour? We choose a reference period prior to the lockdown’s announcement, take the distribution of contacts over this time window and define a range of thresholds in terms of percentiles (see Supplementary Fig. 3 for details). For a visual comparison, Fig. 1 (second row) illustrates the dynamics of risk-taking behaviour, referred to as self-reported survey data. The thresholds range from the 40th to the 90th percentile and translate into a critical number of contacts ranging from 3 to 25, respectively. For thresholds above the 60th percentile, risk-taking behaviour shows the strongest response to the announced lockdown and increases little during the Christmas period. Qualitatively, this behaviour matches the time-varying reproduction number Rt (see Fig. 1, first row) that we inferred from national hospitalizations using a latent random-walk model (details in Sec. 3).

Fig. 1: National-level comparison between the inferred reproduction number and multiple predictors.
figure 1

a Inferred reproduction number from national hospitalizations. b Comparison between thresholds that define risk-taking behaviour: Each percentile corresponds to a number of contacts n that defines risk-taking behaviour. The time series show the daily fraction of individuals, P(#total contacts ≥ n), that report at least n contacts. c Comparison between risk-taking behaviour with a threshold at the 70th percentile (self-reported survey data), Google mobility, Apple mobility, and telecommunication data (Telco).

In the following, we use the 70th percentile as a threshold, which corresponds to 10 or more close contacts within the past 24 h. However, our results are not sensitive to this value, as all models with a threshold between the 60th and 90th percentiles perform similarly well (see Supplementary Fig. 4 and Supplementary Table 2).
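As an illustration of how this indicator can be computed (the data layout and names are illustrative): the threshold is a percentile of the contact distribution in the pre-lockdown reference window, and the daily signal is the share of respondents at or above it.

```julia
using Statistics

# Daily fraction of risk-taking respondents given a percentile threshold.
# `contacts_by_day` is a vector with one vector of reported contact counts per day.
function risk_taking_fraction(contacts_by_day, reference_days; pct = 0.70)
    reference = reduce(vcat, contacts_by_day[reference_days])   # pre-lockdown reference window
    threshold = quantile(reference, pct)                         # ≈ 10 contacts at the 70th percentile
    return [mean(day .>= threshold) for day in contacts_by_day]
end
```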

Self-reported survey data versus mobility data

By considering self-reported survey data, we capture the sharp decline in the reproduction number after the lockdown’s announcement, i.e., about 2 weeks before its nationwide implementation. This early signal is not as pronounced in the combined mobility time series from Google and Apple that have been proposed in ref. 43, nor in the telecommunication data from Danish mobile network operators (see Fig. 1 and Fig. 2 for a visual comparison on the national and regional level, respectively). In addition, we also observe a sharp increase in mobility shortly after the lockdown’s implementation, which does not correspond to the inferred reproduction number and thus does not translate into increased hospitalizations. This decoupling between mobility and disease dynamics has been previously observed for other countries43,49. A quantitative model comparison with LOO cross-validation confirms that self-reported survey data gives the best out-of-sample predictions for hospitalizations (see Fig. 3).

Fig. 2: Regional-level comparison between hospitalizations, reproduction number and predictors.
figure 2

First row: hospitalizations. Second row: inferred reproduction number from regional hospitalizations with mean and 95% CI. Third to sixth rows: survey data (70th percentile threshold), Google mobility, Apple mobility, and telecommunication data (Telco). We mark the lockdown’s first announcement, its partial implementation and its national implementation with a solid vertical line, a dashed vertical line and a shaded vertical area, respectively.

Fig. 3: Self-reported survey data (survey) demonstrates the highest predictive performance compared to Google mobility, Apple mobility and telecommunication data (telco).
figure 3

We calculate the difference in LOO score with respect to the best-performing model and mark the mean difference and the 95% CI with a circle and a blue bar, respectively. We consider the difference significant if the mean exceeds the 95% CI. See Supplementary Table 4 for details.

We find a more nuanced result when comparing self-reported contacts to the individual data streams provided by Google (see Supplementary Fig. 5). In particular, the category “Retail & Recreation” performs only marginally worse (see Supplementary Table 3), suggesting that disease-relevant contacts are highly context dependent, a result that we examine in the following section.

Understanding the role of contact-types

In our survey, we assessed the daily number of contacts separately for (a) family members outside the household, (b) friends and acquaintances, (c) colleagues and (d) strangers, i.e. all other contacts. Therefore, we can evaluate the impact of context-dependent risk-taking behaviour on Rt and on the observed hospitalizations (Fig. 4). As above, we choose the 70th percentile as a threshold for risk-taking behaviour for each contact type, and as above our findings are robust to the specific choice of threshold.

Fig. 4: Regional-level comparison between hospitalizations, reproduction number and risk-taking behaviour in different social contexts.
figure 4

First row: regions of Denmark. Second row: inferred reproduction number from regional hospitalizations with mean and 95% CI. Third to sixth rows: regional predictors, including risk-taking behaviour towards friends, strangers, colleagues, and family members outside the household, respectively, with a threshold at the 70th percentile. The solid vertical line, dashed vertical line and shaded area mark the lockdown’s first announcement, its partial implementation and its national implementation, respectively.

The visual comparison in Fig. 5 shows that risk-taking behaviour towards friends, strangers and colleagues declines significantly weeks before the lockdown’s national implementation—unlike risk-taking behaviour towards family members. The latter spikes around Christmas, which appears to have little effect on the reproduction number, perhaps due to precautionary measures taken prior to visiting family (e.g., testing).

Fig. 5: Risk-taking behaviour towards friends and strangers are the best single predictors for the observed hospitalizations.
figure 5

We present the difference in LOO cross-validation with respect to the best-performing model and plot the mean and 95% CI as circles and vertical bars. See Supplementary Table 5 for details. In Supplementary Table 1, we show that the combination of risk-taking behaviour towards colleagues and family members performs similarly well.

Cross-validation shows that risk-taking behaviour towards friends and strangers is significantly more predictive than risk-taking behaviour towards family members and colleagues (see Fig. 5). Importantly, however, this does not imply that contacts with colleagues and family members play a minor role in disease spreading. A joint model that includes all contact types as predictors reveals a strong correlation between risk-taking behaviour towards colleagues and family members (see Supplementary Fig. 6 and Supplementary Fig. 1), and a further cross-validation analysis shows that the combination of both predictors performs similarly well to contacts with strangers and friends (see Supplementary Table 1).

Discussion

During a lockdown, decision-makers need high-fidelity, real-time information about social behavior in order to carefully calibrate restrictions to both the epidemic wave and levels of public compliance. Interventions that are too lenient will not sufficiently reduce the wave, while too severe interventions (e.g., curfews) may have significant negative externalities on, for example, public trust and mental health4,5.

To this end, researchers and authorities worldwide have relied on mobility data, which have been cheaply available as they were already unobtrusively collected by, for example, private tech companies. At the same time, such reliance entails a dependency on data collected by private companies, data which may also raise privacy issues.

In the present analysis, we have provided evidence suggesting the usefulness of daily surveys of nationally representative samples as an alternative source of information during a lockdown. While the use of surveys has been recommended during the COVID-19 pandemic by the WHO10 and on the basis of the SARS epidemic in Hong Kong9, the present analysis provides one of the first attempts to quantify the predictive validity of surveys of self-reported behavior during a lockdown. In contrast, prior research has focused on the psychosocial antecedents of behavior, such as self-reported fear of COVID-1925. While understanding the impact of such antecedents is a theoretically important endeavour, more direct measures of behavior may be preferable for monitoring purposes (see also Supplementary Fig. 7 and Supplementary Table 6 for a comparison with indirect measures from our survey).

Our analyses provide a proof of concept that self-reported measures of behavior can be superior to mobility data. Given the widespread use of mobility data, it is relevant to ask why survey data fared better. Unlike the telco data and the combined time series from Google and Apple, the survey data were able to capture behavioural changes weeks before the lockdown’s nation-wide implementation. Part of the effect can be explained by preceding partial lockdowns (see Supplementary Table 7 for a timeline of COVID-19-related restrictions). However, we see similar decreases in activity in regions that were not targeted by the partial lockdowns and, in addition, we observe an early increase in risk awareness (see Supplementary Fig. 8). This observation hints at an additional, indirect, i.e., psychological, effect: Individuals adjust their behaviour in response to an increased perceived threat from rising case numbers or from the intensified political discussions that culminated in the announcement of the national lockdown on 07-December-2020. This finding suggests that part of the problem with mobility data may be that it is too coarse and, hence, does not capture the micro-adjustments in social behavior that people make when they are concerned with infection risk, such as standing further away from others in public queues, not mingling with co-workers at the workplace, and so forth.

Moreover, mobility increases shortly after the lockdown’s implementation with little effect on hospitalizations. This decoupling between mobility and the reproduction number has been previously observed in other countries43,49. Unlike mobility, self-reported contacts provide a more direct measure of behaviour and thus improve predictability in the months after the lockdown’s implementation.

At the same time, it is relevant to note that a more detailed analysis of the individual Google data streams revealed the importance of context-dependent contacts: Our analysis finds that “Retail & Recreation” performs only marginally worse than self-reported contacts (see Supplementary Table 3 and Supplementary Fig. 5) and is best explained by risk-taking behaviour towards strangers (see Supplementary Fig. 9).

Finally, we find that risk-taking behaviour towards strangers and friends provides the best predictors for hospitalizations, although a joint model that includes contacts with colleagues and family members performs similarly well. The latter could be explained by the complementary dynamics of these contact types during the Christmas period: Holidays implied fewer contacts with colleagues and larger gatherings with family members.

Our sensitivity analysis in Supplementary Figs. 10, 11, 12, 13, 14, and 15 confirms that all results are robust to minor changes in the observation window, the infection-to-hospitalization distribution, and the threshold that defines risk-taking behaviour.

Our inability to predict the rise of COVID-19 related hospitalizations prior to the lockdown’s announcement suggests that there are multiple possibilities for improving the measures used for monitoring public behavior during an epidemic. When knowledge has been gathered about the main pathways of transmission, researchers and authorities can more directly ask questions about social interactions in situations that enhance or inhibit transmission risk. During the COVID-19 pandemic, for example, it would be relevant to know whether a contact occurred indoors or outdoors, especially as temperatures drop and individuals adjust their behaviour. Moreover, we now know about the importance of transmission among children and young adults below 18, who could not be included in the study. We believe that the lack of contextual information and representativeness limits the usefulness of our data set for predicting the onset of the second wave of COVID-19 infections (see Supplementary Fig. 2).

An important final lesson is that if one is able to sample representatively, the surveys themselves do not need to be especially large—only around 500 responses per day in this case. However, this also raises an important limitation of our study. The samples we are able to collect in Denmark are arguably too good to transfer directly to other contexts. Our ability to sample directly from a highly curated central database, with a 25% response rate despite no compensation being offered to respondents, is not necessarily replicable in many other countries, especially outside of Europe50. We stress that the implications of reduced sample quality need to be explored when extending our results to other contexts.

In summary, the present analysis has provided a proof of concept regarding the usefulness of survey data as a public policy tool for monitoring compliance with the announcement and implementation of lockdowns. Even though the analyses we present are narrowly focused on a single lockdown, they provide evidence in support of the WHO’s recommendation to integrate social science methods such as surveys into pandemic surveillance and management.