COVIDiSTRESS Global Survey dataset on psychological and behavioural consequences of the COVID-19 outbreak

This N = 173,426 social science dataset was collected through the collaborative COVIDiSTRESS Global Survey – an open science effort to improve understanding of the human experiences of the 2020 COVID-19 pandemic between 30th March and 30th May, 2020. The dataset allows a cross-cultural study of psychological and behavioural responses to the Coronavirus pandemic and associated government measures like cancellation of public functions and stay at home orders implemented in many countries. The dataset contains demographic background variables as well as measures of Asian Disease Problem, perceived stress (PSS-10), availability of social provisions (SPS-10), trust in various authorities, trust in governmental measures to contain the virus (OECD trust), personality traits (BFF-15), information behaviours, agreement with the level of government intervention, and compliance with preventive measures, along with a rich pool of exploratory variables and written experiences. A global consortium from 39 countries and regions worked together to build and translate a survey with variables of shared interests, and recruited participants in 47 languages and dialects. Raw plus cleaned data and dynamic visualizations are available. 
Measurement(s): psychological measurement • anxiety-related behavior trait • Stress • response to • Isolation • loneliness measurement • Emotional Distress
Technology Type(s): Survey
Factor Type(s): geographic location • language • age of participant • responses to the Coronavirus pandemic
Sample Characteristic - Organism: Homo sapiens
Sample Characteristic - Location: global
Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.13251776


Background & Summary
In 2020, a new coronavirus pandemic spread across countries worldwide. This resulted not only in a global health crisis, but also in severe economic and socio-psychological consequences. To control the spread of the coronavirus, governments imposed a range of measures, including the closure of schools, workplaces, shopping areas and public amenities, forced isolation, virus-testing, and limits to civil liberties. Inevitably, these changes generated a variety of psychological responses in individuals, which in turn shaped the level of compliance with preventive measures. In fact, extant research on the factors that shape willingness to comply with public health efforts aimed at preventing or slowing the spread of epidemics has highlighted the importance of psychological and social factors 1,2 (for instance, shared trust in state or health authorities 3,4) in driving compliance with guidelines and restrictions. The implications of these complex factors for compliance with preventive measures imposed by different governments must be analysed in detail after the crisis. Indeed, the psychological and societal effects are likely to be more pronounced, more widespread, and longer-lasting than the purely somatic effects of the infection 5.
To contribute to the understanding of the intersection between pandemic-related physical and behavioural issues, the present document describes a large-scale dataset collected through the collaborative COVIDiSTRESS global survey. The COVIDiSTRESS data collection ran from 30th March to 30th May, 2020, and was carried out by collaborators from 39 countries and regions, with survey forms available in 47 languages and dialects. In total, 173,426 participants were recruited from 179 countries on six continents.
www.nature.com/scientificdata
Pandemic outbreaks breed misinformation, and foster fear of contagion as well as uncertainty during the course of their spread 5,6. Factors such as concerns regarding the severity of a disease, the perceived reliability of government information, and beliefs in the efficacy of preventive measures can influence individuals' intentions to comply and engage in preventive behaviours 7. Thus, the extent of compliance is influenced by the level of trust in one's sources of information about a pandemic, as well as the perceived gravity of the disease. Concerns over one's risk of contracting the disease during a pandemic can be a source of ongoing worry and anxiety as well as stress (e.g. H1N1 7 and MERS 8). These concerns, as well as the confusion generated by the lack of established worldwide or national quarantine protocols, timely information and resources from public health systems 9, may contribute to lower levels of compliance. Research indicates that the perception of openness and reliability of governments and health organisations 10, levels of trust in media and medical authorities 11,12, as well as perceptions of the disease's severity and the efficacy of one's actions 10,13,14 contribute to compliance with recommendations for preventive behaviour.
Table 1. Sample size, proportions of valid data, age mean and standard deviation across countries with more than 200 participants. Note. N = number of participants; Prop = proportion; Prop_50 = proportion of participants that have more than 50% of non-missing data; Prop_90 = proportion of participants that have more than 90% of non-missing data; M_age = mean age; SD_age = standard deviation of age.
Both the medical situation and the psychological effects of isolation, confinement and information behavior 15,16 need to be considered when prolonged periods of quarantine are implemented. A subset of the negative effects of 'cabin fever' includes responses varying from anxiety and depression 17 to impaired cognitive ability and hostility 16,18. Efforts such as closing down schools and workplaces, and calls for people to self-isolate in their homes, are likely to constitute a source of both existential and practical stress unrelated to the fear of contracting the disease. Compliance with medical guidelines has been shown to decrease not just as a result of higher stress levels 19, but also of minor everyday stressors such as workplace conflict or household responsibilities 20. Prolonged states of emergency and the chronic psychological, social, and economic stressors related to them 21,22 may decrease compliance with set behavioural objectives during pandemics. Conversely, social support from groups such as one's family, friends, and colleagues moderates the effect of concern for the disease or other sources of stress on one's psychological well-being 23,24.
Hence, as an effort to help health authorities and decision makers organize informed responses, we initiated the COVIDiSTRESS open science collaboration. The dataset can help researchers and stakeholders identify nuances in psychological and behavioural risk factors in the context of the COVID-19 pandemic, and assist governments and other organizations in adopting constructive policies appropriate to each country.

Methods
Participants. A total of 173,426 people accessed an online survey link to provide their experiences over a period of 62 days (30th March to 30th May, 2020). The stored dataset represents 125,306 people who met the inclusion criteria (being 18 years of age or older and giving informed consent). Demographic characteristics for countries with over 200 responses appear in Table 1. Given the urgent call for COVID-19 research, the survey received a waiver to commence data collection from the IRB office at Aarhus University, Denmark. Participants volunteered in response to online and media appeals without monetary compensation, except that some of the Japanese participants received 7 T-points (equivalent to about 0.065 USD) from the crowdsourcing service as a reward.

Materials.
The full survey form in English can be accessed at https://doi.org/10.17605/OSF.IO/Z39US. The survey consisted of two parts. The first part comprised general demographic data, self-reports about the proximate effects of the COVID-19 pandemic (e.g. isolation status, first-hand experience, attenuated risk), a modified version of the Asian Disease Problem to examine participants' risk-taking intentions in the COVID-19 situation 25, a personality assessment (BFI-S 26), the short self-report scale of loneliness 27 (SLON-3) based on the UCLA loneliness scale, the Perceived Stress Scale (PSS-10 28), self-reports about interpersonal and institutional trust (based on OECD guidelines 2017), and items measuring daily behaviours, including compliance with general and social preventive measures. The second part contained sets of more specific items related to people's experiences of distress and worry during the ongoing outbreak of coronavirus (e.g. access to amenities, loss of work, adapting work, education and social interactions to digital platforms, the social stresses of confinement with adults and children), as well as items on people's coping mechanisms during the COVID-19 crisis (e.g. social contact, staying informed, dedicating oneself to preparation, hobbies, religion) and the Social Provisions Scale (SPS-10 29). Finally, participants were asked to report their information behaviours during the coronavirus pandemic, and were invited to add a few lines of text to illuminate their experience of the COVID-19 crisis beyond the closed-ended items. Participants typically supplied their answers on a 6-point Likert scale ranging from 'Strongly disagree' to 'Strongly agree', with some variation based on established standards, as well as in text boxes to add other relevant factors. Validated short versions of established measures were used if available in local languages.
The full list of variables included in the COVIDiSTRESS global survey as well as the response options participants used to answer the survey are available at https://osf.io/v68t9/. To protect participants' data and avoid sensitive [...] available online at the Open Science Framework (COVIDiSTRESS global survey 30) and in supplementary information. The corrections made were:
• Filtered out cases without consent and cases of participants younger than 18 years old.
• User Language - Bulgarian (BG): For responses between 2020-03-28 13:30:02 UTC and 2020-04-08 01:53:18 UTC, the order of the variable Country was mixed up for people who took the survey in Bulgarian. Thus, the data was recoded.
• User Language - Afrikaans (AFR): For responses before 2020-04-07 06:48:00, the order of the variable Country was mixed up for people who took the survey in Afrikaans. Thus, the data was recoded.
• User Language - Hebrew (HE): The variable Country was translated and arranged according to the Hebrew alphabetical order. Thus, the data was recoded.
• User Language - Bengali (BAN): Variables Scale_PSS10_UCLA_6 and Scale_PSS10_UCLA_7 were swapped during translation, so they were swapped back in the data cleaning procedure.
• Country: Removed dashes in front of the '-other' responses in Country.
• Start Date: Cases before the official launch date 2020-03-30 were excluded as they were test answers. Soft launch answers from Denmark and Kosovo before the start date were retained.
• Marital Status: Except for the original English version of the survey, the order of the Dem_maritalstatus variable was mixed up in translations. The variable was recoded to correct this problem. There were some participants who had '5' in Dem_maritalstatus. These responses were recoded as 'Uninformative response'.
• Education level and mother's education level: Removed dashes in front of the response options. There were some participants who had '1' in Dem_edu. These responses were recoded as 'Uninformative response'.
• Gender: The variable Dem_gender was inverted for languages SSP (Spanish - Spain) and SME (Spanish - Mexico) in the raw data file. Thus, in these responses, Male was recoded to Female and vice versa.
• From 15th May onwards, additional items (Q50-Q62) were included for a location-specific sub-study on war trauma in Bosnia/Herzegovina. These were not part of our pre-registration. These columns were cleaned as follows, but not included in the current report:
  • Renamed new columns for clarity (Q50-Q62): born_92, experience_war, experience_war_TXT, war_injury, loss_during_war, time_spent_in_war, time_spent_in_war_TXT, Scale_UCLA_TRI_1:4 (4 items), PS_PTSD_1:5 (5 items).
  • War-related questions: Removed numbers, periods, and extra spaces in the responses for experience_war, war_injury, loss_during_war, and time_spent_in_war (i.e. "2. Yes" was simplified to "Yes").
  • TRI_4: Responses were converted from choice text to numeric, and the composite score for the scale was calculated.
  • PS-PTSD: Responses were converted from choice text to numeric.
Note that correcting the error-coded variables (Gender, User Language Bulgarian, Afrikaans and Hebrew, Marital Status) is necessary for correct interpretation of the data. None of the other actions described above (e.g., recoding text into numerical values) affects the data interpretation in any way. Apart from filtering out test data (data before the official launch on 2020-03-30) and participants who declared that they are younger than 18, all data was retained. When recoding, all groups present in the raw data file were also preserved. For more details, please see the data cleaning R markdown file. Thereafter, the text description is based on the cleaned data.
Table 4. Proportion of marital status across countries with more than 200 participants. Note. Prop_single = proportion of participants who are single. Prop_married/cohabiting = proportion of participants who are married or cohabiting. Prop_divorced/widowed = proportion of participants who are divorced or widowed. Prop_marital_other/not_say = proportion of participants who live in some other form of community or don't want to state their marital status. Prop_marital_NA = proportion of missing data for the marital status variable.
Table 5. Proportion of current risk of infection across countries with more than 200 participants. Note. Prop_yes = proportion of participants who are, or whose family members are, at high risk. Prop_not_sure = proportion of participants who are not sure. Prop_no = proportion of participants who are not, and whose family members are not, at high risk. Prop_NA = proportion of missing data for the risk variable.
Table 6. Proportion of current isolation status across countries with more than 200 participants. Note. Prop_usual = proportion of participants whose life carries on as usual. Prop_minor = proportion of participants whose life carries on with minor changes. Prop_medical = proportion of participants who are isolated in a medical facility or similar location. Prop_isolated = proportion of participants who are isolated. Prop_NA = proportion of missing data for the isolation variable.
Fig. 3 The distribution of gender across ten countries with the largest samples (missing data were excluded from this depiction due to very low proportions).
Fig. 4 The distribution of education across ten countries with the largest samples (missing data were excluded from this depiction due to very low proportions).
Table 8. Descriptive statistics for the Asian Disease Problem across countries with more than 200 participants. Note. N = number of participants. Prop_nonmis = proportion of participants that responded to the Asian Disease Problem. Prop_gain = proportion of participants assigned to the gain condition among those who responded to the Asian Disease Problem. Prop_program_A = proportion of participants who selected Program A among those assigned to the gain condition. Prop_program_B = proportion of participants who selected Program B among those assigned to the gain condition. Prop_loss = proportion of participants assigned to the loss condition among those who responded to the Asian Disease Problem. Prop_program_C = proportion of participants who selected Program C among those assigned to the loss condition. Prop_program_D = proportion of participants who selected Program D among those assigned to the loss condition.
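Some of the corrections listed in the Methods (the inclusion filter, the exclusion of pre-launch test answers, the gender inversion for SSP and SME, and the dash removal in Country) can be illustrated with a minimal Python sketch. This is not the project's cleaning script, which is an R markdown file; the keys "consent", "age", "start" and "language", the language codes "DA", "FR" and "EN", and the example rows are hypothetical, and only 'Country', 'Dem_gender', 'SSP' and 'SME' are taken from the text. Keeping rows with a missing age is likewise an assumption.

```python
from datetime import datetime, timezone

# Official launch date stated in the text; answers before it were test answers.
LAUNCH = datetime(2020, 3, 30, tzinfo=timezone.utc)
SOFT_LAUNCH_COUNTRIES = {"Denmark", "Kosovo"}   # retained despite early dates
GENDER_INVERTED_LANGUAGES = {"SSP", "SME"}      # Spanish (Spain), Spanish (Mexico)
GENDER_SWAP = {"Male": "Female", "Female": "Male"}


def clean(rows):
    """Return corrected copies of the rows that pass the inclusion criteria."""
    cleaned = []
    for row in rows:
        # Inclusion criteria: informed consent and not declared younger than 18.
        # Assumption: a missing age (None) does not exclude the participant.
        if not row["consent"]:
            continue
        if row["age"] is not None and row["age"] < 18:
            continue
        # Drop pre-launch test answers, except soft-launch countries.
        if row["start"] < LAUNCH and row["Country"] not in SOFT_LAUNCH_COUNTRIES:
            continue
        fixed = dict(row)  # copy, so the raw rows stay untouched
        # Undo the gender inversion in the two Spanish-language versions.
        if fixed["language"] in GENDER_INVERTED_LANGUAGES:
            fixed["Dem_gender"] = GENDER_SWAP.get(fixed["Dem_gender"], fixed["Dem_gender"])
        # Remove the dash in front of '-other' style responses.
        fixed["Country"] = fixed["Country"].lstrip("- ")
        cleaned.append(fixed)
    return cleaned


def _utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical raw rows, made up for illustration only.
rows = [
    {"consent": True, "age": 25, "start": _utc(2020, 4, 2), "Country": "Spain", "language": "SSP", "Dem_gender": "Male"},
    {"consent": True, "age": 17, "start": _utc(2020, 4, 5), "Country": "France", "language": "FR", "Dem_gender": "Female"},
    {"consent": True, "age": 40, "start": _utc(2020, 3, 28), "Country": "Denmark", "language": "DA", "Dem_gender": "Female"},
    {"consent": True, "age": 33, "start": _utc(2020, 3, 28), "Country": "France", "language": "FR", "Dem_gender": "Male"},
    {"consent": False, "age": 52, "start": _utc(2020, 4, 9), "Country": "Finland", "language": "EN", "Dem_gender": "Male"},
    {"consent": True, "age": None, "start": _utc(2020, 5, 1), "Country": "-other", "language": "EN", "Dem_gender": "Female"},
]
cleaned = clean(rows)
```

The function copies each row before correcting it, mirroring the statement that all groups present in the raw data file were preserved during recoding.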

Data Records
Raw data and code for cleaning are available at https://doi.org/10.17605/OSF.IO/Z39US 30. Figure 1 shows a heat map of the countries from which the data were collected, coloured according to the sample size (n ≥ 200). The main characteristics of the survey are presented in Tables 1 to 6 for countries with a sample size of more than 200: basic sample information (Table 1), gender (Table 2), education (Table 3), marital status (Table 4), current risk of infection (Table 5), and current isolation status (Table 6).

Data visualization interface.
In addition to the raw data, a dedicated Web application was developed to provide a general overview of the COVIDiSTRESS dataset (https://covidistress.france-bioinformatique.fr/). The Web application allows easy and dynamic generation of illustrations like age pyramids, zoomable world maps, and bar plots summarizing the main variables of the survey for each selected country. Two tabs of visualizations are provided: the first contains basic demographic variables like age, gender, and educational level by country; the second tab displays world maps of levels of stress, trust in institutions and concerns for self, friends, family, country, and other countries. The application is based on an R Shiny server (https://rstudio.com/products/shiny/shiny-server/), together with the plot.ly 31 and ggplot2 32 graphical libraries to generate dynamic plots. All the generated figures can be exported as PNG files.

Technical Validation
As of 30th May, the participants in our data represented 176 different countries. However, there were instances in which we only had one participant per country (e.g. The Bahamas and Uganda). For computational purposes, we decided to examine the data quality for the 42 countries that had over 200 participants. Overall, 25 of these 42 countries had more than 1,000 participants. Among these, Finland, France, and Denmark are the three countries with the highest numbers of respondents (over 10,000). At least 62% of the participants provided answers to half of the questions in the survey, and at least 47% responded to 90% of the questions. For users' information, we added one variable, "answered_all", that indicates whether a participant answered all questions. Of all 125,360 participants included in the cleaned dataset, 42.48% answered all questions. Figure 2 demonstrates the proportion of valid data across the 10 countries with the highest number of participants (top 10 countries). The mean age of participants (M = 39.22, SD = 14.09) falls between young and mid-adulthood, and in most countries, the number of female participants is disproportionately higher. Figure 3 illustrates the distribution of gender in the top 10 countries. Similarly, our sample seems to disproportionately represent people with some level of higher education (i.e. some college or higher). Figure 4 shows participants' levels of education in the top 10 countries. Additional details on the sample characteristics (including age, gender, education level, and marital status) can be found in Table 1 through Table 4. The dataset also includes answers to questions related to the respondent's current likelihood of infection (e.g. risk of infection with COVID-19 in the family and the degree of isolation), as shown in Tables 5 and 6.
Given our narrow timeline and the convenience sampling method, we acknowledge that our samples may not be representative of the populations of interest. However, we believe that the data can still be meaningfully used to understand the experiences of certain groups of people during this pandemic.
Aside from some specific questions on COVID-19 (e.g. self-protective behaviours and trust in the government's agencies), our data includes several scales that were previously validated within certain populations, including the Asian Disease Problem, PSS-10, SPS-10, BFF-15 (BFI-S), and the SLON-3. Figure 5 illustrates Cronbach's alphas for these scales in the top 10 countries. In Table 7, we present several descriptive statistics of each of the aforementioned continuous scales. Below, we describe the preliminary statistics of the scales for all 42 countries.

Asian Disease Problem.
The basic descriptive statistics of the Asian Disease Problem are summarized in Table 8. Specifically, among the 42 countries, at least 91% of the participants responded to this problem. They were randomly assigned to either the gain or the loss condition. Among those who responded, 50.27% were assigned to the gain condition and 49.73% to the loss condition. Participants in the gain condition selected one of two options, Program A or Program B. Program A was selected by 66.20% of the participants in the gain condition, while 33.80% selected Program B. Those in the loss condition selected one of two options, Program C or Program D. Program C was selected by 36.54% of the participants in the loss condition, while 63.46% selected Program D.

PSS-10.
The basic descriptive statistics of the PSS-10 are summarized in Table 9. Specifically, among the 42 countries, at least 75% of the participants rated this scale. The composite scale score ranges from 1 to 5, with a mean value falling between 2.30 and 3.13. The internal consistency of the scale, as measured by Cronbach's alpha, ranges from 0.66 to 0.90.

SPS-10.
The basic descriptive statistics of the SPS-10 are summarized in Table 10. Specifically, among the 42 countries, at least half of the participants rated this scale. The composite scale score ranges from 1 to 6, with a mean value falling between 3.55 and 5.20. The internal consistency of the scale, as measured by Cronbach's alpha, ranges from 0.88 to 0.94.

SLON-3.
The basic descriptive statistics of the SLON-3 are summarized in Table 11. Specifically, among the 42 countries, at least 77% of the participants rated this scale. The composite scale score ranges from 1 to 5, with a mean value falling between 1.89 and 3.05. The internal consistency of the scale, as measured by Cronbach's alpha, ranges from 0.54 to 0.84.

BFF-15.
The term BFF-15 was used for this project; the instrument is more commonly known as the Big Five Inventory-SOEP (BFI-S).
Extraversion. The basic descriptive statistics of this subscale are summarized in Table 12. Specifically, among the 42 countries, at least 71% of the participants rated this scale. The composite subscale score ranges from 1 to 6, with a mean value falling between 3.12 and 4.50. The internal consistency of the scale, as measured by Cronbach's alpha, ranges from 0.51 to 0.86.
Neuroticism. The basic descriptive statistics of this subscale are summarized in Table 13. Specifically, among the 42 countries, at least 70% of the participants rated this scale. The composite subscale score ranges from 1 to 6, with a mean value falling between 2.91 and 3.80. The internal consistency of the scale, as measured by Cronbach's alpha, ranges from 0.44 to 0.77.
Openness. The basic descriptive statistics of this subscale are summarized in Table 14. Specifically, among the 42 countries, at least 71% of the participants rated this scale. The composite subscale score ranges from 1 to 6, with a mean value falling between 3.36 and 4.97. The internal consistency of the scale, as measured by Cronbach's alpha, ranges from 0.46 to 0.74.
Agreeableness. The basic descriptive statistics of this subscale are summarized in Table 15. Specifically, among the 42 countries, at least 71% of the participants rated this scale. The composite subscale score ranges from 1 to 6, with a mean value falling between 3.62 and 4.85. The internal consistency of the scale, as measured by Cronbach's alpha, ranges from 0.30 to 0.67.
Conscientiousness. The basic descriptive statistics of this subscale are summarized in Table 16. Specifically, among the 42 countries, at least 70% of the participants rated this scale. The composite subscale score ranges from 1 to 6, with a mean value falling between 3.54 and 5.01. The internal consistency of the scale, as measured by Cronbach's alpha, ranges from 0.34 to 0.67.
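For readers who want to see how the internal consistency figures reported for each scale are defined, the following is a minimal Python sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) × (1 − sum of item variances / variance of the total score). It is not the project's code (the reported values were computed in R), it assumes complete cases with no missing item responses, and the example responses are made up for illustration.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for complete item responses.

    responses: list of per-participant lists, one numeric score per item.
    Assumes no missing values and at least two items and two participants.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item across participants.
    item_variances = [variance([r[i] for r in responses]) for i in range(k)]
    # Sample variance of the summed score across participants.
    total_variance = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of five participants to a three-item scale.
example_responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [5, 4, 5],
    [3, 3, 2],
]
alpha = cronbach_alpha(example_responses)
```

Short subscales such as the three-item Big Five subscales tend to produce lower alphas than longer scales, because the formula depends on the number of items k.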

Usage Notes
We recommend that any interested researchers use the raw or the cleaned version of the latest extracted data (available at https://doi.org/10.17605/OSF.IO/Z39US). The data was imported and cleaned using the R software for statistical analysis 33 and the packages tidyverse 34, multicon 35, qualtRics 36, pacman 37, and psych 38. Before using the dataset, the steps in the Data cleaning section should be followed to ensure that the dataset is ready for analysis. The data cleaning procedure should involve excluding irrelevant cases, correcting some errors in value-coding, and renaming improperly named variables. In addition, the cleaning procedure should encompass recoding choice values to numbers, creating composite scores, and estimating the Cronbach's alpha reliabilities for the measured scales (PSS-10, BFF-15, SPS-10, and SLON-3). However, for analysis in individual countries, we recommend checking for tau-equivalence before using Cronbach's alpha for reliability estimation. If tau-equivalence is not achieved, the Omega coefficient is more appropriate as a reliability indicator 39,40. Before analysing the data, it should be noted that the answers in variables measuring distress ('Expl_Distress_no') are recoded to the numeric values 1, 2, 3, 4, 5, and 6, measuring the degree of agreement, and 99, which means that the item does not apply to one's current situation. Additionally, answers in the variable 'Trust_countrymeasure' are recoded on a scale from 0 to 10, where 0 and 10 suggest inappropriate measures (too little or too much) and values around 5 suggest appropriate measures.
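The two coding conventions just described are easy to mishandle, so a minimal Python sketch may help. It assumes the analyst wants to exclude the 99 code before summarising distress items and to express 'Trust_countrymeasure' as a distance from the scale midpoint; both choices are illustrative assumptions, not procedures prescribed by the dataset authors, and the function names are hypothetical.

```python
NOT_APPLICABLE = 99  # code meaning "does not apply to my current situation"


def mean_agreement(values):
    """Mean of 1-6 agreement ratings, skipping missing values and the 99 code.

    Returns None when no valid rating remains.
    """
    valid = [v for v in values if v is not None and v != NOT_APPLICABLE]
    return sum(valid) / len(valid) if valid else None


def distance_from_appropriate(score, midpoint=5):
    """Fold the 0-10 'Trust_countrymeasure' scale around its midpoint.

    0 means the measures are seen as appropriate; larger values mean they are
    seen as inappropriate, whether too little (near 0) or too much (near 10).
    """
    return abs(score - midpoint)


# Hypothetical distress ratings of one participant, including a 99 code.
example_mean = mean_agreement([1, 6, 99, None, 5])
```

Leaving the 99 code in place would inflate any mean computed over the 1 to 6 agreement range, which is why it is filtered out before averaging here.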
To merge the present dataset with a pre-existing cross-cultural dataset by country and date, the variables 'Country' and 'RecordedDate' should be used.
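A merge on these two variables can be sketched in Python as a left join keyed on country and calendar day. The sketch assumes that 'RecordedDate' is stored as a 'YYYY-MM-DD HH:MM:SS' string and that the external dataset has one row per country and day with a 'date' column in 'YYYY-MM-DD' form; the 'policy_index' and 'Scale_PSS10' columns and all values are made up for illustration.

```python
from datetime import datetime


def merge_by_country_and_date(survey_rows, external_rows):
    """Left-join external country-by-day rows onto survey rows."""
    # Index the external dataset by (country, calendar date).
    lookup = {(r["Country"], r["date"]): r for r in external_rows}
    merged = []
    for row in survey_rows:
        # Reduce the survey timestamp to its calendar day (assumed format).
        day = datetime.strptime(row["RecordedDate"], "%Y-%m-%d %H:%M:%S").date().isoformat()
        extra = lookup.get((row["Country"], day), {})
        combined = dict(row)
        for key, value in extra.items():
            if key not in ("Country", "date"):
                combined[key] = value
        merged.append(combined)
    return merged


# Hypothetical rows, for illustration only.
survey = [
    {"Country": "Denmark", "RecordedDate": "2020-04-02 10:15:00", "Scale_PSS10": 2.5},
    {"Country": "Finland", "RecordedDate": "2020-04-03 08:00:00", "Scale_PSS10": 3.1},
]
external = [{"Country": "Denmark", "date": "2020-04-02", "policy_index": 0.7}]
merged = merge_by_country_and_date(survey, external)
```

Survey rows without a matching country and day are kept unchanged, so no participants are dropped by the merge; country names may need harmonising between the two sources before joining.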
Finally, the samples in the present dataset are not representative of the populations from which they are drawn (in each country). Thus, users who wish to address this issue may weight the data by referring to demographic information for each country and applying the appropriate weights for the variables and countries of interest (e.g., age: http://data.un.org/Data.aspx?d=POP&f=tableCode%3A22; gender: https://ourworldindata.org/gender-ratio; education: https://ourworldindata.org/global-education; marital status: https://ourworldindata.org/marriages-and-divorces).
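One common way to apply such weights is simple post-stratification on a single variable, where each group's weight is its population share divided by its sample share. The Python sketch below illustrates this under stated assumptions: the function name is hypothetical, the population shares are placeholders rather than real figures, and weighting on several variables at once would require a different technique (for example raking), which is not shown.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Weight per group = population share / sample share.

    sample_groups: one group label per participant (e.g. gender).
    population_shares: dict mapping each label to its population proportion.
    """
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {group: population_shares[group] / (count / n) for group, count in counts.items()}


# Hypothetical sample of 100 participants in one country: 80 women, 20 men.
sample = ["Female"] * 80 + ["Male"] * 20
# Placeholder population shares; real values should come from the sources cited above.
weights = poststratification_weights(sample, {"Female": 0.5, "Male": 0.5})
weighted_n = sum(weights[g] for g in sample)
```

With these weights the over-represented group is down-weighted and the under-represented group is up-weighted, while the weighted sample size stays equal to the original sample size.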