## Main

Non-pharmaceutical interventions in response to COVID-19 often depend for their effectiveness on the behavioural responses of the public. Even with a vaccine, uptake is not entirely in the control of experts and policy-makers. These behavioural responses are not a small factor: there is growing evidence that the prevention behaviours of people are dramatically influenced by many social and cultural factors1,2,3. Analyses of mobility data reveal that the movements of people are predicted and perhaps caused by their partisan affiliation4, media consumption5 and the behaviours of their social networks6. Thus, the epidemiological and economic effects of policies that close (or open) businesses and schools are substantially determined by people’s beliefs. This is consistent with the recognition, at least among public health experts, that health communication is a core part of effective response to epidemics, ideally in concert with other policies and interventions. However, developing and deploying effective policies and communication strategies demands data about people’s beliefs and how they have been affected by prior exposure to information from governments, peers and media, and these data are largely lacking, even as massive troves of medical and behavioural traces are used by researchers3.

This motivated us to conduct a large-scale, international survey related to COVID-19 in 67 countries (Fig. 1 maps the countries included) to help policy-makers and researchers better monitor and understand people’s knowledge, beliefs, behaviours, norms and risk perceptions across the world. The survey was run through a collaboration with Facebook and Johns Hopkins University and with input from experts at the World Health Organization and the Global Outbreak Alert and Response Network. The survey is organized into blocks on the basis of the question topics. Every survey begins with questions from the same five blocks: information exposure, knowledge, vaccine and healthcare and demographics. In ‘snapshot’ countries, all respondents are shown an information block and then three additional blocks that are randomly selected from the remaining blocks. In ‘multiwave’ countries, respondents are shown four randomly selected blocks. Precise questions and the codebook for the data can be found in the Supplementary Information. In constructing the survey instrument, we drew on input from a wide set of domain experts. The survey consisted of questions related to COVID-19 information exposure and trust in information sources, knowledge about the virus, community norms, prevention behaviours, beliefs about efficacy of measures, vaccine acceptance, risk perceptions and locus of control in addition to demographics. The survey data include weights that use the rich information Facebook has about its users to reduce bias from non-response and differential Facebook use among different subpopulations. This resource article presents the survey dataset and some example use cases of the data.

## Results

We now provide some basic results about the survey sampling and weighting as well as assorted analyses using data from some of the modules of the survey, including vaccine acceptance over time, mismatch in COVID-19 perceptions and consumption and trust of various news sources. These are some examples of possible uses of the data. In the Discussion, we show some other examples from other papers using the same data and point to directions for future research using the data.

### Characteristics of the sample

Figure 2 shows the sample size we obtained per country and the effective sample size (as measured by Supplementary Information equation (A.2)). Although, on average, we obtain 3,000 users per week, the effective sample size varies widely, Bangladesh being the lowest with an average of only 791 users. Supplementary Information Tables A2 and A3 show the unweighted and weighted demographics of our sample, respectively. Supplementary Information Table A1 shows the two most popular languages used, by country.
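The definition of effective sample size used for Fig. 2 is given in Supplementary Information equation (A.2), which is not reproduced in this section. As an illustration only, the sketch below assumes the widely used Kish approximation, (Σw)² / Σw², which may differ from equation (A.2); the function name `kish_effective_sample_size` and the toy weights are hypothetical.

```python
from typing import Sequence


def kish_effective_sample_size(weights: Sequence[float]) -> float:
    """Kish approximation: (sum of weights)^2 / (sum of squared weights).

    ASSUMPTION: this is a common textbook definition and is not
    necessarily the one in Supplementary Information equation (A.2).
    """
    sum_w = sum(weights)
    sum_w_sq = sum(w * w for w in weights)
    if sum_w_sq == 0:
        raise ValueError("at least one weight must be non-zero")
    return (sum_w * sum_w) / sum_w_sq


# Toy weights (hypothetical): equal weights give back the raw sample size,
# while unequal weights shrink the effective sample size.
equal_ess = kish_effective_sample_size([1.0, 1.0, 1.0, 1.0])
unequal_ess = kish_effective_sample_size([3.0, 1.0])
```

Under this approximation, the more the weights vary within a country, the further the effective sample size falls below the number of respondents, which is one way to read the gap between the two quantities reported per country.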

Next, we plot the (inverse) conversion rate to the survey, that is, how many users saw our survey prompt on their homepage versus how many clicked and completed our survey. We can see from Fig. 3 that we needed, on average, 260 impressions for a single response. This is in line with the conversion for previous research using Facebook ads for surveys7. For most countries with good Facebook penetration (for example, in Europe), this number is around 50. For some countries (for example, Nigeria and India), the number was at least an order of magnitude higher. This may reflect various differences, including perceived and actual costs of mobile data that would be used when completing the survey. Our survey weights are designed to reduce these biases in sampling.

### Vaccine acceptance over time

We look at vaccine hesitancy and its trends over time. First, we computed the fraction of respondents who say that they would take a vaccine or have taken the vaccine (starting July 2020). Figure 4 shows the trends for the 23 wave countries over the duration of the survey (July 2020–March 2021). We observe a few clear trends. There is huge heterogeneity across countries, with Vietnam having a consistent vaccine acceptance of over 80% throughout the time period and countries like the United States and Poland experiencing an initial dip but improving in terms of acceptance later in the months before mass rollout of vaccines. Egypt, which would not see vaccines rolled out at scale for another 6 months, had a steady decline in vaccine acceptance during the same period8. On average, across the 23 wave countries, vaccine acceptance has varied in the range of 57% to 71% with slight improvements since late 2020. We notice these improvements across many countries where vaccines were being slowly rolled out, although making a causal connection between vaccine rollout and vaccine acceptance is beyond the scope of this paper.

Starting in wave 9 (end of October 2020), we also asked the following question about perceived vaccine norms: ‘Out of 100 people in your community, how many do you think would take a COVID-19 vaccine if it were made available?’. The question helps us gauge perceptions of vaccine acceptance in the community. It is interesting to note that there is a significant difference between individual beliefs (‘acceptance’) and beliefs about others (‘norms’). There is at least a 10% gap between them consistently. Respondents think that at least an additional 10% of the population would not take the vaccine.

Figure 5 shows the proportion of responses to the vaccine acceptance question for four countries. The figure shows the importance of the ‘Don’t know’ response, that is, of people who are as yet undecided on the vaccine. Consider the case of the United States, where the proportion of unsure users declined over time, while the proportion of users saying they would take a vaccine increased. In Egypt, by contrast, the proportion of users who oppose the vaccine, as well as of those who are unsure, has increased in the last few months, which makes it a good case study for policy intervention. Overall, on average, across the 23 countries in our dataset, vaccine acceptance varied between 59% and 72% between July 2020 and March 2021. The proportion of users who are not sure has ranged between 13% and 18% during that same period.

Next, we plotted the correlation between acceptance and norms for the 44 snapshot countries (for wave 9, in which the norms question was asked). We observe similar trends in Fig. 6. In all the 44 countries, respondents think others are much less likely to get the vaccine than they are themselves. We highlighted four countries to indicate the heterogeneity across countries. Note that self-reported intentions to vaccinate might differ from actual vaccine uptake9.

### Mismatch in COVID-19 perceptions

We asked two questions about the perception of seriousness of COVID-19 and perceptions among the community: community_action_importance—‘How important is it for you to take actions to prevent the spread of COVID-19 in your community?’ (possible answers—extremely important, very important, moderately important, slightly important, not important at all); and community_action_norms—‘How important do other people in your community think it is to take actions to prevent the spread of COVID-19?’. If respondents themselves think taking action against COVID-19 to be extremely important, but think others do not take it seriously (or vice versa), they might adapt their behaviour to take steps that would not be necessary. Figure 7 shows the mismatch in beliefs for two countries: the United States and Japan. The figure shows a heat map of the mismatch. The plots are normalized by row (one’s own beliefs) and each cell indicates the conditional probability of beliefs about others (columns) given one’s own beliefs (rows sum to 1). We see that there is a clear difference in the distributions across the United States and Japan, with most people in Japan having a congruent view, compared to the wide range of disagreement in the United States. The two countries were chosen to show an example of how divergent the beliefs about others could be in different cultures.
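The row normalization described above can be sketched as follows. This is a minimal illustration, not the code used to produce Fig. 7: the function name `conditional_table`, the toy responses and the optional use of survey weights are assumptions, and this section does not state whether the figure itself is weighted.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence


def conditional_table(
    own: Sequence[str],
    others: Sequence[str],
    weights: Optional[Sequence[float]] = None,
) -> Dict[str, Dict[str, float]]:
    """Return table[own_answer][others_answer] = P(belief about others | own belief).

    Each row (one's own belief) is normalized to sum to 1.
    """
    if weights is None:
        weights = [1.0] * len(own)  # unweighted by default
    counts: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for own_answer, others_answer, w in zip(own, others, weights):
        counts[own_answer][others_answer] += w
    table: Dict[str, Dict[str, float]] = {}
    for own_answer, row in counts.items():
        row_total = sum(row.values())
        if row_total > 0:  # skip rows whose total weight is zero
            table[own_answer] = {k: v / row_total for k, v in row.items()}
    return table


# Toy responses (hypothetical), using answer labels from the survey question.
own = ["extremely important", "extremely important",
       "very important", "extremely important"]
others = ["extremely important", "moderately important",
          "very important", "moderately important"]
weights = [1.0, 1.0, 2.0, 2.0]
table = conditional_table(own, others, weights)
```

In this toy example, respondents who answered "extremely important" for themselves place three quarters of their (weighted) mass on "moderately important" for others, which is the kind of off-diagonal mass that signals a mismatch in the heat map.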

### News sources and medium: consumption versus trust

Finally, we asked which sources and mediums users consumed COVID-19 related information from and how much they trusted these sources (pages 20 and 21 in the Supplementary Information list these survey questions). Figure 8 shows the trends for consumption and trust for five sources: online news, radio, television, local health workers and politicians. In a pandemic, it is important to have widely trusted sources provide information that is widely consumed10,11,12. However, as we can see from the figure, most sources do not satisfy this criterion. Some interesting trends emerge: politicians are the least trusted, and in most countries the least used, source of information. Television has high consumption but trust in television varies widely among the countries in our sample. Local health workers are typically well trusted but they are not a source of information for most countries.

## Discussion

The paper describes a global, longitudinal survey on COVID-19 behaviours, beliefs and norms. We present three examples of potential use cases for the dataset: (1) vaccine acceptance and norms, (2) mismatch between own beliefs and beliefs about others and (3) trust in versus consumption of various news sources. Some of the trends observed here, particularly at a global scale and including countries in the global south, are valuable for understanding behavioural and social drivers of vaccination13 and would not have been made available to the research community otherwise. Identifying what people think and feel and the social processes, such as norms14, that influence their thinking will help researchers identify motivations behind critical health behaviours. Such a strategy is, for instance, extensively used by WHO for measuring behavioural and social drivers of vaccine hesitancy15. Overall, this paper provides a valuable resource which should serve as a foundation for future research and give rise to new questions in understanding the COVID-19 pandemic and developing policy solutions around it. For instance, our findings on heterogeneity in vaccine trends across countries (Fig. 4) or the mismatch in perceptions across countries (Fig. 7) are new and may not be explained by existing literature. Combining our data with historic and cultural trends could help identify new insights on the role of country-specific variables in explaining the results16,17. Some of the temporal variations in vaccine acceptance (for example, in countries such as the United States, Poland and Egypt, highlighted in Fig. 4) remain unexplained and open avenues for future research into factors behind vaccine acceptance trends.

Our survey data can directly inform policies on the national and global stage. For example, others in their study of political messaging and attitudes towards vaccination in Latin America18 use our surveys to assess the relationship between vaccine acceptance, political vaccination campaigns and political trust. Another study of our survey responses for South Asian countries identified gender, age, knowing someone who tested positive for COVID-19 and perceived effectiveness of mask wearing as significant determinants of COVID-19 vaccine hesitancy, arguing for targeted vaccine education and communication campaigns19. Others20 analysed responses among ten snapshot countries in sub-Saharan Africa in the two survey rounds that happened in July and November 2020 (Fig. 1). They use the ‘yes’ and ‘no’ answers to the survey question about handwashing in the past week as their primary outcome. Using a multivariate logistic regression, they identify the main determinants of handwashing, which are classified as sociodemographic (age, gender, education and rural or urban residence) or ideational (perceived personal health, beliefs about handwashing, knowing someone diagnosed with COVID-19 and perceived norms), adjusting for country-level fixed effects. The authors document clear regional and country-level variations in handwashing, pointing to settings with the greatest opportunity for improvement. Similarly, the significant country-level heterogeneity of our survey measures and, in particular, the vaccine trends, have served as motivation or explanatory factors in other research studies that target local populations; for example, in Spain21 or Australia22.

Several other studies have used COVID-19 beliefs, behaviours and norms survey data to analyse risk perception, attitudes towards mask wearing and other preventive behaviours, as well as trust in information sources across communities worldwide. A previous study23 uses the survey data to identify significant predictors of risk perception in older adults and its association with their preventive behaviours and medical avoidance. They find accurate knowledge to be a crucial factor in disentangling this association. Joining the survey data with COVID-19 cases and death counts worldwide, another study24 shows that mask wearing and attitudes towards masks are associated with fewer cases and deaths across different countries, controlling for socioeconomic factors such as population density, human development and mobility. Another analysis17 of the survey data reveals that mask usage is higher in countries with more collectivistic (versus individualistic) cultures after controlling for a host of variables such as COVID-19 severity, government policy, population density, GDP per capita and demographics. Others25 analyse our survey responses to construct various measures of vaccine intention, perceived invincibility and prosocial concerns at the individual level and study their relationships, controlling for perceived personal health and demographic attributes measured in the survey, as well as estimates of country-level cultural collectivism from other studies. They show that perceived invincibility has an overall negative effect on both prosocial concerns and vaccine intentions. These effects are particularly pronounced in countries with low cultural collectivism and shown to be robust across age cohort and gender. This ability to investigate individual health-related behaviours by controlling for country-level variables, such as cultural collectivism, shows the unique contribution of the present resource to the research community. Such investigations would not have been possible without the global COVID-19 beliefs, behaviours and norms survey data.

Yet another study26 uses the randomized order of the survey questions to show that highlighting accurate information about vaccine norms increases vaccine acceptance. Several layers of randomization throughout the survey provide a ripe ground to explore priming, anchoring and information treatment effects on different demographics in a representative global sample (for example, respondents are randomized to see questions about their risk perception and perceived control over health outcomes, which affects their answers to follow-up questions about their adherence to preventive measures in ways that can inform public health communication). The longitudinal data are collected over a period of global pandemic emergency that coincided with high-profile events, providing natural experimental opportunities on national and international scales (for example, the US presidential election, epidemic peaks and emergency use approvals of vaccines in different countries). In addition to in-depth demographic, psychographic and sociometric measurements of health-related behaviours as well as media and news consumption (some of which were showcased in the Results), the survey resource also has questions about work and travel (full survey instrument given in Supplementary Information D). We expect the confluence of these factors will open new areas of enquiry in public health, communication and economic policy and we are optimistic that future researchers will leverage these large-scale, rich survey data on beliefs, behaviours and norms during the COVID-19 pandemic in innovative ways.

## Methods

The survey’s purpose was to guide policy and research around individual responses to COVID-19 beyond symptoms and the most closely associated behaviours. The Committee on the Use of Humans as Experimental Subjects at the Massachusetts Institute of Technology (MIT) approved the survey as exempt (project no. E-2294) and informed consent was obtained from all participants. The survey ran from July 2020 until March 2021. It was translated into 51 languages and fielded in 67 countries, yielding over 2 million responses. The full survey instrument is provided in Supplementary Information D. The survey data dictionary is provided in Supplementary Information B and the log of changes to the survey over the course of its duration is provided in Supplementary Information C.

### Survey instrument design

There were multiple goals for this survey and associated topics for each goal that formed individual modules of questions. The users of this survey include academic researchers, governments and non-governmental organizations. As the pandemic was occurring during the lifetime of this survey, one of our main goals was to provide ongoing tracking of key measures of knowledge about COVID-19 and how to prevent its spread, which can inform targeting and evaluation of public health campaigns. For researchers, the goal behind the survey was to provide them with a rich dataset spanning multiple countries to conduct more in-depth research. We gave examples of research papers applying this dataset in the Discussion.

More specifically, we wanted to provide data to help achieve the following goals:

• Understand which preventive behaviours are most/least understood and practiced by region/country and how this changes over time

• Identify countries/regions with low knowledge of given preventive behaviours and understand how and why this differs from adjacent countries/regions

• Identify differences in self-reported preventive behaviours associated with differences in psychosocial behavioural determinants

• Identify countries/regions with the biggest gap between knowledge and practices and understand how and why this differs from adjacent countries/regions

• Understand how COVID-19 related policies impact knowledge, attitudes and behaviours by geography

These survey goals led us to build different modules within the survey including (see Supplementary Information D for the full survey instrument):

• Basic demographics and localization

• Current behaviours for prevention

• Exposure to various sources of information

### Sampling and weighting

The survey was fielded in two different ways. First, in countries with a sufficient pool of Facebook users to sample, we fielded a multiwave survey that ran continuously in multiple 2-week waves from July 2020 until March 2021. In each wave, Facebook aimed to deliver 3,000 respondents to our survey. In countries with a more limited survey pool, we fielded a snapshot survey where Facebook aimed to deliver 3,000 respondents over a 2-week period; this was done twice, first in July 2020 and then in November 2020. The list of countries is selected on the basis of survey viability (which is determined by the population of Facebook users in that country), regional representation and feedback from survey partners at the World Health Organization and the Global Outbreak Alert and Response Network. See Fig. 1 for a map showing the countries.

The Facebook team uses non-response modelling and poststratification techniques from survey statistics to design the following components27,28:

1. Sampling: deciding who to present with the invitation to participate in the survey

2. Weighting: providing a weight per user so that respondents better represent the target population as a whole

The MIT team supplied binary survey completion flags (binary indicators of whether or not each respondent has completed the survey) along with a respondent identifier (a random number associated with each survey respondent) back to the Facebook team. No other data about individual respondents were sent by MIT to Facebook. We provide the completion flags for the following two analytical samples:

1. Respondents who have completed the basic knowledge and demographics parts of the survey. This part consists of a briefing followed by questions about information exposure, availability of treatments and vaccines and contact with healthcare workers, as well as gender, age, education, overall health, country and, in the case of the United States and India, state as well. We call this the ‘demographic completion type’.

2. Respondents who have reached the end of the entire survey, viewing (and typically answering) additional questions about information sources; information needs; their knowledge about high-risk populations, methods of transmission and disease symptoms; norms and beliefs about distancing, mask wearing and other preventive measures; risk perception and locus of control; work, travel and intentions to visit various locations, followed by a debrief. We call this the ‘full survey completion type’, although note that there can still be missing data due to non-response to individual questions and random assignment to different survey blocks.

Subsequently, the Facebook team computed and returned sets of survey weights to the MIT team, one set for each analytical sample. No other data about respondents were sent by Facebook to MIT besides a respondent identifier (a random number associated with each survey respondent), their language preference, these survey weights and an indicator of whether these survey weights were clipped (Supplementary Information A). The weights are meant to be used in Hájek estimators (normalized importance sampling estimators) for measuring population means. Specifically, let $$Y_i$$ be an outcome variable of interest measured for the respondent $$i$$ whose weight is $$w_i$$. The Hájek estimator, $$\hat Y$$, for the population mean of the outcome, $$\bar Y$$, is given by:

$$\hat Y = \frac{\sum_{i=1}^{n} w_i Y_i}{\sum_{i=1}^{n} w_i}. \tag{1}$$

This is the default in most statistical software for computing a weighted mean. Subsequently, if interested in population totals, analysts should use $$N\hat Y$$ as an estimator of the total outcome level where N is the population size. That is, analysts should not use the weights in an unnormalized way, as in a Horvitz–Thompson estimator (an unnormalized importance sampling estimator), as, while the weights are approximately on the level of each country’s adult population, the clipping and other adjustments to the weights make them unsuitable for direct estimation of total outcome levels without normalization. More generally, users can use these weights in other related estimators that appropriately normalize the weights29. Survey weights are critical to maintaining statistical representativeness and especially important for large samples9. Supplementary Information A includes a detailed description of the survey weights design and various consistency checks for representativeness of the weighted survey sample.
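As a minimal sketch of equation (1) and of the $$N\hat Y$$ total estimator, the following computes a normalized weighted mean and scales it by a population size. The function names `hajek_mean` and `hajek_total`, the toy outcome values, the toy weights and the population size of 1,000 are all hypothetical and are not taken from the dataset or its codebook.

```python
from typing import Sequence


def hajek_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Normalized weighted mean: sum(w_i * Y_i) / sum(w_i), as in equation (1)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * y for w, y in zip(weights, values)) / total_weight


def hajek_total(values: Sequence[float], weights: Sequence[float],
                population_size: float) -> float:
    """Population total estimated as N * Y_hat (not the raw sum of w_i * Y_i)."""
    return population_size * hajek_mean(values, weights)


# Toy data (hypothetical): 1 = would accept a vaccine, 0 = would not.
accepts = [1, 0, 1, 1]
toy_weights = [2.0, 1.0, 0.5, 0.5]

mean_acceptance = hajek_mean(accepts, toy_weights)
total_accepting = hajek_total(accepts, toy_weights, population_size=1000)
```

Because the weights enter only through their ratio to the total weight, rescaling every weight by the same constant leaves both estimates unchanged, which is why clipped or otherwise adjusted weights remain usable in this normalized form.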

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.