Global evidence of expressed sentiment alterations during the COVID-19 pandemic

The COVID-19 pandemic has created unprecedented burdens on people’s physical health and subjective well-being. While countries worldwide have developed platforms to track the evolution of COVID-19 infections and deaths, frequent global measurements of affective states to gauge the emotional impacts of the pandemic and related policy interventions remain scarce. Using 654 million geotagged social media posts in over 100 countries, covering 74% of the world population, coupled with state-of-the-art natural language processing techniques, we develop a global dataset of expressed sentiment indices to track national- and subnational-level affective states on a daily basis. We present two motivating applications using data from the first wave of COVID-19 (from 1 January to 31 May 2020). First, using regression discontinuity design, we provide consistent evidence that COVID-19 outbreaks caused steep declines in expressed sentiment globally, followed by asymmetric, slower recoveries. Second, applying synthetic control methods, we find moderate to no effects of lockdown policies on expressed sentiment, with large heterogeneity across countries. This study shows how social media data, when coupled with machine learning techniques, can provide real-time measurements of affective states. Using tweets in over 100 countries, Wang et al. examine evidence of global sentiment during the COVID-19 pandemic. They find that COVID-19 outbreaks caused a decline in sentiment worldwide, and the effects of lockdowns differed across countries.

The December 2019 coronavirus disease (COVID-19) outbreak has threatened the stability of health-care systems and generated unparalleled social and economic disruptions in nations across the globe 1 . The health crisis has brought emotional distress to citizens beyond those contracting the disease 2 . Over the course of 2020, individuals faced novel health risks associated with daily activities, shortages of resources and increased uncertainty in their financial futures and social lives 3 . In addition, numerous governments have imposed strict controls on movement, infringing on personal freedoms and increasing loneliness, depression, anxiety and other negative emotions 4,5 .
Many governments worldwide are incorporating measures of citizens' subjective well-being into policy decision-making to complement economic indicators such as gross domestic product 6-8 . Affective state (positive and negative emotions) is one of the central components of subjective well-being 9,10 and is likely to be greatly impacted by the COVID-19 pandemic 11 . As the COVID-19 crisis continues and the world faces the expectation of recurrent virus outbreaks, governments worldwide are increasingly concerned about the emotional impacts of COVID-19 outbreaks and anti-contagion policies used to manage the pandemic 12 . Nonetheless, while there have been numerous efforts to track COVID-19 infection and policy responses globally 13,14 , there are no standardized high-frequency measures of the affective aspect of subjective well-being.
Tracking the affective states of citizens during disruptive events such as natural disasters and epidemics is especially challenging due to the unpredictability and volatility of these crises 15 . There are a variety of laudable survey initiatives, such as the Weekly COVID-19 Snapshot Monitoring in Germany 16 , to track the evolution of risk perception using cross-sectional national surveys. However, such traditional survey measurements are prohibitively expensive at the global scale and usually suffer from limited coverage, insufficient sampling frequency and substantial delays 17 . Social media data have offered a valuable complement to track the affective aspect of subjective well-being 9 . Sentiment analysis, which uses natural language processing (NLP) and computational linguistics, allows standardized quantification of emotional states from text 18 . Expressed sentiment indices built on people's posts on social media platforms have been validated to be meaningfully correlated with subjective well-being measured by conventional surveys (for example, the Gallup-Sharecare Well-Being Index survey, which is currently the most definitive measurement of subjective well-being and is widely used in well-being research) 9 . Recent literature has applied social media expressed sentiment indices to estimate the effects of temperatures 19,20 , local air pollution 21 , natural disasters 22,23 and other environmental stressors 24 on subjective well-being.
Here we build a global dataset that tracks expressed sentiment at the national and subnational (state/province) levels with high temporal and spatial granularity using anonymized and aggregated data from the two largest social media microblogging platforms (Twitter and Weibo (the Chinese equivalent of Twitter)). The data contain more than 600 million geotagged social media posts on all topics published by 10.56 million individuals during the first wave of the COVID-19 pandemic (from 1 January to 31 May 2020) (Fig. 1). Since the sample of COVID-19-related discussions might not be a good representation of the affective state of the general population and could be polluted by political campaigns, we exclude tweets directly related to COVID-19 when building out the main sentiment indices. We then apply the state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) 25 NLP technique to compute daily sentiment measures in over 100 countries standardized across 65 languages (Methods). Unlike dictionary-based sentiment analysis such as Linguistic Inquiry and Word Count (LIWC) 26 , deep-learning-based BERT algorithms allow word representations to be enriched with contextual information and enable multilingual computations 27 .
On the basis of our measures of expressed sentiment and under the assumption that the existing evidence on the correlation between sentiment and affective well-being is valid, we conduct two inter-related empirical exercises to evaluate the global affective impacts of the COVID-19 pandemic and policy responses. The first exercise estimates the overall expressed sentiment alterations associated with COVID-19. We employ reduced-form econometric methods to measure the sentiment drops related to the advent of COVID-19 human-to-human transmission and estimate the recovery time needed for sentiment to return to the baseline levels. Our second exercise applies synthetic control methods (SCM) to explore how social-media-based sentiment measures can be used by countries and international organizations to evaluate alterations in affective states after policy interventions or events, using lockdown policies as an example. To facilitate comparisons across countries, our estimates of sentiment alteration are all measured in the unit of a country's own magnitude of sentiment variation (that is, the standard deviation of sentiment time series before COVID-19). We describe our approach in more detail in the Methods.

Results
Expressed sentiment alterations during the COVID-19 pandemic. The advent of COVID-19 was followed by a sizable drop in global expressed sentiment (Fig. 1b), especially after the World Health Organization (WHO) declared COVID-19 a global pandemic on 11 March 2020. Figure 2a highlights the universality of the sentiment change associated with the COVID-19 pandemic: all countries in our sample sequentially suffered sentiment alterations around the beginning of the pandemic, with varying magnitudes and durations. Sentiment gradually recovered after the shock, showing a trend similar to that of survey measurements of risk perception (for example, COVID-19 Snapshot Monitoring conducted in Germany 28 ). To measure the patterns of sentiment alterations created by COVID-19, we develop two global indices (Methods, 'Modelling of sentiment dynamics'): sentiment drop and recovery half-life.
We define the sentiment drop as a country's sentiment decline from the level before COVID-19 to its lowest value during the first wave of COVID-19. To estimate it, we separately fitted a sentiment trend before the date sentiment started to decline and after the date it reached its lowest value using local linear regressions; we then applied regression discontinuity design (RDD) to quantify the gap (see Methods, 'Sentiment drops', for the details). RDD is a quasi-experimental design commonly adopted to measure the impacts of abrupt and exogenous events 29-31 , which allows us to separate the structural shock from daily fluctuations in sentiment. We measured the magnitude of a country's sentiment drop relative to the standard deviation of the country's sentiment before COVID-19 (that is, before the detected date of sentiment decline) for comparability across countries. We find that the sentiment impact of the COVID-19 pandemic is negative for all countries, with the average drop equivalent to 0.85 s.d. (P < 0.001; 95% confidence interval (CI), (0.60, 1.10); Fig. 2b). The sentiment changes are statistically significant at the 5% level for the vast majority (91.5%) of the countries and present large heterogeneity across countries (see Supplementary Table 5 and Supplementary Fig. 12 for the country-specific results). The largest sentiment drops took place in Australia (coefficient = −3.308; P < 0.001; 95% CI, (−3.656, −2.960)), Spain (coefficient = −2.927; P < 0.001; 95% CI, (−3.204, −2.650)), the United Kingdom (coefficient = −2.354; P < 0.001; 95% CI, (−2.521, −2.186)) and Colombia (coefficient = −2.112; P < 0.001; 95% CI, (−2.326, −1.899)), while Botswana, Tunisia, Oman, Bahrain and Greece had effect sizes smaller in magnitude than 0.15 s.d.
To contextualize our results, we first examined the average sentiment variations over the course of a week before COVID-19. We find that people have a higher expressed sentiment on weekends (Supplementary Fig. 13). The average difference in sentiment between Sunday and Monday (that is, the unhappiest day) was 0.18 s.d. across countries, which is similar in magnitude to findings in previous work 19 . The effect size of COVID-19 (that is, 0.85 s.d. across the globe) is more than 4.7 times as large as this weekly sentiment drop from Sunday to Monday. In addition, according to a previous study 19 . This suggests that the acute impact of COVID-19 on sentiment is potentially more pronounced than that of extreme hot temperatures and climate disasters.
Besides the onset of sentiment drop, we estimate how long it took for people's expressed sentiment to recover. Our second index, sentiment recovery half-life, measures the days it took for a country to recover half of the gap between its lowest sentiment and its stationary state of recovered sentiment (that is, the convergence value in the calibrated sentiment recovery model). It is important to note that the recovery time reflects more than emotional resilience towards the pandemic itself: it should be interpreted as a combined effect of pandemic severity and regulatory policies, and it may be influenced by other events happening around the first wave of the pandemic within each country. Following best practices proposed by previous studies 32 , we characterize the sentiment recovery process of each country in our sample as an exponential function starting from its minimum sentiment using equation (3) (Methods, 'Sentiment recovery', and Supplementary Fig. 10). The estimated indices show that the recovery half-life varies substantially across countries (Fig. 2c), ranging from 1.2 days (Israel) to 29.0 days (Turkey). Meanwhile, the new stationary state of recovered sentiment as of 31 May 2020 also varies across countries: as shown in Fig. 2c, 18% of countries had sentiment recovered to a lower level (below −1.00 s.d.), 35% of countries recovered to the normal value (between −1.00 s.d. and 1.00 s.d.) and 46% of countries recovered to a higher level (above 1.00 s.d.). These results suggest a longer-term alteration in expressed sentiment in countries that show large discrepancies between their recovered sentiment and their average sentiment before COVID-19. Finally, we explore how the magnitude of sentiment alterations by country (sentiment drop and recovery half-life) correlates with countries' pandemic severity, governance and cultural traits (see Methods, 'Data', for the full list of variables collected for testing).
These results are intended to motivate exploration for future studies, as our country-level correlation analysis cannot pin down   Fig. 8a; Pearson correlation, ρ = 0.250; P = 0.007; 95% CI, (0.070, 0.414)). Moreover, we find that the governance efficiency index from the World Bank, a comprehensive measure of public sectors' performance proven to predict a country's capacity to control the COVID-19 pandemic 33 , is positively correlated with fast recovery (Supplementary Fig. 8b; ρ = −0.259; P = 0.011; 95% CI, (−0.437, −0.061)). Beyond objective characteristics, cultures usually play an important role in how people perceive and react to collective threats. Previous studies have shown how nations with loose cultures (that is, having lenient norms and punishments for deviance) had more difficulty coordinating in the face of the pandemic 33,34 . Consistently, we find a positive correlation between cultural looseness and sentiment drops (Supplementary Fig. 8c; ρ = 0.447; P = 0.001; 95% CI, (0.187, 0.649)). We also conduct correlation tests for other dimensions (such as a country's development stage, health security and other cultural constructs from previous studies 35 ), and none of them pass the 5% significance threshold after family-wise adjustment for multiple hypothesis testing (Supplementary Table 2).
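The recovery half-life described earlier in this section can be illustrated with a short Python sketch. Equation (3) is not reproduced in this text, so the functional form used here, s(t) = s_inf − (s_inf − s_min)·exp(−λt), is an assumption, as are the helper names `fit_recovery_rate` and `half_life` and the synthetic data. The sketch also treats the stationary level `s_inf` as known, which a full calibration would estimate jointly with the rate.

```python
import math

def fit_recovery_rate(days, sentiment, s_inf):
    """Estimate the recovery rate lam by least squares on the log gap.

    Under the assumed form, ln(s_inf - s(t)) = ln(s_inf - s_min) - lam * t,
    so the negative slope of the log gap against time is lam.
    """
    xs, ys = [], []
    for t, s in zip(days, sentiment):
        gap = s_inf - s
        if gap > 0:  # the log is only defined while sentiment is below the plateau
            xs.append(t)
            ys.append(math.log(gap))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx

def half_life(lam):
    """Days needed to close half of the gap to the stationary state."""
    return math.log(2) / lam

# Synthetic, noise-free recovery curve (illustrative values, not from the paper).
true_lam = 0.1
s_min, s_inf = -2.0, 0.5            # in units of pre-COVID-19 s.d.
days = list(range(30))              # days since the sentiment minimum
sentiment = [s_inf - (s_inf - s_min) * math.exp(-true_lam * t) for t in days]

lam_hat = fit_recovery_rate(days, sentiment, s_inf)
recovery_half_life = half_life(lam_hat)
print(f"estimated half-life: {recovery_half_life:.2f} days")
```

With noisy daily data, a nonlinear least-squares fit of all three parameters would likely be more robust than this log-linear shortcut.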

Impacts of lockdowns on expressed sentiment.
Given the absence of a vaccine during the first wave of the pandemic, many governments implemented a series of non-pharmaceutical interventions to contain the spread of the virus, with lockdowns being the most stringent ones. Lockdowns aim at minimizing physical contact among citizens, which deprives individuals of their freedom to undertake a wide range of daily activities and creates financial risks linked to job loss. Nevertheless, lockdowns could also generate a sense of security regarding virus control and curb public concern about the pandemic 4 . Given these circumstances, the direction and magnitude of sentiment change after lockdowns are likely to be context-specific, depending on the timing of implementation and public attitudes towards the policy. The critical empirical challenge is that governments tend to impose lockdown measures in response to uncontrolled virus surges, challenging the construction of proper comparisons for the lockdown countries. Researchers can easily fall into the trap of comparing countries severely struck by COVID-19 and having a worsening sentiment trend with countries in a better situation, thus leading to false conclusions that lockdown itself worsens sentiment. Here we apply SCM 36 to construct suitable comparisons for each lockdown country. SCM allows comparisons of a treated country's sentiment after lockdown with the weighted average sentiment constructed from a pool of control countries that have no or late lockdown. Weights are assigned according to the similarity of the control countries with the lockdown countries of interest in pre-lockdown sentiment, pandemic severity and development indicators (Methods, 'Impacts of lockdowns on expressed sentiment').
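To make the weighting idea concrete, the following Python sketch computes synthetic control weights on toy data. It is a simplification under stated assumptions: it matches only on pre-lockdown sentiment (the analysis described above also matches on pandemic severity and development indicators), it uses projected gradient descent as the solver, and the function names and numbers are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:   # holds for a prefix of j; keep the last one
            theta = candidate
    return [max(x - theta, 0.0) for x in v]

def synthetic_control_weights(treated_pre, controls_pre, n_iter=2000):
    """Minimise 0.5 * ||sum_k w_k * control_k - treated||^2 over the simplex."""
    k = len(controls_pre)
    t_len = len(treated_pre)
    # Step size 1/L, where L is bounded by the sum of squared control values.
    step = 1.0 / sum(x * x for series in controls_pre for x in series)
    w = [1.0 / k] * k
    for _ in range(n_iter):
        synth = [sum(w[j] * controls_pre[j][t] for j in range(k)) for t in range(t_len)]
        resid = [synth[t] - treated_pre[t] for t in range(t_len)]
        grad = [sum(controls_pre[j][t] * resid[t] for t in range(t_len)) for j in range(k)]
        w = project_to_simplex([w[j] - step * grad[j] for j in range(k)])
    return w

# Toy pre-lockdown sentiment series for three control countries (hypothetical).
controls_pre = [
    [1.0, 0.0, 0.0, 0.5],
    [0.0, 1.0, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.5],
]
# The treated country is built as 0.7 * control 1 + 0.3 * control 2.
treated_pre = [0.7, 0.3, 0.0, 0.5]
weights = synthetic_control_weights(treated_pre, controls_pre)

# Post-lockdown effect: treated sentiment minus the weighted control sentiment.
controls_post = [[0.2, 0.1], [0.4, 0.3], [0.0, 0.1]]
treated_post = [0.5, 0.4]
effects = [
    treated_post[t] - sum(weights[j] * controls_post[j][t] for j in range(3))
    for t in range(2)
]
avg_effect = sum(effects) / len(effects)
print([round(x, 3) for x in weights], round(avg_effect, 3))
```

The non-negativity and sum-to-one constraints keep the synthetic country inside the range of the observed controls, which is why a simplex projection is used rather than an unconstrained regression.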
We find that, on average, lockdown policies are followed by a small and positive sentiment change when comparing the average sentiment change across all locked-down countries with that of their synthetic controls in the first week of their implementation (Fig. 3a). Of the 52 countries that have over 500 daily geotagged social media posts, that implemented nationwide stay-at-home orders and for which we can construct valid synthetic controls, 34 (65%) show a positive sentiment impact of lockdown policy, and 18 (35%) display a negative effect (Fig. 3b; the country-specific results of lockdowns are summarized in Supplementary Tables 7 and 8). The sentiment change is rather subtle compared with the reduction in mobility in the first week after lockdown policies are implemented, estimated using the same empirical strategy (Supplementary Fig. 15a). In contrast, as expected, we find overwhelmingly (89% of countries) negative effects of lockdown on mobility in the first week post-lockdown (Supplementary Fig. 15b), suggesting that our method is effective in picking up the changes following lockdown.
Although the sentiment change after lockdown is small in magnitude for most countries, we do see notable dispersion in effect size, ranging from −1 s.d. to +1.2 s.d. Statistical inferences constructed through permutation tests also show countries having both significantly positive and negative effects (Supplementary Note 5). We find suggestive evidence that for countries having significantly negative sentiment change post-lockdown (Supplementary Fig. 4a), the negative effect is concentrated in the unhappiest (the bottom sentiment quartile) social media posts within a country, compared with the happiest ones (the top sentiment quartile) (Supplementary Fig. 4b,c). These results suggest that lockdown policies could have disproportionate emotional impacts on the unhappiest people. Due to the macro nature of this study, the distributional effects will need future research to validate.

Discussion
Timely monitoring of the affective aspect of subjective well-being is essential for public policy design and management 7 . Survey methods usually have limited samples within developing economies and require considerable time to execute, leading to a lack of generalizability and time delay when faced with catastrophes. Using high-frequency social media data and state-of-the-art NLP algorithms, we construct a comprehensive database of expressed sentiment covering over 100 countries worldwide (74% of the world population). Our method applies a state-of-the-art sentiment metric using lexical expressions of social media data to measure the changes in emotional states, which is validated to correlate with traditional survey measures of subjective well-being 9,15,37,38 (see Supplementary Note 2 for expanded discussions on this topic).
Leveraging this database, we provide empirical evidence at the global scale of the alterations in expressed sentiment associated with COVID-19. We find a remarkable consistency in the way COVID-19 induced sentiment alterations across countries. Though taking place at different time points, almost all countries showed an abrupt and statistically significant sentiment decline around the beginning of the COVID-19 pandemic, followed by an asymmetric and slower recovery. Despite the similarity in the shapes of sentiment response curves, sentiment drops were larger in countries having more confirmed COVID-19 cases or looser cultures, while the recovery was faster for countries with efficient governments.
We also display how this global sentiment database can be used to model sentiment changes after lockdown policies. Though severe emotional costs of lockdown policies are widely assumed, we found little evidence supporting this hypothesis (at least in the short term), when comparing countries that had implemented lockdown policies with their synthetic controls. This seemingly surprising result does not indicate that the social and financial risks created by lockdowns are trivial; instead, it suggests that for countries with severe pandemic situations, letting the virus spread without imposing stringent anti-contagion policies would lead to similar or even larger emotional distress. Several previous studies have also documented complex emotional responses towards lockdown, and studies showing a negative association between lockdown and sentiment usually have not removed the impacts of the pandemic itself from lockdown measurements (see Supplementary Note 7 for a literature summary). Our analysis implies that lockdown policies do not necessarily entail a trade-off between physical health and emotional well-being, at least not for the average population of a country. It is worth noting that COVID-19 policy is not a clean setting for causal identification, since the anticipation effect before lockdown interventions and the spillover effects in sentiment across treated and control countries could bias our estimates. In addition, we do see substantial dispersion in sentiment change after lockdown across countries. Understanding the specific, contextual factors that produce these variations in effect sizes is an important avenue for future work.
Social media sentiment analysis provides complementary merits to those of survey measures for subjective well-being surveillance, but it has several limitations. First, the internet and social media penetration rates vary across countries and across different income and age groups within countries. Our analysis can only be used to understand the patterns of those who use Twitter or Weibo to communicate and lacks explanatory power for the least developed regions and elderly populations. Second, although social media expressed sentiment correlates with the affective aspects of subjective well-being, it cannot reliably measure the life satisfaction dimension of subjective well-being 39,40 . Due to the limitations in representativeness and measurement, social media sentiment analysis should serve as a complement rather than a substitute for self-reported measures of subjective well-being. And more research is needed to understand the relationship between NLP-based sentiment and survey-based well-being in developing countries. Third, as sentiment analysis using digital trace is still a nascent research area, we do not have enough evidence to judge whether our expressed sentiment measurements can be used to diagnose clinically meaningful mental disorders. More psychometric validations with self-reported mental health status will be required to understand to what extent expressed sentiment on social media can be used in psychiatric epidemiology. Fourth, our study mainly focuses on sentiment changes for the average population of social media users within a country and how country-level characteristics and policies moderate the effect. While these measures are meaningful at a macro level to understand global heterogeneity, we cannot measure the moderating effects of individual-level socio-demographics, beliefs and preferences, which limits our capacity to speak to disparities and potential tailored interventions for particular population subgroups. 
The technological progress in demographic inference tools based on social media data 41 could enable further heterogeneity analysis at the individual or subgroup level, which could be an important research direction for future studies. Finally, our study covers only the first wave of the COVID-19 pandemic, and there are countries still recovering from their first waves. Rather than directly extrapolating our empirical results to inform future pandemic strategies, we recommend careful evaluations using our method and extended datasets for future waves.
Our data and methodology intend to provide a useful tool for tracking emotional well-being. This tool can support timely monitoring and decision-making by international and national policymakers.

Methods
Data. Social media data. We collected social media data from two large microblogging platforms, Twitter and Weibo (the Chinese equivalent of Twitter). The data cover the five months from 1 January 2020 (when COVID-19 spread was essentially restricted to China's Wuhan region) to 31 May 2020 (when most countries had recovered from their first COVID-19 wave). Only geolocated social media posts, for which users consented to share their location information, are of interest to our analysis. In the study period, 654 million geotagged Twitter and Weibo posts were collected globally (Fig. 1a).
Twitter is a global platform where users share content, or 'tweets' , with their followers. As of 2019, Twitter had 330 million monthly active users. Users can give consent to share their location information by enabling background GPS collection or by tagging a location in their tweet. These geolocated tweets are encoded with latitude and longitude coordinates. We employed reverse-geocoding techniques to extract country information from geolocations. Twitter changed its geotagging approach in 2019 to enhance privacy protection 42 . However, because our analysis is conducted at the national or subnational level, this change in approach does not affect the state/country assignment of the individual. As Twitter's counterpart in China, Weibo (Sina microblog) is one of the top social networking platforms, with 462 million monthly active users in 2020. The geotagged Weibo posts are a subsample of all Weibo posts for which users consented to share their location information. This location information is based on the user's exact latitude and longitude when releasing the Weibo post from a smartphone or a computer. We filtered out the institutional accounts (including big 'Vs' (the most influential celebrities) in Weibo) and used only the individual's original posts.
To measure people's general emotional well-being rather than specific emotions towards COVID-19 itself, we excluded all COVID-19-related posts on the basis of an exhaustive list of COVID-19-related terms (Supplementary Fig. 19). During the implementation, we translated the COVID-19-related terms into the 30 most common languages to account for multilingual content. Posts where any one of these text patterns matched with the content were flagged as COVID-19 related.
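The exclusion step described above amounts to pattern matching against a multilingual term list. The Python sketch below illustrates the idea under stated assumptions: the three terms in `COVID_TERMS` are hypothetical stand-ins (the actual list is in Supplementary Fig. 19 and is not reproduced here), and simple case-insensitive substring matching is assumed as the matching rule.

```python
import re

# Hypothetical stand-in terms; the study's full multilingual list is not shown here.
COVID_TERMS = ["covid", "coronavirus", "新冠肺炎"]

# One alternation pattern; re.escape guards against regex metacharacters in terms.
COVID_PATTERN = re.compile(
    "|".join(re.escape(term) for term in COVID_TERMS),
    re.IGNORECASE,
)

def is_covid_related(text):
    """Flag a post if any term matches anywhere in its content."""
    return COVID_PATTERN.search(text) is not None

posts = [
    "Great coffee this morning",
    "COVID-19 cases are rising",
    "新冠肺炎 疫情",
    "Stuck in traffic again",
]
kept = [post for post in posts if not is_covid_related(post)]
print(kept)
```

Substring matching errs on the side of exclusion, which suits the stated goal of measuring general affect rather than reactions to the pandemic itself.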
Lockdown policy data. We collected and evaluated country-level lockdown policy data from two sources. The first was the Oxford Coronavirus Government Response Tracker (OxCGRT) 14 , and the second was the WHO Public Health and Social Measures (PHSM, https://www.who.int/emergencies/diseases/novel-coronavirus-2019/phsm). The former records 17 different government responses, including 8 related to containment and movement restrictions. Each government response is coded for response stringency (response-specific) and scope ('targeted' versus 'general'). The latter joins seven policy databases together (including the OxCGRT) to provide a common taxonomy and comprehensive policy outlook at the national and subnational levels.
We consider the WHO PHSM to be comprehensive of all COVID-19 policy announcements because this dataset aggregates the efforts of seven separate COVID-19 policy databases. However, upon manual review of this dataset, we decided that its encoding of policy announcements into stringencies and measures was not sufficiently accurate for use as our final start and end order database. Instead, we used the WHO PHSM dataset to cross-validate the OxCGRT dataset by comparing all OxCGRT policy start dates, levels of enforcement and scopes with the WHO PHSM dataset's announcements. When there was a discrepancy between the two datasets, we updated the OxCGRT dataset with the manually reviewed announcement from the WHO PHSM dataset.
Epidemiological data. The COVID-19 epidemiological data were collected from the COVID-19 dashboard by the Center for Systems Science and Engineering at Johns Hopkins University (https://coronavirus.jhu.edu/map.html).
Human mobility data. In 2020, Google provided a comprehensive COVID-19 Community Mobility Report (https://www.google.com/covid19/mobility/) showing the changes in mobility patterns in 135 countries from 15 February to 27 July 2020. The report displays the relative strengths of mobility indices in six different aspects (retail and recreation, grocery and pharmacy, parks, transit stations, workplaces, and residential) based on the baseline values, which are calculated by taking the median value of visits and length of stay in each type of place during five weeks before the COVID-19 pandemic.
Country-level indices. We collected a rich set of country-level indices to assist with the heterogeneity analysis of sentiment drop globally. The base years are different for each index due to differences in the most recent data availability. First, we collected a set of development indices. We collected gross domestic product per capita (2018), urbanization rate (2019) and unemployment rate (2019) from the World Bank; these are the commonly used indicators to represent a country's development status 43 . We also collected the 2017 Socio-demographic Index, a comparative summary socio-demographic metric synthesizing per capita income, educational attainment and fertility rate 44 , which is commonly used to explain the disparities in countries' burdens of disease 45 . Second, we collected two indices related to a country's management capacity: government efficiency index (2018) from the World Bank and the 2015 Global Health Security Index (https://www.ghsindex.org/). Government efficiency is a comprehensive measure of public sectors' performance and has been proven to predict a country's capacity to control the COVID-19 pandemic 46 . The Global Health Security Index is the first comprehensive assessment of health security at the global scale 47 . Finally, we collected culture-related indices from Awad et al. 35 . The key culture variable of interest is cultural tightness, which measures the tightness of social norms and was found to explain a country's capacity to manage the COVID-19 pandemic 46 . The cultural indices in this dataset also include individualism, religiousness and relational mobility (that is, the fluidity with which people can develop new relationships).

Sentiment analysis.
We employed NLP sentiment imputation algorithms to analyse the millions of daily social media posts that make up our dataset. Different types of NLP methods have been used for sentiment classification of textual data in the existing literature. Dictionary-based approaches match the words that make up each text entry to sentiment-specific lists (or dictionaries). LIWC 26 and the Hedonometer project 48 offer two such dictionaries and have often been used in social media research 49 . More recently, sentiment classification of text data has been successfully implemented using neural networks such as transformers or convolutional neural networks 50 . These methods create high-dimensional representations of the text entries, usually based on pre-trained word vectors. For this study, we used a transformer that has achieved state-of-the-art results in text classification: BERT 25 . Unlike traditional word2vec text representation models, BERT creates dynamic word representations informed locally by the neighbouring context. Our global study uses a pre-trained Multilingual BERT model, which creates representations broadly consistent across different languages 27 . Sentence-BERT provides an additional document-level embedding. On the basis of Siamese-coupled neural networks, this model produces semantically meaningful representations of sentences that can be compared among themselves in the embedding space 51 . For our study, we created these high-dimensional representations for every social media entry in our dataset.
We trained a simple logistic-regression classifier on the first 100 principal component analysis dimensions of the Sentence-BERT social media post embeddings. The training data we used are a set of 1,600,000 tweets labelled as positive or negative 52 . Since representations are consistent across languages, we were able to train our sentiment classifier in English and predict sentiment in the 104 languages supported by Multilingual BERT 27 , which covers 65 identifiable languages in Twitter and Weibo. We evaluate the performance of this model in Supplementary Note 1 and find a classification accuracy of 0.84 for English content and 0.75 on average in other languages (see the details in Supplementary Table 1). We further compared the sentiment from our BERT-based algorithm with sentiment indices from the dictionary-based LIWC method using English tweets, and the results show high consistency (Supplementary Fig. 1). To enhance the transparency of our algorithm, we show in Supplementary Note 3 how people's use of emotional words (defined by the LIWC English dictionary) changed alongside the decline in our sentiment index at the onset of the COVID-19 pandemic.
We averaged each social media post sentiment score daily at the national and subnational levels (for example, state or province; the largest subnational administrative unit of a country). To avoid oversampling individuals who post the most on social media, we first aggregated our sentiment data to the individual-date level and then averaged the individual sentiments on each day to the subnational (state/province) or national level. Moreover, we used the one-class classification approach 53 to detect and exclude Twitter bots.
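The two-stage averaging described above can be illustrated with a minimal sketch. The tuple layout, the function name `daily_region_sentiment` and the toy scores are assumptions made for illustration only; they are not taken from the released code.

```python
from collections import defaultdict
from statistics import mean


def daily_region_sentiment(posts):
    """Average post scores per user and day first, then across users.

    posts: iterable of (user_id, region, date, score) tuples (hypothetical layout).
    Returns {(region, date): mean of the per-user daily means}.
    """
    # Stage 1: collect every user's scores for each region and day.
    per_user = defaultdict(list)
    for user_id, region, date, score in posts:
        per_user[(region, date, user_id)].append(score)

    # Stage 2: each user contributes exactly one value per region and day.
    per_region = defaultdict(list)
    for (region, date, _user), scores in per_user.items():
        per_region[(region, date)].append(mean(scores))

    return {key: mean(values) for key, values in per_region.items()}


# One prolific user (three posts) and one occasional user (one post).
posts = [
    ("u1", "CA", "2020-03-01", 1.0),
    ("u1", "CA", "2020-03-01", 1.0),
    ("u1", "CA", "2020-03-01", 1.0),
    ("u2", "CA", "2020-03-01", 0.0),
]
index = daily_region_sentiment(posts)
naive = mean(score for *_rest, score in posts)
```

In this toy case the post-level mean (`naive`) is pulled towards the prolific user, whereas the two-stage index weights both users equally.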

Sentiment alterations during the COVID-19 pandemic.
Modelling of sentiment dynamics. To measure each country's capacity for sentiment recovery and its recovery status, we adopted the following procedures: (1) Select countries. We only kept countries in which tweets were generated by more than 100 active users a day from 1 January to 31 May 2020, to ensure that extreme observations and insufficient data do not severely impact a country's sentiment index. (2) Detrend. To extract the overall sentiment trend from daily sentiment fluctuations, we implemented seasonal trend decomposition using a locally estimated scatterplot smoothing algorithm 54 , with a seven-day period as the input parameter for temporal cycles, to remove seven-day periodic patterns from the raw data and smooth the fluctuations (Supplementary Fig. 9). The seven-day period captures weekly cyclical patterns and is further confirmed when applying the Fourier transformation algorithm to detect temporal cycles of sentiment.

Sentiment drops. To estimate the effect of COVID-19 on sentiment drop, we exploited the 'donut regression discontinuity' design 56 at the timing threshold of min-date, $t^*$, by removing the confounding days between drop-date and min-date and measuring the sentiment discontinuity from the level before drop-date to that after min-date (Supplementary Fig. 11). We use time (daily) as the running variable, and the estimate for sentiment shock ($\tau_{RD}$) we use is:

$$\tau_{RD} = \lim_{t \downarrow t^*} E[y_t \mid t] - \lim_{t \uparrow t^*} E[y_t \mid t]$$

In the equation, E is the expectation value and $y_t$ is the sentiment on day t. This RDD is equivalent to segmented regressions used for interrupted time series analysis in public health research 57,58 . This quasi-experimental design is particularly useful in cases where an abrupt event causes all units to be treated, and there are potential confounding time trends pre- and post-treatment. To calculate the sentiment limit on both sides of the running variable, we used local linear regression to fit the sentiment curve on each side of the threshold $t^*$.
A local linear fit, rather than a higher-order polynomial fit, is recommended by previous literature 59 . Fitting the general trends pre- and post-interruption prevents a single day with extreme sentiment values from substantially biasing the magnitude of the sentiment drop. In practice, we estimated the following equation for each country separately, with subnational (province- or state-level) sentiment time series as input:

$$y_{sit} = \beta\,\mathrm{COVID}_{it} + \gamma_1\,\mathrm{rel\_date}_{sit} + \gamma_2\,\mathrm{rel\_date}_{sit} \times \mathrm{COVID}_{it} + \delta_{DOW} + \eta_s + \varepsilon_{sit}$$

In the equation, $y_{sit}$ is the average sentiment index for province/state s in country i on date t, $\mathrm{COVID}_{it}$ is a binary variable equal to 1 when the time is after the country's sentiment nadir and 0 otherwise, and $\mathrm{rel\_date}_{sit}$ is the date measured in days from the minimum sentiment date $t^*$. The terms $\gamma_1\,\mathrm{rel\_date}_{sit}$ and $\gamma_2\,\mathrm{rel\_date}_{sit} \times \mathrm{COVID}_{it}$ absorb the smooth relationship of sentiment trend within the bandwidth surrounding $t^*$. We used a bandwidth of 28 days on each side and the triangular kernel as our base specification. As robustness checks, we also tested the uniform kernel as well as different bandwidths, which yield similar results (Supplementary Table 6 and Supplementary Fig. 7). In all cases, we weighted the regression by the number of tweets so that larger provinces/states have higher weights. We added day-of-week fixed effects ($\delta_{DOW}$) and state fixed effects ($\eta_s$) to control for weekly cyclical and time-invariant state-specific confounding factors; $\varepsilon_{sit}$ is the error term. $\beta$ is our coefficient of interest, the magnitude of which is divided by the standard deviation of sentiment pre-COVID-19 (that is, before the sentiment drop-date of each country) to make the results more comparable across countries. The standard errors are clustered within province/state to account for sentiment correlation.
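To make the local linear estimate with a triangular kernel concrete, the following is a minimal single-series sketch, not the full specification. It assumes that each side is fitted separately and evaluated at its own edge of the donut (drop-date on the left, min-date on the right), and it omits the tweet-count weights, fixed effects, clustered standard errors and standardization described above. All function names and the toy series are hypothetical.

```python
def triangular_weight(distance, bandwidth):
    """Triangular kernel: the weight falls linearly to zero at the bandwidth."""
    return max(0.0, 1.0 - abs(distance) / bandwidth)


def weighted_intercept(xs, ys, ws):
    """Intercept at x = 0 of the weighted least-squares line through (xs, ys)."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x


def donut_rd_jump(days, sentiment, drop_day, min_day, bandwidth=28):
    """Post-nadir limit minus pre-drop limit, skipping days in [drop_day, min_day]."""
    def limit(points):
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        ws = [triangular_weight(x, bandwidth) for x in xs]
        return weighted_intercept(xs, ys, ws)

    # Assumption: distances are measured from the nearest edge of the donut.
    before = [(d - drop_day, y) for d, y in zip(days, sentiment)
              if drop_day - bandwidth < d < drop_day]
    after = [(d - min_day, y) for d, y in zip(days, sentiment)
             if min_day < d < min_day + bandwidth]
    return limit(after) - limit(before)


# Toy series: linear trends on both sides, arbitrary values inside the donut.
days = list(range(80))
drop_day, min_day = 40, 47
sentiment = []
for d in days:
    if d < drop_day:
        sentiment.append(0.60 + 0.001 * (d - drop_day))
    elif d > min_day:
        sentiment.append(0.40 + 0.002 * (d - min_day))
    else:
        sentiment.append(0.50)  # inside the donut, ignored by the estimator
jump = donut_rd_jump(days, sentiment, drop_day, min_day)
```

Because the toy trends are exactly linear, the estimated jump equals the difference between the two intercepts (0.40 and 0.60), regardless of the values placed inside the donut.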
Sentiment recovery. To characterize the recovery of each country, we established two indices: recovery half-life and recovery status. Recovery half-life represents a country's recovery speed, while recovery status represents to what degree a country's sentiment had recovered at the end of May 2020. Following Fan et al. 32 , we parametrized the sentiment recovery process through an exponential model and estimated the parameters u, v and γ with nonlinear least squares. More specifically, we regressed the daily sentiment value f(x) on x, the number of days since the country reached its minimum level of sentiment, using an exponential function with parameters u, v and γ. To ensure that the sentiment we measured was free from impacts of the Black Lives Matter campaigns, we set 25 May 2020 (that is, when the George Floyd event took place) as the end date of our sentiment analysis. To ensure quality, we removed all countries with abnormal sentiment fluctuations for which the parameter calibration algorithm could not converge within 1,000 steps during the fitting process.
We then identified the two recovery indices using the fitted exponential model. To find the recovery half-life, we searched for the date on the fitted curve where the sentiment recovered 50% of the distance between a country's sentiment nadir and its final sentiment on 25 May. To further understand the recovery status, we compared a country's sentiment status on 25 May with the baseline level before the sentiment drop-date and defined it with pre-COVID-19 sentiment standard deviation as a unit to ensure comparability across countries (for example, −1 indicates recovering to a status 1 s.d. below the baseline sentiment).
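The two recovery indices can be illustrated with a short sketch. The functional form $f(x) = u - v e^{-\gamma x}$ is an assumption made here for illustration, as is the fitting strategy: a grid search over γ with a closed-form linear fit for u and v stands in for a general nonlinear least-squares optimizer. All names and numbers are hypothetical.

```python
import math


def fit_line(zs, ys):
    """Ordinary least squares for y = intercept + slope * z; also returns the SSE."""
    n = len(zs)
    mean_z = sum(zs) / n
    mean_y = sum(ys) / n
    szz = sum((z - mean_z) ** 2 for z in zs)
    szy = sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
    slope = szy / szz
    intercept = mean_y - slope * mean_z
    sse = sum((y - intercept - slope * z) ** 2 for z, y in zip(zs, ys))
    return intercept, slope, sse


def fit_recovery(xs, ys, gammas):
    """Fit the ASSUMED form f(x) = u - v * exp(-gamma * x) by grid search on gamma."""
    best = None
    for gamma in gammas:
        zs = [math.exp(-gamma * x) for x in xs]
        u, slope, sse = fit_line(zs, ys)  # f is linear in z with slope -v
        if best is None or sse < best[3]:
            best = (u, -slope, gamma, sse)
    return best[:3]


def recovery_half_life(gamma, end_day):
    """Days after the nadir at which half of the f(0)-to-f(end_day) gap is regained."""
    regained_at_end = 1.0 - math.exp(-gamma * end_day)
    return -math.log(1.0 - 0.5 * regained_at_end) / gamma


def recovery_status(u, v, gamma, end_day, baseline_mean, baseline_sd):
    """Fitted final sentiment relative to the baseline, in baseline s.d. units."""
    final = u - v * math.exp(-gamma * end_day)
    return (final - baseline_mean) / baseline_sd


# Toy data generated from the assumed form with u = 0.5, v = 0.2, gamma = 0.1.
xs = list(range(60))
ys = [0.5 - 0.2 * math.exp(-0.1 * x) for x in xs]
u, v, gamma = fit_recovery(xs, ys, [k / 100 for k in range(1, 51)])
half = recovery_half_life(gamma, end_day=59)
status = recovery_status(u, v, gamma, 59, baseline_mean=0.55, baseline_sd=0.05)
```

A negative `status` indicates that the fitted sentiment on the final day remains below the pre-drop baseline, which matches the sign convention described above.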
Impacts of lockdowns on expressed sentiment. When defining lockdowns, we refer to the "stay-at-home requirements" policy category of OxCGRT 60 . A country is defined as a lockdown country if it has national-level requirements on not leaving the house except for essential trips (that is, levels 2 and 3 of the C6 policy category in OxCGRT). Our cross-validation process is summarized in the 'Data' section. The lockdown dates, compared with the sentiment drop-dates and min-dates, are summarized in Supplementary Fig. 14.
Here we applied SCM 36 to estimate sentiment alterations after lockdown interventions for each country separately. In the case that no single country alone provided a good comparison for the lockdown country of interest (that is, violating the parallel trend assumptions required for difference-in-differences or event studies), we constructed a combination of non-lockdown countries as a synthetic control group to best resemble the characteristics of the treated country before its lockdown. We then compared the sentiment of a treated country on days after the lockdown with the weighted average sentiment of the control countries at the same period to estimate the treatment effect. The weights assigned to each control country were calculated such that the simulated synthetic control best resembles the treated country of interest in the pre-lockdown period. Mathematically, the distance between the vector of pre-specified characteristics of the treated country and that of the weighted average controls is minimized before the lockdown time 36,61 .
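The weight calculation can be sketched as a constrained least-squares problem. The version below is a simplified illustration: it matches only on pre-lockdown sentiment (no covariates), constrains the weights to be non-negative and sum to one, and uses projected gradient descent rather than any particular SCM package. The function names and toy series are hypothetical.

```python
def project_to_simplex(values):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based method)."""
    ordered = sorted(values, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, value in enumerate(ordered, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / i
        if value - candidate > 0:
            theta = candidate
    return [max(v - theta, 0.0) for v in values]


def synthetic_control_weights(treated, donors, iterations=2000):
    """Weights minimising the squared pre-period gap between treated and synthetic.

    treated: pre-lockdown outcomes of the treated unit.
    donors: list of donor series, each the same length as `treated`.
    """
    periods = range(len(treated))
    # Step size 1 / trace(X'X), a safe bound on the gradient's Lipschitz constant.
    step = 1.0 / sum(x * x for series in donors for x in series)
    weights = [1.0 / len(donors)] * len(donors)
    for _ in range(iterations):
        synthetic = [sum(w * s[t] for w, s in zip(weights, donors)) for t in periods]
        residual = [syn - obs for syn, obs in zip(synthetic, treated)]
        gradient = [sum(r * x for r, x in zip(residual, s)) for s in donors]
        weights = project_to_simplex(
            [w - step * g for w, g in zip(weights, gradient)]
        )
    return weights


# Toy pre-lockdown series for three donor countries (made-up numbers).
donors_pre = [
    [0.2, 0.4, 0.6, 0.8],
    [0.8, 0.6, 0.4, 0.2],
    [0.3, 0.7, 0.3, 0.7],
]
# The treated series is built as 0.7 * donor 0 + 0.3 * donor 1.
treated_pre = [0.38, 0.46, 0.54, 0.62]
weights = synthetic_control_weights(treated_pre, donors_pre)

# Estimated effect on one post-lockdown day: treated minus synthetic control.
donors_post = [0.80, 0.20, 0.50]
treated_post = 0.55
effect = treated_post - sum(w * d for w, d in zip(weights, donors_post))
```

In this toy case the third donor receives essentially zero weight, and the effect is the gap between the treated country and the weighted average of the first two donors on the post-lockdown day.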
The validity of synthetic control relies heavily on the countries included in the 'donor pool' and the observable characteristics that the weights of synthetic control are built on. We allowed the 'late adopters' of lockdown policies to serve as controls for the 'early adopters' to increase the similarity between the treatment country and its donor pool. This approach does not change the results for late adopters or for those early adopters for which appropriate comparisons can be constructed from the never adopters (for example, northern European countries rely heavily on Sweden as control), but it can provide a better simulation of the comparison scenario for early adopters with no comparable countries that never implemented lockdown policies. We only explored the first week after lockdown in the SCM analysis so that late adopters that implemented lockdowns more than seven days after a specific treated country could be matched, enhancing comparability between a treatment country and its synthetic control country. Meanwhile, we excluded countries with many subnational-level lockdowns, including the United States, China, Nigeria, Brazil and Germany. We also included four layers of socio-economic and pandemic-related variables in the covariates to ensure that higher weights were assigned to similar countries. Details about the variables included as covariates and the associated robustness checks are presented in Supplementary Note 4.
For the identification of SCM to be causal, we need to assume that the choice of which unit will be treated is random conditional on the choice of the donor pool, the observable variables included as predictors and the unobserved factors that can be captured by the pre-treatment path of the outcome variable 62 . It is hard to test this assumption directly, and COVID-19 policies might generate anticipation effects and spillover effects for some countries; care is thus warranted for interpreting these estimates as causal.
Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The data used in this paper are available at https://github.com/Jianghao/Sentiment_COVID-19.

Code availability
The code used in this paper is available at https://github.com/Jianghao/Sentiment_COVID-19.