Social markers of a pandemic: modeling the association between cultural norms and COVID-19 spread data

While cross-national differences in the epidemic curves of COVID-19 are becoming evident, the social markers of such variability remain unexplored. To investigate how certain social norms may underlie the heterogeneity in the spread of infections, global social data (including cultural values, indices of prosperity, and government effectiveness), covariates (such as climate zone, economic indicators, and healthcare access and quality), and data on the early transmission dynamics of COVID-19 were collected. Model-based clustering and random forest regression analysis were applied to identify distinct groups of societies and to explore predictors of COVID-19 doubling time. Clustering revealed four groups: (1) reserved, (2) drifting, (3) assertive, and (4) compliant societies. Compliant societies from dry climate zones showed the highest doubling times despite higher population densities. The most relevant predictors of doubling time were population density, freedom of assembly and association, and agency, underlining the importance of social factors in the heterogeneity of COVID-19 transmission rates. Our cluster typology might contribute to explaining cross-national variability in the early transmission dynamics of highly infectious diseases.


Introduction
To date, the novel coronavirus SARS-CoV-2 has caused more than a million confirmed infections worldwide (Dong et al., 2020). As the pandemic spreads, considerable cross-national differences in the epidemic curves are becoming evident. The scientific community strives to provide satisfactory models that capture the factors underlying this variability. These initial models explore the impact of the stringency of government responses (Fang et al., 2020), travel restrictions (Kucharski et al., 2020), or the mean duration of infectiousness (Anderson et al., 2020) on COVID-19 transmission rates. And while key aspects of the transmission dynamics of COVID-19 remain partially unclear from a scientific point of view, public opinion, beliefs, and attitudes about the pandemic are predominantly shaped by mainstream media, press coverage, and governmental communication.
According to our current understanding, the most appropriate governmental response to the pandemic is to order broader-scale social distancing and thus decrease transmission rates, which gives health services the time they need to increase their capacity and laboratories the time they need to develop effective vaccines (Anderson et al., 2020). However, the stringency of a government's response is one thing, while the population's reaction to mitigation efforts is a horse of a different color, which brings us to the key question of the present paper: what are the main social markers of COVID-19 spread rates? What kind of social norms shape the outbreak's national picture? Is it the individual's willingness to temporarily waive his/her fundamental rights and freedoms in a time of essential restrictions? Or is it the discipline and compliance of the majority that matters? Do sociability, network density, or the norm of civic participation play a crucial role in reducing adherence to the required physical distancing? Or is it rather the accustomed freedom of assembly and association that makes it challenging to accommodate governmental regulations?
Earlier empirical findings related to the social and cultural background of certain pandemics (e.g. H1N1, zoonotic diseases, tuberculosis, or other airborne pathogens) indicated that low educational level, non-Caucasian ethnicity, overcrowding (Mayoral et al., 2013), living in a neighborhood with high material or total deprivation (Lowcock et al., 2012), individualistic values (Morand and Walther, 2018), and poor socio-environmental conditions (Issarow et al., 2018) are usually associated with increased hospitalization, point-source epidemics, or a higher number of disease outbreaks due to virus infections. Yet none of these studies assessed the potential connection between social capital, the perceived importance of personal freedom, and virus spread dynamics within a time frame of extreme restrictions.
Therefore, the aim of this study was to explore the association between social characteristics and the doubling time of registered COVID-19 cases, while controlling for the potential confounding factors of climate zone, population size and density, government effectiveness and stringency, testing rates, healthcare access and quality, preventative interventions, GDP per capita, and the contribution of travel and tourism to GDP.

Methods
The guiding principles for data selection and inclusion were (1) relevance (i.e. being a worldwide study with nationally representative samples assessing social markers, or a global indicator based on a consistent algorithm), (2) recency (i.e. a data collection date as close as possible to the COVID-19 outbreak), and (3) repeatability and reproducibility (i.e. preferably studies that repeat sampling on an annual basis, increasing the reproducibility of our proposed model). In some cases (e.g. the inclusion of the 2010–2014 World Values Survey (WVS) data (Inglehart et al., 2014), the 2015 Healthcare Access and Quality index (GBD 2015 Healthcare Access and Quality Collaborators, 2017), or the direct contribution of travel and tourism to GDP in 2018 (World Travel & Tourism Council, 2019)), we decided that the relevance of the dataset outweighs its potential deficit in recency or annual repetition.
Outcome measure: Incidence data and doubling times. To encompass the initial phases of the outbreak, COVID-19 incidence data were collected for a 71-day period running from 23 January 2020 to 27 March 2020. Incidence data were obtained from the Coronavirus COVID-19 Global Cases database (https://github.com/CSSEGISandData/COVID-19, accessed on 28 March 2020) of the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU), which is updated daily based on WHO, CDC, ECDC, NHC, and DXY data and local media reports (Dong et al., 2020). Doubling times were calculated from log-linear models of incidence. For China and South Korea, we included data only from the early increasing phase.
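The exact form of the log-linear model is not specified here, so the following is only a minimal Python sketch under an assumed form: an ordinary least-squares fit of log cumulative cases on day number, whose slope r (the daily exponential growth rate) yields a doubling time of ln 2 / r. The function name `doubling_time` and the toy series are hypothetical.

```python
import math


def doubling_time(days, cases):
    """Doubling time in days from an OLS fit of log(cases) on day number.

    Assumes strictly positive counts taken from the early growth phase.
    """
    logs = [math.log(c) for c in cases]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    # The OLS slope of log(cases) on time is the daily growth rate r.
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    sxx = sum((t - mean_t) ** 2 for t in days)
    r = sxy / sxx
    if r <= 0:
        raise ValueError("no exponential growth in this window")
    return math.log(2) / r


# Toy series that doubles every 3 days: 10, 20, 40, 80, 160.
days = [0, 3, 6, 9, 12]
cases = [10 * 2 ** (t / 3) for t in days]
print(round(doubling_time(days, cases), 6))  # 3.0
```

Restricting the fit to the early increasing phase, as was done for China and South Korea, matters because a single exponential slope no longer describes the curve once growth slows.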
Covariates. Survey data reflecting social norms and cultural values were retrieved from the measures of the Legatum Institute's 2019 Prosperity Index (2020), which provides indicators of prosperity for 167 countries, and from the WVS 6th wave dataset (Inglehart et al., 2014). The WVS studies the impact of changing values on the social and political life of distinct societies; altogether 60 countries and more than 85,000 respondents participated in wave 6. Since both the Prosperity Index (almost 300 country-level indicators) and the WVS dataset (almost 260 variables) contain a large number of items, we reduced the number of selected variables by focusing only on the following issues: (1) government effectiveness and the relationship between government and people (including trust and compliance or obedience); (2) the individual's relationship with others (including social utility, proper behavior, and social capital); and (3) the importance of personal freedom (including e.g. agency or freedom of assembly and association). Table 1 summarizes all covariates entered in our model, including data sources and variable descriptions. Due to high rates of missing values, the Government Stringency Index (Hale et al., 2020), COVID-19 tests per million people (Our World in Data, 2020), and WVS data were used for descriptive purposes only.
Analytical methods. Model-based clustering was performed on the scaled data, excluding variables with missing values (87 countries in total). Variables with a high number of missing values (namely tests performed per million, WVS items, and the Government Stringency Index) were omitted from the clustering, as was the only categorical variable, climate zone (Kottek et al., 2006; Rubel et al., 2017). We used the mclust package (Scrucca et al., 2016) to fit parameterized Gaussian mixture models with the EM algorithm initialized by hierarchical clustering, and selected the optimal model based on the Bayesian information criterion (BIC).
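To illustrate how BIC selects among Gaussian mixtures, here is a simplified one-dimensional Python sketch (the analysis itself used mclust in R on multivariate data). It fits mixtures with G = 1..4 components by EM with component-specific variances, roughly analogous to mclust's univariate "V" model, and scores each fit with BIC = 2 log L - p log n, the sign convention mclust uses (higher is better). The equal-block initialization is a deliberate simplification standing in for mclust's hierarchical initialization, and all function names and data are hypothetical.

```python
import math


def _e_step(x, w, mu, var):
    """Responsibilities and log-likelihood, computed with log-sum-exp for stability."""
    resp, loglik = [], 0.0
    for v in x:
        logs = [(math.log(w[k]) if w[k] > 0 else -math.inf)
                - 0.5 * math.log(2 * math.pi * var[k])
                - (v - mu[k]) ** 2 / (2 * var[k])
                for k in range(len(w))]
        top = max(logs)
        log_total = top + math.log(sum(math.exp(lp - top) for lp in logs))
        loglik += log_total
        resp.append([math.exp(lp - log_total) for lp in logs])
    return resp, loglik


def fit_gmm_1d(x, g, n_iter=200, var_floor=1e-6):
    """EM for a 1-D Gaussian mixture with component-specific variances.

    Initialization splits the sorted data into g equal-sized blocks (a simple
    stand-in for hierarchical initialization). Returns the final log-likelihood.
    """
    n = len(x)
    xs = sorted(x)
    blocks = [xs[i * n // g:(i + 1) * n // g] for i in range(g)]
    w = [len(b) / n for b in blocks]
    mu = [sum(b) / len(b) for b in blocks]
    var = [max(sum((v - m) ** 2 for v in b) / len(b), var_floor)
           for b, m in zip(blocks, mu)]
    for _ in range(n_iter):
        resp = _e_step(x, w, mu, var)[0]
        for k in range(g):  # M-step: update weight, mean, and variance
            nk = sum(r[k] for r in resp)
            w[k] = nk / n
            if nk > 1e-12:
                mu[k] = sum(r[k] * v for r, v in zip(resp, x)) / nk
                var[k] = max(sum(r[k] * (v - mu[k]) ** 2
                                 for r, v in zip(resp, x)) / nk, var_floor)
    return _e_step(x, w, mu, var)[1]


def select_by_bic(x, max_g=4):
    """Fit G = 1..max_g components; return (best G, {G: BIC}), higher BIC is better."""
    n = len(x)
    scores = {}
    for g in range(1, max_g + 1):
        n_par = 3 * g - 1  # (g - 1) mixing weights + g means + g variances
        scores[g] = 2 * fit_gmm_1d(x, g) - n_par * math.log(n)
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical scaled scores with two well-separated groups.
x = [-2.1, -2.0, -1.9, -2.05, -1.95, 1.9, 2.0, 2.1, 2.05, 1.95]
best, scores = select_by_bic(x)
print(best, {g: round(s, 2) for g, s in scores.items()})
```

The log(n)-weighted penalty is what keeps BIC from always preferring more components: each extra component must raise the log-likelihood enough to pay for its three additional parameters.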
As a second step, we performed random forest regression analysis on the same observations using the randomForestSRC package, with weighted mean-squared error as the splitting rule (Ishwaran and Kogalur, 2020). Continuous variables were log-transformed. The node size and the number of variables randomly selected as candidates for splitting a node were chosen to achieve the lowest out-of-bag (OOB) error. The forest consisted of 100 trees. We applied swor (sampling without replacement) resampling to grow the trees, and all variables were included. Minimal depth and permutation variable importance measures (VIMPs) were calculated (Ehrlinger, 2016; Ishwaran, 2007). All calculations were performed with R 3.6.1 (R Core Team, 2019) using a custom script, which is available as supplementary material together with the detailed output.

Table 1 (partial). Covariates and variable descriptions
Agency (pillar of Personal Freedom): the degree to which citizens are free from restriction and free to move, reflecting the experienced freedom to act independently and to make free choices (its indicators include, e.g., personal autonomy and individual rights, freedom of movement, and satisfaction with freedom).
Freedom of assembly and association (pillar of Personal Freedom): the degree to which citizens are free to assemble with others in public spaces and to express their opinions (its indicators include, e.g., the right to associate and organize, the guarantee of assembly and association, and autonomy from the state).
Social network (pillar of Social capital): the strength of, and opportunities offered by, the individual's relationships with the wider social network, including social support (its indicators include, e.g., respect, the opportunity to make friends, and helping another household).
Personal and family relationships (pillar of Social capital): the strength of the closest personal relationships and family ties, which form the individual's emotional, mental, and financial support (its indicators include, e.g., help from family and friends when in trouble and the positive energy provided by the family).
Civic and social participation (pillar of Social capital): the extent to which citizens participate in society, split into the civic and social spheres (its indicators include, e.g., donating money to charity, volunteering, and voicing an opinion to a public official).
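Permutation VIMP measures how much prediction error rises when one variable's values are shuffled, breaking its link with the outcome. The Python sketch below is a simplification under stated assumptions: a fixed, hypothetical predictor stands in for the fitted forest, and error is computed on all rows rather than on each tree's out-of-bag rows as in a random forest. All names and data are hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_vimp(predict, X, y, j, n_repeats=20, seed=1):
    """Average increase in MSE after randomly permuting column j of X.

    In a random forest, `predict` would be evaluated on each tree's
    out-of-bag rows only; here all rows are used for simplicity.
    """
    rng = random.Random(seed)
    base = mse(y, predict(X))
    increases = []
    for _ in range(n_repeats):
        col = [row[j] for row in X]
        rng.shuffle(col)  # break the link between column j and y
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        increases.append(mse(y, predict(X_perm)) - base)
    return sum(increases) / n_repeats


def predict(rows):
    """Hypothetical stand-in for a fitted forest: uses column 0 only."""
    return [2 * r[0] for r in rows]


# Hypothetical data: y depends only on column 0; column 1 is irrelevant.
X = [[i, (7 * i) % 10] for i in range(10)]
y = [2 * row[0] for row in X]

print(permutation_vimp(predict, X, y, j=0) > 0)  # True
print(permutation_vimp(predict, X, y, j=1))      # 0.0
```

Because the toy predictor ignores column 1, its VIMP is exactly zero, while shuffling column 0 can only increase the error; a larger VIMP therefore marks a more important variable.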

Results
Clustering the observed countries. Clustering revealed four main clusters (the Supplementary material lists the countries in each cluster):
• Cluster 1 – Reserved societies: Cluster 1 typically consisted of populous countries with lower population densities, low government effectiveness, and lower GDP per capita and HAQ index, yet higher incidence doubling times. The number of tests performed per million inhabitants was relatively low, and the government stringency index was higher than in Cluster 3 but lower than in Clusters 2 and 4. Agency, freedom of assembly and association, and family and personal relationships were typically lower, and the climate was typically dry (arid and semiarid). Countries of this cluster were, for instance, Iran, Pakistan, Egypt, Algeria, Kazakhstan, and Mexico. These societies seem to be characterized by social distance and a relatively reduced need for personal autonomy or freedom of movement. Citizens of these societies showed less interest in doing something for the good of society and found it less important to behave properly. On the other hand, they had higher confidence in their governments than citizens of Cluster 2 and 3 countries, and they regarded obeying their rulers as an important feature of democracy.
Table 1 (continued). Population and population density: National indicators of population size and population per unit area.
ARTICLE HUMANITIES AND SOCIAL SCIENCES COMMUNICATIONS | https://doi.org/10.1057/s41599-020-00590-z
• Cluster 2 – Drifting societies: Countries from Cluster 2 (e.g. Czechia, Greece, Italy, Hungary, Slovenia, Slovakia, Peru, Estonia, Poland) typically had a temperate climate, low population and population density, higher government effectiveness, moderate freedom of assembly and association, moderate agency and family and personal relationships, lower civic and social participation, and lower GDP and HAQ index. The number of tests performed per million inhabitants was lower than in Clusters 3 and 4 but higher than in Cluster 1. The government stringency index was high. In this group, the doubling time was lower. These societies are marked by firm leadership, a tendency to refuse social responsibilities, and a preference for close relationships over expansive networking. Citizens of these countries did not consider it very important to do something for the good of society or to behave properly, while they showed decreased confidence in their government and attached less importance to obeying their rulers (the latter two WVS variables scored lower than in Cluster 1 and 4 countries).
• Cluster 3 – Assertive societies: Cluster 3 countries, such as Australia, the USA, the UK, Germany, Sweden, Spain, Switzerland, and France, had a temperate climate, higher government effectiveness, stronger social networks, stronger personal and family relationships, greater freedom of assembly and association, higher GDP per capita, a high HAQ index, and the lowest doubling times. The number of tests performed per million inhabitants was high, and the government stringency index was the lowest of all clusters. These societies are characterized by a strong need for personal autonomy and independence, social participation, and expansive networking. It is important for these individuals to do something for the good of society and also to behave properly (these WVS variables showed the highest median scores in this cluster). On the other hand, citizens of Cluster 3 countries reported lower confidence in their governments, and they did not consider obeying their rulers an important feature of democracy.
• Cluster 4 – Compliant societies: Countries of Cluster 4 typically had a dry climate, high population density, lower scores on social networks and on personal and family relationships, and lower freedom of assembly and association. The HAQ score and GDP per capita were relatively high, but not as high as in Cluster 3. Both the number of tests performed per million inhabitants and the government stringency index were the highest among all clusters. Countries from this cluster were, for instance, South Korea, China, the United Arab Emirates, Israel, India, and Singapore. This group was associated with the highest doubling time (followed by Cluster 1). These societies might not promote the importance of personal autonomy or of the freedom to act or assemble. For these citizens, it is more important to do something for the good of society and to behave properly than for citizens of Cluster 1 and 2 countries. Cluster 4 countries showed the highest confidence in their government and the second-highest median scores for the perceived importance of obeying their rulers. The increased doubling time in these countries despite high population densities may also indicate regulatory compliance. The basic difference between Cluster 4 and Cluster 1 societies is that inhabitants of Cluster 4 countries show greater conformity not only towards their governments but also towards other individuals (Fig. 1 and Table 2).
Predictors of doubling time. The random forest had an error rate of 0.25. Based on VIMP and minimal depth (MD), the most important predictors of doubling time were population density (MD: 2.07, VIMP: 0.0318) and freedom of assembly and association (MD: 2.61, VIMP: 0.0146), followed by agency score (MD: 2.68, VIMP: 0.034). Climate class showed a high VIMP (0.03) but also the highest MD (3.73), i.e. it tended to be used for splitting only deeper in the trees. Most covariates showed a complex nonlinear relationship with the predicted doubling time (Fig. 2). Overall, doubling time was positively associated with population density, GDP per capita, and climate class 3 (temperate climate), and negatively associated with freedom of assembly and association and with agency score.
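Minimal depth summarizes how close to the root a variable first splits a tree: the root has depth 0, and a lower average over trees indicates a more important variable, so a high MD such as climate class's 3.73 signals that the variable was typically used only deeper in the trees. The Python sketch below computes it on hypothetical hand-built trees (nested tuples); skipping trees in which a variable never splits is a simplifying assumption, not necessarily how randomForestSRC treats that case.

```python
from collections import deque


# Hypothetical trees: ("split", variable, left, right) or ("leaf", value).
def first_split_depth(tree, var):
    """Depth (root = 0) of the shallowest node splitting on `var`, or None."""
    queue = deque([(tree, 0)])  # breadth-first search visits shallow nodes first
    while queue:
        node, depth = queue.popleft()
        if node[0] == "leaf":
            continue
        _, split_var, left, right = node
        if split_var == var:
            return depth
        queue.append((left, depth + 1))
        queue.append((right, depth + 1))
    return None


def minimal_depth(forest, var):
    """Mean first-split depth over trees that use `var` (None if no tree does)."""
    depths = [d for d in (first_split_depth(t, var) for t in forest) if d is not None]
    return sum(depths) / len(depths) if depths else None


forest = [
    ("split", "density", ("split", "agency", ("leaf", 1.0), ("leaf", 2.0)), ("leaf", 3.0)),
    ("split", "agency", ("leaf", 1.5), ("split", "density", ("leaf", 2.5), ("leaf", 3.5))),
    ("split", "density", ("leaf", 1.0), ("leaf", 2.0)),
]
print(minimal_depth(forest, "density"))  # 0.3333333333333333
print(minimal_depth(forest, "agency"))   # 0.5
print(minimal_depth(forest, "climate"))  # None
```

Here density splits at the root in two of three trees, giving the lowest minimal depth, whereas the never-used climate variable returns None.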

Discussion
Our results suggest that influencing general public attitudes and responses to COVID-19 (or other pandemics) should be a public health priority. Among the most relevant covariates, accustomed freedom of assembly and association and agency were found to be associated with faster COVID-19 transmission (these variables were more important predictors than GDP per capita, government effectiveness, preventative interventions, or the quality of and access to health care). In Cluster 2 and 3 countries (labeled drifting and assertive societies), the freedom to act, to assemble, and to move is a more commonly experienced or promoted human right than in Cluster 1 and 4 countries (labeled reserved and compliant societies). We might therefore assume that people from these societies find it more demanding to accommodate strict governmental regulations, and that their governments exercise less stringency towards citizens. Upholding human rights remains an essential challenge during the pandemic, even if Cluster 2 and 3 societies show an elevated risk of accelerated transmission. Freedom of choice and freedom of action are basic democratic rights, yet they are as fragile as they are important in times of crisis, as individuals are often prone to trade freedom for security and to modify their preferences in order to adapt to undesirable circumstances (Faden and Shebaya, 2010), especially in the context of serious public health threats. It is thus each government's responsibility not to take advantage of the current situation or to exploit the population's fear and anxiety as a foundation for increased governmental power. The current pandemic raises several questions concerning the balance between effectively combating the spread of the coronavirus and protecting fundamental human rights.
Amnesty International (2020) recently published recommendations for European states, urging them to select responses to COVID-19 that are in line with their human rights obligations. Among other things, these recommendations address the right to privacy (i.e. increased digital surveillance is only acceptable in exceptional circumstances and when the measures are legitimate, necessary, proportionate, and non-discriminatory) and point out that government responses limiting human rights (e.g. restricting freedom of movement and assembly, the right to work, or the right to private and family life) must be guided by legitimate public health goals based on scientific evidence.
Another important social factor is the public's interpretation of pandemic risk (e.g. "I am at risk" vs. "we are at risk"). As mentioned above, individualistic values are often associated with an increase in infectious disease outbreaks (Morand and Walther, 2018), suggesting that individualistic cultures, as opposed to collectivistic ones, are more vulnerable to accelerated virus transmission, since citizens of collectivistic societies may better protect in-group members against pathogen transmission. This is in line with the pathogen prevalence theory of Fincher et al. (2008), which holds that collectivistic cultures are usually situated in the hotter regions of the world, where pathogens causing more severe medical conditions are more likely to occur, so that collectivistic attitudes (such as protecting the in-group and excluding out-group members) serve as a means of survival. In terms of the current research, these notions (i.e. the connection between hotter climates and collectivistic cultures) apply mainly to Cluster 4 (and not Cluster 1) societies, which have higher population densities. Several Cluster 4 countries (e.g. China, India, Japan, Singapore, or South Korea) are traditionally considered collectivistic rather than individualistic. Nonetheless, one should also keep in mind that the most cited operationalization of collectivism-individualism (by Hofstede) dates back to the 1960s (Hofstede, 1980) and is based on non-probability samples. We therefore preferred to use social data derived from the Legatum Institute's 2019 Prosperity Index and the WVS instead of, e.g., Hofstede's dimensions. However, since the 6th wave of the WVS collected data between 2010 and 2014, it would be worth reanalyzing our data once the WVS wave 7 dataset becomes available (July 2020).
These initial findings may serve as a starting point for further research that looks deeper into the social determinants of pandemic course and severity and applies interdisciplinary models (e.g. a joint endeavor across health sciences, public health, and social sciences) in order to better understand the social construction of increased transmission rates for highly infectious diseases. The fact that the personal right to assemble or associate with others and agency were stronger predictors of reduced doubling time than the HAQ index, GDP per capita, or government effectiveness once again indicates the importance of population attitudes and reactions to a pandemic crisis. From a public health perspective, it is of heightened relevance to explore both the ethical and the psychological side of the freedom vs. security dilemma. Within the ethical framework proposed by Faden and Shebaya (2010), public health policies and regulations potentially restricting basic human rights (as in the case of severe infectious disease outbreaks) may be justified on the grounds of the overall social benefit of mitigation, promoting collective action, ensuring fairness in the distribution of regulatory burdens, or interfering with an individual's liberty only for the purpose of preventing harm to others (the harm principle). In times of global crisis (such as the current COVID-19 pandemic), increased paternalism (i.e. authorities taking action to protect the health and welfare of people against their will) might also be observed. However, paternalism can also be "soft" (e.g. when citizens' choices are compromised by immaturity, ignorance, or false beliefs) or "libertarian" (e.g. influencing citizens' choices through persuasion rather than through force or compulsion), and there can be considerable cross-cultural differences in the population's tolerance for a certain level of paternalism. In countries where shared decision-making or egalitarian approaches are adopted, strict paternalism may be more readily rejected, while in other regions of the world with different cultural standards, people might be more accepting of paternalistic leadership (Abedini et al., 2015). One of the most important questions that currently remains unanswered is whether libertarian paternalism (e.g. as manifested in government communication and regulatory strategies) is effective enough in ensuring social distancing. In any case, reducing the negative psychological impact of quarantine might be a good public health starting point for helping citizens bear the frustration of limited freedom and thus keeping them motivated to maintain the expected distancing. Based on a recent rapid review (Brooks et al., 2020), the main negative effects of quarantine include post-traumatic stress symptoms, confusion, and anger, depending on, e.g., the duration of quarantine, fear of infection, boredom, adequate or inadequate information, financial loss, or stigma. Thus, best practices for minimizing the pathogenic outcomes of drastic regulation efforts include providing a clear rationale for quarantine, ensuring sufficient supplies, and reminding the public of the potential social benefits of such extraordinary experiences. Keeping the public informed is particularly important, as one of the most common psychological strategies to reduce anxiety is finding meaning or purpose in the pandemic. Several pandemic narratives and interpretations can be observed, many of which are conspiracy beliefs rather than evidence-supported opinions, presuming covert political or economic interests behind the origin or scale of the pandemic, and mostly supposing that SARS-CoV-2 is a purposefully manipulated laboratory construct.
And even though these theories have often been rebutted by scientific evidence (Andersen et al., 2020), conspiracy beliefs and misinformation are still on the rise (Mian and Khan, 2020).

Concluding remarks
With infectious diseases (such as tick-borne, mosquito-borne, vector-borne, and foodborne illnesses) posing an emerging and recurring public health threat, the importance of reliable forecasting models is emphasized both by the authors of this paper and by others (Lutz et al., 2019). While this study is not without limitations (e.g. the potential obsolescence of the WVS and HAQ data, the unconfirmed replicability of the cluster assignments, and the limited number of observations), on the basis of our findings we advocate including social factors in such models, and we encourage the scientific community to explore the sociology of current and future pandemics and to examine the validity of our cluster typology for the initial epidemic curves of other infectious diseases.

Data availability
All analyzed datasets are freely available in the Dataverse repository: https://doi.org/10.7910/DVN/PRKEVU; the custom script and the detailed output are available as supplementary material.
Researchers are encouraged to reanalyze the datasets to examine the reproducibility and validity of our proposed cluster model for the initial epidemics of other infectious diseases as well.