Perceptions of the appropriate response to norm violation in 57 societies

Norm enforcement may be important for resolving conflicts and promoting cooperation. However, little is known about how preferred responses to norm violations vary across cultures and across domains. In a preregistered study of 57 countries (using convenience samples of 22,863 students and non-students), we measured perceptions of the appropriateness of various responses to a violation of a cooperative norm and to atypical social behaviors. Our findings highlight both cultural universals and cultural variation. We find a universal negative relation between appropriateness ratings of norm violations and appropriateness ratings of responses in the form of confrontation, social ostracism and gossip. Moreover, we find the country variation in the appropriateness of sanctions to be consistent across different norm violations but not across different sanctions. Specifically, in those countries where use of physical confrontation and social ostracism is rated as less appropriate, gossip is rated as more appropriate.

Norms, in the sense of collective ideas about approved and disapproved behavior, exert a powerful influence on how people behave 1. However, not everyone complies with these norms, which may create dilemmas for those who witness norm-violating behaviors and must decide whether to respond with some kind of sanction. On the one hand, previous work has suggested that norms encouraging informal sanctions are critical to sustaining cooperation and social order in human groups [2][3][4]. On the other hand, unfettered or inappropriate use of sanctions may threaten social harmony by creating costly conflicts 5,6. Thus, cooperation and social harmony depend on norms about the use of informal sanctions. Such norms about norm enforcement have been termed metanorms 7. Despite their importance, surprisingly little is known about how metanorms operate in everyday life, let alone across societies.
Existing research often examines and conceptualizes sanctions in generic terms as a form of punishment that reduces outcomes for another person 8,9 . While parsimonious, this characterization is unlikely to provide a realistic account of how people deal with norm violators in everyday life. To capture this realism, scholars 10,11 have recently proposed three distinct informal sanctions: social ostracism (e.g., individuals or groups actively avoiding someone), gossip (e.g., spreading information about someone's inappropriate behavior), and direct confrontation (e.g., verbal or physical). Although these responses may not always be intended to modify the norm violator's behavior, they can all be viewed as expressions of disapproval that serve to strengthen a given norm. A key reason that potential norm enforcers may prefer one response over another is that responses may differ in the extent to which the sanctioned party becomes aware of being sanctioned. For instance, whereas direct confrontation should be especially effective at making the norm violator aware of why they are being sanctioned and thus change their behavior, gossip should be less likely to evoke direct conflict but can still promote norm compliance by making the norm more salient in the group. Similarly, physical confrontation may be harmful in a way that verbal confrontation is not. And social ostracism may directly harm targets' opportunities whereas gossip may harm targets more indirectly via reputational damage. Prior cross-cultural work has rarely distinguished between forms of sanctions, instead focusing on costly actions that reduce outcomes for another person in economic games 12,13 , physical confrontation 14 , or unspecified "punishment" 15 .
To compare the perceived appropriateness of different forms of sanctions across societies, we studied participants in 57 countries, including 7 African countries, 10 American countries, 18 Asian countries, 21 European countries, and Australia. The study included 10 basic scenarios, mostly drawn from prior studies of norm violations 14,16 . These stimuli covered various domains of norm violations and included both animations and verbal scenarios. One scenario described a violation of a cooperative norm regarding a common resource 14 . Four scenarios described behaviors that were normatively out of place, such as listening to music in headphones at a funeral 16 . Five "metaviolation" scenarios described a potentially overly harsh response to another's behavior, such as someone responding to a verbal insult by physical confrontation. For each of the 10 scenarios, participants rated the appropriateness of the described behavior as well as the appropriateness of four different responses to it: verbal confrontation (making an angry remark to the norm violator), gossip (talking to someone else about the norm violator), social ostracism (making a point of avoiding the norm violator in the future), and non-action (doing nothing), for a total of 10 × 5 = 50 ratings.
The study, including the following five key hypotheses, was preregistered with the Open Science Framework (osf.io/qg6xy).
Hypothesis 1: The more appropriate a triggering behavior is perceived to be, the more appropriate it is to respond by doing nothing and the less appropriate it is to respond by using confrontation, social ostracism, or gossip, and this will be consistent across countries. The hypothesized negative relation between the appropriateness ratings of norm violations and the appropriateness rating of a response has previously been reported specifically for verbal confrontation in the United States 17 . But we do not know whether it holds for other forms of sanctions and across cultures. This relationship is important for both conceptual and methodological reasons. Conceptually, only a negative relation would signify that the sanction is indeed an expression of disapproval. Methodologically, when comparing the perceived appropriateness of a given response across societies, it is important to control for the appropriateness rating of the norm violation as this may differ between societies. The four following hypotheses concern the country variation in the perceived appropriateness of informal sanctions: its consistency across different norm violation domains, its specificity across different forms of sanctions, its relation to variation in the use of informal sanctions, and its relation to variation in other cultural and societal factors.
Hypothesis 2: The country-level variation in the perceived appropriateness of informal sanctions is robust across different domains of norm violations. As metanorms are assumed to serve the function of sustaining cooperation 4 , empirical work has focused on norm violations with respect to contributions to, or depletion of, common resources [18][19][20] . We will refer to this category of norm violations as the "cooperation domain". Importantly, many social norms do not belong to the cooperation domain and may be more mundane, often just stipulating that certain acts are not appropriate in certain contexts 16 . For instance, it tends to be viewed as inappropriate to sleep in a restaurant or to listen to music in headphones at a funeral. For theorizing about the psychology of informal sanctions it is crucial to know whether the cooperative domain of resource dilemmas has a special status or whether the perceived appropriateness of a given response is independent of the domain of the norm violation (holding constant how inappropriate the norm violation is perceived to be). The present research illuminates this issue. Previous work on verbal punishment of uncivil behavior does not indicate any special status of the cooperation domain 17,21 , supporting the parsimonious hypothesis that the psychology of norms has a high level of generality that cuts across various domains. Hence, we expected that the appropriateness rating of the norm violation would be leading in influencing evaluations of the appropriateness of the responses, in a manner relatively independent of the domain of the norm violation.
Hypothesis 3: With respect to the specificity of different sanctions we have three competing sub-hypotheses. Comparing across different forms of informal sanctions, the country variation in ratings of their appropriateness will exhibit either (A) consistency, (B) complementarity, or (C) independence. As mentioned earlier, different forms of real-life sanctions seem to be quite distinct, yet much prior work has used a unitary conceptualization of sanctioning as payoff reduction. A unitary conceptualization may be warranted if metanorms vary in the same way for different forms of sanctions, such that some societies view sanctions, in general, as more appropriate than other societies. Another possibility is that all societies employ sanctions but have different preferences for the form they should take, such that a lower appropriateness rating of one sanction is matched by a higher level for another sanction; for instance, it could be that societies prefer either direct confrontation or nonconfrontational sanctions such as ostracism and gossip. A final possibility is that different forms of sanctions serve different purposes and thus have little to do with each other, in which case the appropriateness ratings of different sanctions would be unrelated.
Hypothesis 4: In countries where a given sanction is viewed as more appropriate it will also tend to be used more often. This hypothesis constitutes a validation of metanorms. Just as norms influence behavior, metanorms are expected to influence sanctioning behavior.
Hypothesis 5: The perceived appropriateness of direct punishment will be higher in countries estimated to be (a) lower on indulgence, (b) higher on power distance, (c) lower on individualism and individual autonomy values, (d) higher on tightness, (e) higher on experienced threat, (f) lower on emancipative moral judgments, (g) higher on pro-violence attitudes, (h) higher on pathogen prevalence, (i) lower on gender equality, and (j) lower on median income. All predictions were preregistered except the last three (which theoretically connect to the other predictions, see below).
To enable examination of Hypothesis 5, our survey includes several culture measures that we aggregate to country level: individual autonomy (valuation of independence and determination over religious faith, and obedience), emancipative moral judgments (how justifiable homosexuality, divorce, abortion, and suicide are judged to be), pro-violence attitudes, tightness (pervasiveness of social norms and low tolerance for noncompliance), and perceived threats to society (from disease, conflicts, etc.). We also use data from other sources as follows. We use country measures of indulgence (available for 48 countries in our study), power distance (51 countries), and individualism (51 countries) provided by Hofstede et al. 22 . We use country measures of pathogen prevalence from prior work on the historical prevalence of infectious diseases in different geopolitical regions 23 . We measure national levels of gender equality by the Global Gender Gap Index, which is calculated by the World Economic Forum based on gender gaps in economic participation and opportunity, educational attainment, health and survival, and political empowerment 24 . From Gallup we obtain country measures of median income (50 countries) 25 .
Note that the predictions in Hypothesis 5 focus on how the perceived appropriateness of direct punishment (physical and verbal confrontation) will vary across societies; whether the same patterns or the opposite patterns will hold for indirect sanctions like social ostracism and gossip depends on which of the subhypotheses of Hypothesis 3 is correct. Our predictions on direct punishment draw on theories of cultural dimensions, societal tightness-looseness, and behavioral responses to the experience of ecological threat. With respect to cultural dimensions, a cross-cultural study found that responding to non-cooperation by physical confrontation was viewed as more appropriate in countries that were characterized by low levels of indulgence (i.e., restrictive of enjoying life and having fun, which a norm violator may be viewed as doing), high levels of power distance (i.e., accepting asymmetry of power, which a punisher may be viewed as wielding), and low levels of individualism (i.e., emphasizing group embeddedness over individual autonomy, which a norm violator may be viewed as expressing) 14 . In line with the role of individualism, another study found verbal confrontation of uncivil behavior to be more normative in less individualistic societies 21 . With respect to tightness-looseness, there is some cross-cultural evidence showing that formal institutions tend to be more punitive in tighter countries 3,16 , and our prediction is that this extends to informal sanctioning. Tight societies have generally experienced more collective threat, and direct punishment of deviants may be evolutionarily adaptive in these contexts 26 . By extension, the perceived appropriateness of direct punishment is also expected to be related to the experience of threat.
The theory of behavioral immune system similarly traces the origins of cultural differences to ecological threat, especially pathogen prevalence, which is assumed to increase the need for social coordination and thereby lead to cultures with less individualism, greater power distance, and less tolerance of nonconformity 23,27 , all of which suggest that pathogen prevalence will also increase the perceived appropriateness of direct punishment. Finally, modernization theory ties the development of cultural values to economic development. Specifically, increased prosperity is assumed to facilitate a shift from traditional values and community discipline to post-material, emancipative values that include a greater emphasis on individual autonomy, gender equality, and emancipative moral judgments 28 . Through a variety of socioeconomic mechanisms, modernization is thought to increase competition and complexity, and reduce interdependence, thereby increasing prioritization of individual freedom, choice, and agency over conformity to the needs or traditions of a society. It is therefore expected to be associated with greater tolerance for a wide range of norm violations and, consequently, a decrease in the perceived appropriateness of punishing them.
In this study of 57 countries we find support for the five hypotheses outlined above. Thus, we find a universal negative relation between appropriateness ratings of norm violations and appropriateness ratings of responses in the form of confrontation, social ostracism, and gossip. The country variation in the appropriateness ratings of sanctions is found to be consistent across different norm violations but not across different sanctions. While the use of confrontation and social ostracism is viewed as less appropriate in more prosperous countries with more emancipative values, the opposite holds for gossip. Our findings thus highlight both cultural universals and cultural variation with respect to beliefs about how norms should be enforced. Perhaps most intriguingly, our findings suggest that responses to norm violators may shift with economic development in a specific way, such that gossip to some extent is used in place of more punitive sanctions, potentially affecting societies' ability to achieve norm compliance.

Results
All appropriateness ratings were made on a six-point scale from extremely inappropriate (coded 0) to extremely appropriate (coded 5) and were standardized for each respondent to control for response sets. Throughout, numbers in brackets refer to 95% bias-corrected and accelerated confidence intervals based on 1000 bootstrap samples generated by SPSS v. 26.0. The sample size for analyses is n = 57 countries unless stated otherwise.
Hypothesis 1. As preregistered, we tested Hypothesis 1 in each country by calculating correlations, across the ten scenarios, between country-mean ratings of norm violations and a given response. The boxplots in Fig. 1 illustrate the results, confirming that informal sanctions were essentially universally viewed as less appropriate to use the more appropriate the norm violation was.
Country measures of metanorms. Following our preregistration, we calculated country measures of metanorms for each of four responses (verbal confrontation, social ostracism, gossip, non-action) by using country-mean appropriateness ratings for the five scenarios in the non-cooperation and out-of-place behavior domains. These were adjusted for variation in ratings of the appropriateness of the underlying norm violations (see "Methods"). Metanorm measures for all countries are reported in Supplementary Table 2 and illustrated on color-coded maps in Supplementary Fig. 1.
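The within-respondent standardization and the per-country correlation used for Hypothesis 1 can be illustrated with a minimal sketch. It is not the authors' analysis code (the paper reports using SPSS). The function names, the use of the population standard deviation, and the toy country means (five scenarios rather than ten) are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def standardize(ratings):
    """Z-score one respondent's ratings (population SD is an assumption)."""
    m, s = mean(ratings), pstdev(ratings)
    if s == 0:
        # A respondent who gave identical ratings carries no relative information.
        return [0.0 for _ in ratings]
    return [(r - m) / s for r in ratings]

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical standardized country means for five scenarios in one country:
# appropriateness of the triggering behavior and of responding with gossip.
violation_means = [-1.2, -0.8, -0.5, 0.1, 0.4]
gossip_means = [0.6, 0.5, 0.2, -0.1, -0.3]

r = pearson(violation_means, gossip_means)
print(f"correlation across scenarios: {r:.2f}")
```

Under Hypothesis 1, this correlation would be negative for each sanction and positive for non-action in every country.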
In the preregistration we assumed that metanorms for verbal and physical confrontation would be the same, but recent work has shown that they may be viewed quite differently 10 . We therefore separately calculated the country measures of metanorms for physical confrontation by averaging the country-mean appropriateness ratings of two meta-violation scenarios in which physical confrontation was used in two different contexts: against an agent depleting a common resource and against someone insulting a man's mother.
Averaged over all countries, the responses rated as most appropriate were non-action, M = 2. Importantly, there was no global consensus on the most appropriate response. Verbal confrontation was rated highest in 26 countries. However, non-action was rated highest in the remaining 31 countries, and in 17 of these countries the highest-rated sanction was gossip. There was even one country (Thailand) where social ostracism was the highest-rated sanction.
As a robustness check of observed country differences, we found that differences in metanorms between cities in the same country, as well as between students and non-students in the same country, tended to be much smaller than the country differences (Supplementary Table 3). Metanorm measures were virtually unchanged in analyses that excluded participants who failed attention or comprehension checks (Supplementary Table 4). In an additional, unregistered, analysis we found generally high internal consistency of country-level appropriateness ratings of a given response across different scenarios (Supplementary Table 9). Thus, Hypothesis 2 was supported.
Hypothesis 3. Following the preregistration, we analyzed the sanction-specificity of metanorms by calculating pairwise partial correlations of the metanorm measures for different sanctions, controlling for the metanorm measure for non-action (Table 1). When interpreting these correlations, note that they will, to some extent, be artificially lowered due to ratings being standardized. Nonetheless, they present a very clear but complex pattern, simultaneously including all possibilities discussed in Hypothesis 3: consistency, independence, and complementarity. Metanorms for physical confrontation and social ostracism showed a high level of consistency (i.e., were positively correlated), but were largely independent of the metanorm for verbal confrontation. Strikingly, the metanorm for gossip showed strong complementarity (i.e., negative correlations) to the metanorms for all other sanctions.
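A pairwise partial correlation that controls for a single variable, as described above, can be computed from the three ordinary correlations using the standard first-order formula. The sketch below is illustrative only: the function names and the toy numbers are hypothetical and are not taken from the study's data.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z.

    Uses r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)).
    """
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5

# Hypothetical metanorm scores for six countries (made-up values).
gossip = [0.30, 0.10, -0.20, 0.40, -0.10, 0.00]
ostracism = [-0.25, 0.05, 0.20, -0.35, 0.15, 0.10]
non_action = [0.50, 0.20, 0.10, 0.60, 0.30, 0.40]

print(round(partial_corr(gossip, ostracism, non_action), 2))
```

In this sketch, `x` and `y` would be the country-level metanorm measures for two sanctions and `z` the measure for non-action.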
Hypothesis 4. To measure the frequencies by which various informal sanctions are used in different countries, the survey included three items where participants estimated how often they use various responses to someone who does something inappropriate. As preregistered, we tested Hypothesis 4 by calculating the correlations between the country mean for the frequency of [...] Table 2). For physical confrontation, all correlations showed the predicted direction. Particularly strong results (r > 0.50) were obtained for power distance, individualism, individual autonomy, emancipative moral judgments, tightness, national levels of gender equality, and median income. Results for verbal confrontation were weaker and two of the correlations (for tightness and pro-violence attitudes) went weakly in the wrong direction. Thus, Hypothesis 5 received support but much more strongly for physical confrontation than for verbal confrontation, underscoring the need for making a distinction between these sanctions. Results for social ostracism showed the same pattern as for physical confrontation. However, results for gossip followed the exact opposite pattern. For example, the appropriateness rating of gossip was higher in countries that were higher on individualism, autonomy values, emancipative moral judgments, gender equality, and median income. This opposite pattern for gossip is consistent with our previous analysis of sanction-specificity of metanorms.
(Correlations tend to keep the same signs when metanorms are estimated separately for non-cooperation and out-of-place behaviors, see Supplementary Table 6.) When drawing conclusions about the origins of variation in metanorms, it is important to note that cultural, ecological, and economic variables are often strongly intercorrelated (Supplementary Table 7). Moreover, the strength of correlations between metanorms and emancipative moral judgments may in part be due to both constructs being based on appropriateness ratings of actions. Among the other variables, median income and the national level of gender equality tended to show the strongest relation to metanorms overall. We use scatterplots to illustrate that in countries where median income was higher the perceived appropriateness of physical confrontation tended to be lower (Fig. 2), while the perceived appropriateness of gossip tended to be higher (Fig. 3).
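The regression lines in such scatterplots correspond to a simple least-squares fit of a country-level metanorm on median income. The following sketch shows how slope, intercept, and R² could be obtained. The function name and the data points are invented for illustration and do not reproduce the published figures.

```python
def ols_fit(x, y):
    """Least-squares line y = intercept + slope * x, plus R squared."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R squared: share of the variance in y explained by the fitted line.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical countries: median income (thousands) and a metanorm score
# for physical confrontation (made-up values).
income = [1.0, 3.0, 6.0, 10.0, 15.0]
physical = [0.9, 0.7, 0.6, 0.2, 0.1]

slope, intercept, r2 = ols_fit(income, physical)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R2={r2:.2f}")
```

With a single predictor, R² equals the squared Pearson correlation, so a negative slope with a sizeable R² corresponds to a strong negative country-level correlation.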

Discussion
Although norms about punishing norm violators may be critical for maintaining cooperation in human groups, there has been little empirical research on which sanctions people in fact view as appropriate and how this may vary across different norm violations and across countries. The first key finding of this cross-cultural study of the perceived appropriateness of using informal sanctions was culture-universal: the participants consider it more appropriate to use gossip, social ostracism, and confrontation the more inappropriate the triggering behavior is perceived to be. This finding supports our assumption that these distinct responses are all universally used as expressions of disapproval and can therefore be conceived of as informal sanctions. The next key finding was that metanorms for the different sanctions were consistent within countries and largely independent of the domain of the norm violation. Specifically, the same rules for what is an appropriate response to non-cooperation seem to apply to behavior that others simply find uncivil or out of place. This finding is consistent with a parsimonious psychology of informal sanctions that does not include any specific adaptations for the cooperative domain. It poses a challenge for theories of the evolution of cooperation, as it may not be sufficient to focus only on the cooperative domain when modeling the evolutionary dynamics of sanctions 29 .
Correlations are based on n = 57 countries, except for indulgence (n = 48), power distance (n = 51), individualism (n = 51), and median income (n = 50). 95% confidence intervals are presented in brackets.
Fig. 2 The negative association of median income with the appropriateness rating of physical confrontation across 50 countries. Including regression line (R² = 0.45). Every dot represents a country. The x-axis represents median per-capita income according to Gallup 24 . The y-axis represents the metanorm for physical confrontation, that is, the mean appropriateness rating for scenarios where someone responds to a norm violation by physical confrontation.
Our study also contributed to the longstanding debate on whether metanorms require punishment of norm violators. Theoretical work on altruistic punishment has often assumed that not punishing a norm violation is selfish and hence should be deemed inappropriate 30 , and studies using economic experiments have found that those who pay a cost to punish others' selfish behavior are subsequently trusted more than non-punishers 31,32 . However, studies have also found that non-punishers are not viewed as more selfish 29,33 and do not elicit more disapproval 14,[18][19][20] . Our finding that non-action was often viewed as the most appropriate response is consistent with this latter research, supporting the notion that metanorms often do not require bystanders to punish norm violators. Note that this conclusion applies only to relatively minor norm violations, as the perceived appropriateness of non-action was found to decrease for more serious infractions (Fig. 1).
When designing the study to include several forms of informal sanctions, it was an open question to us whether different forms would exhibit similar cross-cultural variation in appropriateness ratings. We speculated that there could instead be complementarity between preferences for confrontation and preferences for non-confrontational sanctions, such as social ostracism and gossip. Indeed, we did find a separation between societies condoning physical confrontation and societies condoning gossip, but, surprisingly, metanorms for gossip were negatively correlated with metanorms for social ostracism, which instead were positively correlated with metanorms for physical confrontation. It was also surprising that metanorms for physical and verbal confrontation were only weakly correlated. These results indicate that metanorms are sanction-specific. This interpretation was supported by the additional finding that metanorm measures for distinct sanctions correlated well with the reported levels of use of the same sanctions. Nonetheless, the observed pattern of consistencies and complementarities across different forms of informal sanctions remains an intriguing puzzle that requires further research. We offer some thoughts below.
By relating metanorms to other country variables, our study speaks to theories of how variation in metanorms may emerge. One theory is that variation in metanorms reflects variation in cultural values and norms; for instance, more individualistic values and loose norms may give individuals more leeway in violating norms without getting punished, while greater power distance may raise the acceptance of individuals asserting authority by punishing norm violators. Consistent with this theory, appropriateness ratings of physical confrontation and social ostracism were negatively correlated with individualism and looseness, and positively correlated with power distance. However, culture is not static. In a process thought to be driven by increasing economic prosperity, cultural values have been shifting quite rapidly in modern times, including increasing autonomy for individuals, more emancipative moral judgments, and less inequality between men and women 34 . Our study suggests that metanorms are similarly shifting. Although the shift itself cannot be observed in this cross-sectional study, we observed high positive correlations of metanorms with emancipative moral judgments, the national level of gender equality, and median income. An alternative theory is that both cultural values and metanorms respond to the local need for social coordination that may be caused by conditions of ecological threat, especially pathogen prevalence. Our data provided moderate support for the role of pathogen prevalence, but no support for perceived threat being related to metanorms.
Metanorms for gossip showed a unique pattern. In countries with higher median income, gossip tended to be more, not less, appropriate. Thus, if metanorms are indeed shifting as living standards rise in the population at large, gossip appears to become viewed as more appropriate as physical confrontation becomes viewed as less appropriate. The specific rise of the perceived appropriateness of gossip in countries with high living standards is one of the most intriguing findings of our study. What is it about gossip that makes its perceived appropriateness change in ways distinct from social ostracism, which is another non-confrontational sanction? One key difference is whether the response is directed to the norm violator or a third party. Specifically, confrontation and active avoidance concern responses that are related to how you interact with the norm violator; in contrast, gossiping concerns how you interact with another person. For this reason, people may think of both confrontation and social ostracism as "punishment" while viewing gossip in a different way, even though they are all expressions of disapproval and even though gossip may be as effective in sustaining norms 35 .
Fig. 3 The positive association of median income with the appropriateness rating of gossip across 50 countries. Including regression line (R² = 0.33). Every dot represents a country. The x-axis represents median per-capita income according to Gallup 24 . The y-axis represents the metanorm for gossip, that is, the mean appropriateness rating for scenarios where someone responds to a norm violation by gossip (adjusted for country differences in the appropriateness ratings of the norm violations).
But why is gossip considered more appropriate, and "punishment" less appropriate, in societies that are more affluent and have more emancipative values? One possibility is that a decrease in the perceived appropriateness of "punishment" in these societies is compensated for by a complementary increase in the perceived appropriateness and use of gossip. Gossip allows one to examine whether other people share your evaluations and to prepare for alternative forms of communication, such as public messages that underscore a specific norm without singling out any individual 1 . Second, gossip may be viewed as more appropriate in more individualistic societies because of differences in social network structures; it may be viewed as less appropriate to talk about a norm violator to someone who is socially close to that person, which would typically be the case in collectivistic societies where social networks are more overlapping 36 . A third interpretation is that norms about gossip are less shaped by their role for norm enforcement than by their role in free information exchange, which arguably becomes more important as societal complexity increases 37 .
Outside of lab experiments, we know of no data on the relative effectiveness of different forms of sanctions for achieving norm compliance. If some sanctions are more effective than others, the country variation we have observed may cause varying levels of compliance. This may be a particularly fruitful avenue of research in connection with the social norms emerging in response to the coronavirus crisis: Are violations of norms about social distancing, say, more common (or less common) in countries favoring gossip than in countries favoring confrontation?
Before closing, we should note some strengths and limitations of the present research. Although we used only 10 different norm violation scenarios, these covered a wide range of specific behaviors and contexts (e.g., singing in a library), supporting the exciting conclusion that metanorms apply across behaviors and contexts of the underlying norm violations, even though metanorms vary across countries. The scenarios were hypothetical, but results were validated against the actual use of informal sanctions reported by respondents. Finally, our sampling strategy had both strengths and limitations. By collecting data from both students and non-students, and across different cities, we established that these subsamples tended to have similar metanorms in the same country. However, it is possible that metanorms exhibit withincountry variation along the urban-rural and socioeconomic dimensions, which we were unable to capture when focusing on urban locations with universities.
A major contribution of the present research is the finding that metanorms are not universal but are subject to systematic cross-societal variation. Note that a lack of consensus about the right way to deal with norm violators may contribute to conflict. Disagreement about social norms is a fact of social life. As the world becomes "smaller" and more interconnected, societies increasingly face the need to consider and negotiate what is the most appropriate response when one's own norm is violated. This interconnection may also make it more likely that one's own norm-violating behavior elicits very different forms of sanctions. Both experiences underline not only the scientific importance of metanorms, but also how they may receive growing attention in a world that faces opportunities for cultural diversity.

Methods
The study was preregistered at OSF (osf.io/qg6xy) at the start of data collection. The full survey and the data used in the present paper are openly available at OSF (osf.io/pm5kc/).
For comparability of samples, we set out to collect data from approximately 200 college students in a major city in each country, which was achieved in almost all countries. To assess the robustness of the country-level measures obtained from these samples, we complemented the main sampling strategy in two ways: (a) we collected additional data from non-student samples (or, in two cases, part-time students) in 31 countries; (b) we collected data from two or more student samples located in different cities of each of 10 countries. In total, we have data from 22,863 participants (students: n = 18,091; non-students: n = 4772), after excluding a few participants (1.5%) who reported an age under 18. Descriptions of the data collection sites and their sample characteristics are reported in Supplementary Table 1. Participants were recruited using a variety of methods, such as invitations via email, on social media, in class, face to face on campus, using public notices and flyers, and using survey organizations.
The survey was translated into 30 different languages, following the usual practice of independent translation and back-translation. The study was conducted anonymously online using Qualtrics, with a few exceptions. Part of the Estonian non-student sample and the Ghanaian student and non-student samples were collected using pen and paper at the university, with animations shown on a big screen.
All participants gave their informed consent and we complied with all relevant ethical regulations. Approval of the study protocol was obtained from ethics committees and institutional review boards where required, including Queen's University.
Scenarios. Scenarios were selected to cover potentially norm-violating behavior in three domains: cooperation, out-of-place everyday behavior, and meta-violations (i.e., potentially norm-violating use of an informal sanction).
The cooperative domain was covered by an animation of an agent depleting a common resource, referred to as scenario A. This scenario was drawn from prior research on metanorms 14 .
Out-of-place everyday behavior was covered by four scenarios describing someone (B) listening to music on headphones at a funeral, (C) sleeping in a restaurant, (D) singing in a library, or (E) reading a newspaper at the movies. These combinations of behaviors and contexts were found to be widely viewed as inappropriate in a prior cross-cultural study of norms 16 .
Meta-violations included two instances of physical confrontation: (F) an animation of an agent physically confronting someone who depleted a common resource in scenario A, and (G) a verbal scenario with a man being physically aggressive against someone who insulted his mother. We use scenarios F and G to calculate metanorm measures for physical confrontation.
The remaining three meta-violation scenarios described someone who reacted to a person who was rude in a public place in one of three ways: (H) by reprimanding this person, (I) by speaking negatively about this person, or (J) by staying away from this person.
Missing values. Missing values were handled by imputation, using the EM method in SPSS.
Standardization. To control for response sets with respect to the appropriateness response scale, the preregistered plan specified standardization of the participants' mean response across all items, referring to the 50 items of the metanorm instrument, which all used the same response scale from extremely inappropriate to extremely appropriate. Notably, in addition to the metanorm instrument, the survey included various other items that used different response scales to measure how often something happens or how strongly the respondent agrees with a statement, etc. All 50 appropriateness ratings of a participant were adjusted by a constant equal to the grand mean of all appropriateness ratings in the entire sample minus the mean of all appropriateness ratings by that participant. Thus, ratings were raised for participants who had tended to use lower ratings than the average participant, while ratings were lowered for participants who had tended to use higher ratings than the average participant. The standardized ratings have the property that the mean rating across the 50 appropriateness items is identical for every participant (and identical to the grand mean of the original ratings across the entire sample).
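The adjustment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' SPSS syntax: the function name `standardize_ratings`, the dictionary layout, and the toy data (three participants, four items instead of 50) are all hypothetical.

```python
from statistics import mean

def standardize_ratings(ratings_by_participant):
    """Shift each participant's ratings so that all participants share one mean.

    ratings_by_participant: dict mapping a (hypothetical) participant id to a
    list of appropriateness ratings, all on the same response scale.
    """
    # Grand mean of all appropriateness ratings in the entire sample.
    all_ratings = [r for ratings in ratings_by_participant.values() for r in ratings]
    grand_mean = mean(all_ratings)

    standardized = {}
    for pid, ratings in ratings_by_participant.items():
        # Constant added to every rating of this participant:
        # grand mean minus the participant's own mean.
        shift = grand_mean - mean(ratings)
        standardized[pid] = [r + shift for r in ratings]
    return standardized

# Toy data (made up for illustration only).
toy = {
    "p1": [1, 2, 2, 3],  # tends to use low ratings, so ratings are raised
    "p2": [4, 5, 5, 6],  # tends to use high ratings, so ratings are lowered
    "p3": [2, 4, 3, 5],  # already at the grand mean, so unchanged
}
adjusted = standardize_ratings(toy)
```

After the adjustment, every participant in the toy data has the same mean rating (3.5, the grand mean), while the differences between items within each participant are preserved.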
When interpreting results based on standardized ratings, we account for the fact that standardization leads to some artificial negative effects on correlations between different appropriateness items (i.e., items that are in fact uncorrelated will, after standardization, tend to become slightly negatively correlated). Below we also consider an additional standardizing method that was not preregistered: standardizing metanorm measures for sanctions by subtracting the metanorm measure for non-action.
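The size of this artifact can be approximated under simplifying assumptions that are ours rather than the authors': suppose the $k$ items are mutually uncorrelated and share a common variance $\sigma^2$. Subtracting each participant's own mean (adding the constant grand mean does not affect correlations) then gives:

```latex
\tilde{X}_j = X_j - \bar{X}, \qquad \bar{X} = \frac{1}{k}\sum_{l=1}^{k} X_l

\operatorname{Var}(\tilde{X}_j) = \sigma^2\left(1 - \frac{1}{k}\right), \qquad
\operatorname{Cov}(\tilde{X}_j, \tilde{X}_m) = -\frac{\sigma^2}{k} \quad (j \neq m)

\operatorname{Corr}(\tilde{X}_j, \tilde{X}_m) = \frac{-\sigma^2/k}{\sigma^2\,(1 - 1/k)} = -\frac{1}{k-1}
```

With $k = 50$ items this idealized calculation yields a correlation of $-1/49 \approx -0.02$, which is consistent with the description of the effect as slight. Real items differ in variance and are not uncorrelated, so the figure is only indicative.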
Calculation of metanorm measures. As specified in the preregistered analysis plan, metanorm measures were obtained by adjusting country mean ratings for a given response (verbal confrontation, social ostracism, gossip, or non-action) by controlling for individual appropriateness ratings of the underlying norm violations. The technical specification is as follows.
Let $N_{s,c,i}$ denote the appropriateness rating of the norm violation in scenario $s$ given by individual $i$ in country $c$ (centered on the global mean). Let $N_{s,c}$ denote the average value of $N_{s,c,i}$ over all respondents from country $c$. Let $R_{s,c,i}$ denote the appropriateness of the given response in scenario $s$ as rated by individual $i$ in country $c$. Then the metanorm measure in country $c$, denoted by $R_c$, is calculated by estimating the multi-level model
$$R_{s,c,i} = R_c + b_1 N_{s,c} + b_2 N_{s,c,i} + e_{c,i} + e_{s,c,i},$$
where the terms $b_1 N_{s,c} + b_2 N_{s,c,i}$ adjust for the appropriateness rating of the norm violation at country and individual level, $e_{c,i}$ is a random effect at the individual level, and $e_{s,c,i}$ is the residual error term. Scenarios A-E were used in the main estimation. However, other sets of scenarios may be used instead. Robustness checks reported in the main text included basing metanorm measures on the set of all ten scenarios (A-J) as well as only on scenario A. Note that when a single scenario is used, the country-level term $b_1 N_{s,c}$ becomes redundant and the multi-level model reduces to a simple linear regression.
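The single-scenario case can be sketched as follows. The sketch assumes that the country measures are treated as one fixed intercept per country with a common slope for the individual norm-violation rating, fitted by ordinary least squares; this is our reading of the reduced model, not the authors' SPSS specification. The function name, the tuple layout, and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def metanorm_single_scenario(rows):
    """Estimate a common slope b2 and one intercept R_c per country.

    rows: list of (country, N, R) tuples, one per respondent, where N is the
    rating of the norm violation and R is the rating of the response.
    Returns (b2, {country: R_c}).
    """
    # Center N on the global mean, as in the definition of N_{s,c,i}.
    global_mean_n = mean([n for _, n, _ in rows])
    by_country = defaultdict(list)
    for country, n, r in rows:
        by_country[country].append((n - global_mean_n, r))

    # OLS with country intercepts: the slope is the pooled within-country slope.
    sxy = 0.0
    sxx = 0.0
    country_means = {}
    for country, pairs in by_country.items():
        n_bar = mean([n for n, _ in pairs])
        r_bar = mean([r for _, r in pairs])
        country_means[country] = (n_bar, r_bar)
        for n, r in pairs:
            sxy += (n - n_bar) * (r - r_bar)
            sxx += (n - n_bar) ** 2
    b2 = sxy / sxx

    # R_c is the predicted response rating for a respondent whose rating of
    # the norm violation equals the global mean (centered N equal to zero).
    metanorms = {c: r_bar - b2 * n_bar for c, (n_bar, r_bar) in country_means.items()}
    return b2, metanorms

# Toy data (made up): R = R_c - 0.5 * (N - 3), with R_c = 4 for X and 2 for Y.
toy_rows = [
    ("X", 2, 4.5), ("X", 3, 4.0), ("X", 5, 3.0),
    ("Y", 1, 3.0), ("Y", 3, 2.0), ("Y", 4, 1.5),
]
b2, metanorms = metanorm_single_scenario(toy_rows)
```

In the toy data the raw country means of R differ from the adjusted measures because respondents in the two countries rate the norm violation differently; the adjustment removes that difference before countries are compared.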
Culture measures. The survey included the following culture measures.
Hofstede scales. Four-item scales for individualism, power distance, and indulgence (12 items in total) from the Hofstede VSM 2013 questionnaire. Country-mean responses showed all three scales had poor internal consistency, all α < 0.30, so they are not used.
Use of informal sanctions. Single items on participants' own use of confrontations, gossip, and avoidance (e.g., "How often do you confront someone who does something inappropriate?"), and on participants' perceptions of others' use of these sanctions against themselves (e.g., "How often does someone confront you for doing something inappropriate?"), on a five-point scale from "never" (1) to "always" (5). We use the country-mean responses.
Individual autonomy. We use a measure of cultural values on individual autonomy adopted from the World Values Survey (WVS). Participants are asked to select up to five important qualities for children to learn at home, from a list of 10 qualities. Among the potential alternatives are independence, determination/perseverance, religious faith, and obedience. As in the WVS, the autonomy measure (ranging from −2 to +2) was calculated by the formula Autonomy = Independence + Determination − Faith − Obedience, where qualities are coded 1 if selected, 0 otherwise. At the country level this measure had adequate internal consistency (α = 0.75).
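As a small illustration of the coding rule, the autonomy score for one participant can be computed as below. The function name and the string labels for the qualities are hypothetical; only the formula comes from the text.

```python
def autonomy_score(selected):
    """Autonomy = Independence + Determination - Faith - Obedience.

    selected: set of (hypothetically labeled) qualities the participant ticked.
    Each quality is coded 1 if selected and 0 otherwise.
    """
    def coded(quality):
        return 1 if quality in selected else 0

    return (coded("independence") + coded("determination")
            - coded("faith") - coded("obedience"))

# Other qualities on the list (here "tolerance") do not affect the score.
example = autonomy_score({"independence", "determination", "tolerance"})
```

A participant who selects both independence and determination but neither faith nor obedience receives the maximum score of +2; the reverse pattern gives −2.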
Emancipative moral judgments. We use a four-item scale adopted from the WVS, asking how justifiable respondents consider homosexuality, divorce, abortion, and suicide, on a scale from never justified (0) to always justified (10). Country-level internal consistency was very good (α = 0.92).
Pro-violence attitudes. We similarly use two items adopted from the WVS, asking how justified it is for a man to beat his wife and to use violence against other people (α = 0.78).
Tightness. We use Gelfand's 6-item tightness scale 16 , with items like "There are many social norms that people are supposed to abide by in this country." In the original study, responses were standardized by subtracting participants' mean response to all items in the survey, which was strongly dominated by items on the appropriateness of various behaviors in various contexts. Following this procedure, we adjusted the responses to the tightness items in our survey by subtracting participants' mean response to all appropriateness items. Country-level internal consistency was good (α = 0.80).
Perceived threat. To measure perceived threat we included a question original to this study: "Which of the following do you think are threats to your society (tick any that apply): conflict within the country, conflict with other countries, immigration, over-population, food deprivation, lack of safe water, poor quality of air, natural disasters, diseases?". A tick for a given threat was coded as 1, no tick as zero. Country-means had good internal consistency (α = 0.89).
Attention and comprehension. Measures of attention and comprehension were included at the end of the survey. The attention test asked the participant to tick the fourth box out of five. The comprehension test asked the participant how easy or difficult it had been to understand the questions in the survey, on a five-point scale from very difficult to very easy. In the robustness check reported in the main text we excluded participants who had not answered one or both of these questions (21.1%), or ticked the wrong box in the attention test (an additional 1.0%), or answered that it was very difficult to understand the survey (an additional 0.4%).
Changes to the preregistered analyses. The present paper presents the preregistered analyses with the following three changes.
Exclusions. No exclusions were planned, but as the study was meant to target adults, we decided to exclude respondents who stated an age below 18 years.
Measures of indulgence, power distance, and individualism. Because these scales turned out to lack adequate reliability, we instead decided to use the official Hofstede Insights country scores (obtained from www.hofstede-insights.com/product/compare-countries/) for these cultural dimensions. Although these country scores are still widely used in research, a drawback is that they typically build on data collected long ago, especially for power distance and individualism, and may not reflect recent cultural changes 38 .
The use of informal sanctions. To measure the use of informal sanctions, we decided to focus on participants' reports of own use of sanctions and disregard their perceptions of how often they were sanctioned by others, as it is unlikely that people have accurate perceptions of how much others avoid them or gossip about them.
Unregistered analyses. The main text describes some elements that were not preregistered: inclusion of pathogen prevalence, median income, and the national level of gender equality as correlates in Hypothesis 5; calculation of metanorms for physical confrontation; robustness of metanorms across different cities and across students and non-students; internal consistency of a metanorm across scenarios; robustness of correlations with other variables regardless of whether metanorms are estimated in the domain of non-cooperation or the domain of out-of-place behaviors. As an additional unregistered analysis, metanorms for informal sanctions were standardized by the metanorm for non-action. Specifically, subtraction of the metanorm for non-action from the metanorms for sanctions was carried out to yield a measure of how appropriate the sanction is perceived to be relative to doing nothing at all. This method has the drawback that ratings for non-action exhibit meaningful country variation (as seen in Table 2), which will be incorporated in the measures for every sanction, thereby making them artificially more closely intercorrelated. Nonetheless, the pattern of results for how metanorms vary across cultures remains qualitatively the same (see Supplementary Table 8).
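The drawback noted above can be illustrated with a toy calculation. In the sketch below, two made-up country-level sanction measures are uncorrelated with each other and with a made-up non-action measure; subtracting the shared non-action measure from both induces a positive correlation. All values and names are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up measures for eight hypothetical countries, mutually uncorrelated.
gossip = [1, 1, 1, 1, -1, -1, -1, -1]
ostracism = [1, 1, -1, -1, 1, 1, -1, -1]
non_action = [1, -1, 1, -1, 1, -1, 1, -1]

# Relative measures: sanction metanorm minus non-action metanorm.
rel_gossip = [g - n for g, n in zip(gossip, non_action)]
rel_ostracism = [o - n for o, n in zip(ostracism, non_action)]

r_before = pearson(gossip, ostracism)          # 0.0 in this toy example
r_after = pearson(rel_gossip, rel_ostracism)   # 0.5 in this toy example
```

In this constructed case the correlation between the two sanction measures rises from 0 to 0.5 solely because both now contain the same non-action term, which is the artifact the text cautions against.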
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
All data and materials are available at OSF (https://osf.io/pm5kc/), including the raw data underlying Figs. 1-3 and SPSS syntax for analyses. A reporting summary for this Article is available as a Supplementary Information file. Source data are provided with this paper.