Investigating the role of group-based morality in extreme behavioral expressions of prejudice

Understanding the motivations underlying acts of hatred is essential for developing strategies to prevent such extreme behavioral expressions of prejudice (EBEPs) against marginalized groups. In this work, we investigate the motivations underlying EBEPs as a function of moral values. Specifically, we propose that EBEPs may often be best understood as morally motivated behaviors grounded in people's moral values and perceptions of moral violations. As evidence, we report five studies that integrate spatial modeling and experimental methods to investigate the relationship between moral values and EBEPs. Our results, from these U.S.-based studies, suggest that moral values oriented around group preservation are predictive of the county-level prevalence of hate groups and associated with the belief that extreme behavioral expressions of prejudice against marginalized groups are justified. Additional analyses suggest that the association between group-based moral values and EBEPs against outgroups can be partly explained by the belief that these groups have done something morally wrong.

[REDACTED] In Study 2, the authors use county-level estimates of MF endorsement to predict the prevalence of hate groups. The model is rather sparse, though, controlling only for education, poverty, and the size of the white population. An alternative story is that it is simply more conservative counties that have more hate groups. The authors obviously have an indicator of the political orientation of each county, since it's used in the MRP model. Why not use partisanship or ideology as a covariate? Surely other research has been done predicting county-level estimates of hate group prevalence, which could be used to inform this model.
In Study 3, the authors turn to an experiment. The authors claim to be manipulating moral wrongdoing by randomizing whether the immigrants are helping the economy or taking jobs. While this certainly has a moral component, as shown by the data, it is far from a clean manipulation. The manipulation also poses a clear and realistic threat to the economic self-interest of the local citizens. So while the data are suggestive of the authors' main hypothesis, they seem hardly conclusive. I appreciate that the authors do control for ideology, even though this is only a minor point and it is unclear how ideology was measured. It would have been great to see attention to partisan identity as well, since it is more influential on political attitudes.
In Study 4, the authors use the same design, but focus instead on Mexicans. I'm not sure that this design can tell us much about whether binding values have a unique role in hate more generally, though. The MFQ includes multiple patriotism/nationalism items, so it isn't clear whether the results are unique to binding values or are just telling us that nationalists are more likely to be anti-immigrant. Study 5 shifts the focus to Muslims, but I'm not sure this resolves the issue.
In general, the authors don't really engage with any alternative hypotheses, which would have made the results much more compelling.

Non-anonymous review by Jon Haidt
This is a very impressive paper. The methods are certainly impressive, although I do not understand the advanced techniques well enough to be able to pass judgment on them; I assume the editors have found at least one reviewer who is an expert in this area.
But I can comment on the central question, the role of moral motivations in acts of hate, and its links to Moral Foundations Theory. The idea that the worst human actions (from atrocities at a mass scale down to a man beating his girlfriend for suspected infidelity) cannot be understood without taking seriously the moral motives of the perpetrator is one of the most important ideas in the study of violence and hatred. It has been raised many times, and the authors offer a good list of citations for it, mostly from social and cultural psychology. In point #2 below I have some suggestions for improving the literature review. What the present study adds is a specific operationalization of moral motives, the "binding" foundations from MFT, which can be taken prima facie as expressions of moral values and motives, along with very clever methods of testing whether those moral foundations add to our explanatory ability, whether of violent/hateful language, hate groups, or endorsement of hatred and violence in experiments they conducted themselves. MFT has been used often to study prejudice, but to my knowledge, MFT has not previously been used to study actual violence, hate groups, or endorsement of violence. I am particularly grateful that the authors show, in several studies, that the binding foundations add to prediction over and above other variables known to be relevant, such as political ideology (liberal to conservative). I was also just really impressed by the size and ambition of these four studies, and I found the overall psychological story being told consistently across the four studies to be very plausible.
I have three major suggestions for improvement: 1) The authors frame this paper as being important because of the rising trend of hate crimes and intergroup identity-based violence. I believe this frame should be dropped, for two reasons: i) The overall trend over many decades is a massive drop in violence of almost all kinds, in the USA and in the world, along with a massive and steady drop in hostility and prejudice based on identity, at least within the USA. The authors should cite Steve Pinker's claims, at the very least, and acknowledge that the overall trends are down, and down massively. Recent Pew data shows a continued rise in average tolerance even during the Trump years. ii) The authors discuss hate crime stats, but these are notoriously unreliable. Greg Lukianoff and I discovered this when writing chapter 6 of The Coddling of the American Mind. In particular, the SPLC has been exposed as a fraud, a money-making organization that does whatever it takes to perpetuate the narrative that hate is constantly on the rise. See this profile in the New Yorker: https://www.newyorker.com/news/news-desk/the-reckoning-of-morris-dees-and-the-southern-poverty-law-center; its lists and numbers are unreliable. The authors have relied on the SPLC database for Study 2; that can't be helped. But the authors should at least not refer to any claims by the SPLC in their opening framing about how hate crimes are on the rise. Greg and I concluded that hate crimes are probably rising since 2016, but that's mostly non-violent crimes, including vandalism, and the rise is not as dramatic as is often claimed. I think that only the FBI crime victimization reports are really reliable. Everything else suffers from reporting biases. So I'd recommend cutting the framing of "hate is rising!" and changing it to "hate is morphing: some forms are down, long term, but some are up, [REDACTED]."
2) The authors have a good paragraph listing major citations where previous scholars have written about the moral motivations of violence. I have two other books that should be added: Baumeister (1997), Evil, and Stenner (2005), The Authoritarian Dynamic.
In addition, I think the paper would be much more vivid for readers if the authors could draw out one or two examples from the authors in the long list of citations. It will be so difficult for readers to grasp, intuitively, that Nazis, skinheads, and white supremacists have an intense moral worldview; readers will want to dismiss them as just overflowing with blind hatred. But the power of social psychology, like anthropology, should be to make difference intelligible. A single paragraph that began something like "For example, Fiske and Rai (2015) give the example of X, showing that the perpetrators of atrocity Y perceive that Z…" [note that it's Fiske and Rai only; Pinker just wrote a foreword]. Or "As Karen Stenner (2005) showed, authoritarians are not responding to threats to themselves or their families; they are rather responding to threats to the social order….." or something more vivid than that. My point is that the paper begins with a very short, concise, and abstract statement about moral motivations, then uses very advanced methods to test a complicated hypothesis. I think it is vital that the authors spend a little more time (I'd recommend two full paragraphs) helping the reader to understand the phenomenon. This is especially important so that readers can see the binding foundations as being aimed at some sort of higher good, beyond self-interest, beyond blind hatred of difference.
3) This article looks only at right-wing hate. As the authors acknowledge near the end, page 27, there is left-wing hate too. I think the authors should at least cite and discuss work by Jarret Crawford showing that left and right are, on some measures, equally prejudiced; it's just that most of our research looks at prejudice against groups seen to be on the left, or that are favored by the left. But if you look at groups disliked by the left (such as Christians or members of the military) you find a lot of open prejudice too. I don't doubt that hatred rising to levels of violence is more common on the right, but any conservative reader will be familiar with recent cases of left-wing violence (such as Antifa, and the guy who shot up the Republican softball team, including Rep. Steve Scalise). This point should be acknowledged in the intro, and perhaps returned to at the end with the suggestion that future research should examine hateful and violent rhetoric on the left.
-- p. 15: I'm puzzled by the graph suggesting that counties with more college grads have more hate groups. Did I read that right? Is it worth explaining why?
-- p. 19: I found myself wishing I could see a table of means. The means are clearly very low, generally near the floor, given how nasty Dave's behavior is and how little justification there is for it (even in the justified condition). Once again, as an intuitionist, I want to have a better feel for what's going on in the study; I don't want to just see betas and odds ratios.

Reviewer #3 (Remarks to the Author):
This is an interesting paper that has significant strengths. The topic is important and timely. The samples and methods are impressive. Much of the paper is nicely written and many of the analyses are elegant. Despite these positive aspects, there are some problems that the authors must consider.
[REDACTED]. The second study attempts to link questionnaire responses from a website created by the founders of Moral Foundations Theory (MFT) with the presence of hate groups based on their presumed locations from the Southern Poverty Law Center. The last three studies are three separate MTurk studies that pose a hypothetical question to participants about their condoning the actions of someone in a small town where an out-group is threatening to take away jobs from the community.
[REDACTED]. Study 2 is very difficult to interpret because of the quality of the two sources of data. And the third set of studies is simple but doesn't directly address some of the main claims of the project. [REDACTED] [REDACTED] [REDACTED] Study 2 is a nice idea in that it attempts to show a similar pattern in the geographic distribution of hate groups and separate surveys by people living in the general area of the hate groups. Each of the two data sets poses some problems. The hate group data come from the Southern Poverty Law Center (SPLC), which for the last several years has been building a data bank of over 200 groups that they view as extremist. The overwhelming majority of the groups are right wing, with clear agendas that are anti-Black, -Muslim, -Jewish, -LGBTQ+, etc. Their locations are defined by where their main offices are, many of which are in larger cities (especially Washington) or in the deep South. The MFT surveys of over 100,000 people are not described, so there is no sense of the demographics of the people who completed them, including age, sex, percent from the deep South, etc.
One issue is that presenting data at the county level is misleading. A quick look at the observed and predicted results suggests that most of the hate groups are in southern California, Arizona, and Nevada. But these represent a very small number of counties. There are far more blue counties in the deep South, but those counties are quite small. Are the findings attributable to the fact that most binding values and most hate groups come from the deep South? Although not surprising, it would be good to know. Note that this wouldn't nullify the findings but would cast a different light on what is happening.
Studies 3-5 are quite clean and ask participants to read a very brief snippet about a mythical town in the Midwest where a large group of outsiders (Study 3 = mythical Sandarians; Study 4 = Mexicans; Study 5 = Muslims) move to town and, in the high-threat condition, are described as undermining the local economy and, thus, harming "native" citizens. Participants rated whether the action was morally wrong and then, in a later question, were asked to evaluate the acceptability of a hypothetical citizen, "Dave", who passed out leaflets, yelled at, or physically harmed out-group members.
It's a nice series of studies, and it would be great to see the actual means of the responses to these questions across all three studies in a single table. The one question is: why are these responses threats to binding values? Doesn't this also violate a sense of fairness? How would the participants interpret the situation?

Summary
The idea of the project is quite interesting. With a major revision it could be publishable. But a revision should work to make the theory and its components clearer.
[REDACTED] Finally, the last series of studies need to link more closely to the binding and internalizing values that are described earlier in the paper.
One final issue. Much of the recent literature on hate groups, including this paper, appears to be guided by an implicit assumption that hateful speech and acts are perpetrated only by right-wing groups, ignoring that many of the ideas are true for all humans no matter what their political leanings might be. Consider today's eco-terrorists, Black nationalists, several virulent anti-Trump groups, and others. Just reading the editorials and online letters and comments of the New York Times and the Wall Street Journal demonstrates the hate language that both sides are currently spewing about each other, accusing the opposition of causing harm, cheating, betrayal, subversion, and degradation (all moral vices, I believe). It would be refreshing to see a more even-handed approach to the study of hate.
Reviewer #4 (Remarks to the Author):
This MS reports two observational and three questionnaire studies investigating the relationship between binding moral values (from Moral Foundations Theory) and hate speech/acts towards minority groups. In general, I think the MS is quite convincing and admirably combines multiple methods to make a compelling case. I do have some suggestions for improvement, which should be construed as ways to make a great paper even better.
1. The introduction argues that hate activity is on the rise. That might be true in the local sense (i.e. over the past 10 years) but in the larger sense doesn't seem right (e.g., compare to the 1960s in the US where the Klan was murdering civil rights activists). I don't think it takes away from the motivation of the paper at all to put this more accurately given the historical context (e.g., hate activity is still lower now than at many points in the past, but there seems to have been a recent uptick).
2. I understand why the authors use it, but I find the "EBEP" acronym to be sort of clunky (and I prefer to stay away from acronyms in general). If the authors can find a short but clear phrase to replace it, they should.
3. Conceptually, do the authors think of EBEPs as on a continuum with less extreme forms of prejudice, or as categorically different? If it's the former, should the psychology they describe apply to less extreme prejudice as well? One interesting wrinkle is that due to shifting norms, what might have been "normal" prejudice 50 years ago would probably be considered hate speech now (e.g., explicit statements about the inferiority of some groups).
4. In Study 2, I am not sure (because I'm not familiar with the MrsP method) whether the analyses inherently control for county-level ideology. If they do not, I am curious to what extent county-level Binding values are associated with hate group prevalence when accounting for the partisan lean of the county.
5. In Studies 4 and 5, I would like to see descriptives (means and SDs) for the prejudice items. From Figure 5, I would guess that the mean levels of endorsement are low. I don't think this undermines the authors' theoretical argument, but if it is the case it should be obvious to readers, and the authors might consider discussing it.
6. There seems to be a typo in the last paragraph on page 25; the sentence "the degree to which participants thought it was morally wrong for Muslims to 'spread Islamic values' was not positively associated with EBEP justification," should (I think) read "was not ONLY positively associated."
7. In the intro/discussion, the authors talk about hate crimes, but most of their data don't speak to criminal behavior, at least not in the US. I think only the "physical assault" behavior would fall into that category, and here the authors look at ratings of justification (as opposed to, say, willingness/likelihood of engaging in it), and those ratings are low (median rating of "not at all justified") in Study 3. The authors may want to revise their language and/or note this limitation explicitly.
Reviewer #5 (Remarks to the Author):
I was asked to take a closer look at the spatial analysis, which I am happy to do. I am no subject expert on personal values and how they may help to explain aggregate-level behavioral outcomes. My expertise lies more in data analysis. I focus in my review specifically on Study 2, which is based on data from an opt-in online tool.
-Biases in the measurement due to self-selection of survey participants may be corrected by relying on MrP or MrsP, as shown by Wang et al. (2015) with their Xbox study. But there is no reason to believe that this is a silver bullet and that we can just use any kind of data. Each dataset will require us to check anew whether the results are representative or not. Hence, the authors need to show here that they can, e.g., do a very good job of predicting Democratic vote share (without using that variable as a context-level variable in the response model). If they can show that the MrP/MrsP estimates of county vote returns are fairly accurate, then we have reason to believe that the self-selection can be corrected for by using MrP/MrsP.
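The validation the reviewer asks for amounts to comparing model-based county estimates against observed county returns. A minimal sketch of such a check, with purely illustrative stand-in numbers (not the authors' data or the actual MrP/MrsP output):

```python
# Hypothetical validation check for MrP/MrsP county estimates: compare
# model-based estimates of Democratic vote share with observed county
# returns via mean absolute error and correlation. All numbers are
# invented stand-ins for illustration.

def mean_absolute_error(est, obs):
    return sum(abs(e - o) for e, o in zip(est, obs)) / len(est)

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx ** 0.5 * vy ** 0.5)

# Stand-in values: MrP/MrsP county estimates vs. observed vote share.
mrp_estimates = [0.42, 0.55, 0.61, 0.38, 0.70]
actual_share  = [0.45, 0.53, 0.64, 0.35, 0.68]

mae = mean_absolute_error(mrp_estimates, actual_share)
r = pearson_r(mrp_estimates, actual_share)
print(f"MAE = {mae:.3f}, r = {r:.3f}")
```

A small MAE and a high correlation on a benchmark with known county-level truth (vote returns) would support the claim that the poststratification corrects the self-selection in the opt-in sample.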
-I was wondering whether the authors used the simple synthetic or the adjusted synthetic approach by Leemann & Wasserfallen (2017). If they used the adjusted approach, we would want to know what the basis for it was. It should not be the moral data itself but should ideally stem from a sample that is closer to the random-selection ideal (the GSS or so).
-The authors then proceed to show that the number of hate groups per 10,000 can be explained by the local value structure. Have the authors used fixed effects, exploiting only over-time variation, or are they rather regressing average levels on other average levels? I assume it is the latter, and I would like to see whether these results hold once the authors use state fixed effects and some form of control for how rural a county is.
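The state-fixed-effects check the reviewer suggests can be implemented with a within transformation: demean both variables within each state, then regress. The sketch below uses invented toy data and a hand-rolled OLS slope, not the authors' actual model or data:

```python
# Illustrative within (state-demeaning) regression: does the Binding
# values / hate-group-rate association survive once between-state
# differences are absorbed? All county values below are invented.
from collections import defaultdict

def demean_within(groups, values):
    """Subtract each group's (state's) mean from its members' values."""
    sums, counts = defaultdict(float), defaultdict(int)
    for g, v in zip(groups, values):
        sums[g] += v
        counts[g] += 1
    means = {g: sums[g] / counts[g] for g in sums}
    return [v - means[g] for g, v in zip(groups, values)]

def ols_slope(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den

# Toy county data: state, Binding-values score, hate groups per 10,000.
states  = ["AL", "AL", "AL", "CA", "CA", "CA"]
binding = [0.8, 0.9, 0.7, 0.3, 0.4, 0.2]
rate    = [0.05, 0.07, 0.04, 0.01, 0.02, 0.00]

# The pooled slope mixes between-state and within-state variation;
# the demeaned slope reflects within-state variation only.
pooled = ols_slope(binding, rate)
within = ols_slope(demean_within(states, binding), demean_within(states, rate))
print(f"pooled slope = {pooled:.3f}, within-state slope = {within:.3f}")
```

If the within-state slope stays positive, the association is not merely an artifact of broad regional (e.g. deep South vs. West Coast) differences.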
-I am just not sure how robust these findings are. My understanding is that the authors want to demonstrate that the aggregate outcome of hate group existence is partly explained by a specific strength of some values. I understand this to mean that the average local value distribution explains the outcome of hate group existence. But to show this, we would want to see either that all people moving into these regions become more racist or that some people become radicalized and much more racist. That would constitute strong evidence for such a context effect.
In this new study, we take particular care to manipulate moral threat in a way that does not pose a clear and realistic threat to the economic self-interest of the local citizens.
Specifically, we randomly assigned participants to a control condition, a Binding values violation condition, or an Individualizing values violation condition. In each condition, participants read about a community, the People of the Earth, engaging in one of three behaviors: growing produce in a community garden (control); engaging in sex acts involving feces and bodily fluids (Binding values); or raising pet dogs and cats for meat (Individualizing values). Participants then indicated how morally wrong the community's behavior was and the degree to which acts of hate targeting a member of the community were justified.
Using this design, we now provide a more conclusive and direct test of our main hypothesis.

I appreciate that the authors do control for ideology, even though this is only a minor point and it is unclear how ideology was measured.
Author's Response: Thank you for noting this. We should have been clearer regarding how we measured ideology, and we have revised our manuscript to fix this. Specifically, we measured ideology using a 7-point Liberal-to-Conservative identification item.
9. It would have been great to see attention to partisan identity as well, since it is more influential on political attitudes.
Author's Response: Thank you for this suggestion. We focus on ideology rather than partisan identity because political ideology is a more prominent predictor of moral judgments in the field of moral psychology. For example, most research on Moral Foundations focuses on ideology, rather than partisan identity. Further, we have no reason to expect that adjusting for partisan identity would constitute a meaningfully different adjustment with regard to our parameter of interest. Accordingly, in line with previous work, we chose to focus on ideological identification rather than partisan identity. Nonetheless, we agree that this issue raises problems for our current studies. Our work maintains that the Moralized Threat Model is applicable across social contexts, so we should be able to show that the association between Binding values and hate is robust to contextual variation.

In Study 4, the authors use the same design, but focus instead on
Importantly, we demonstrate exactly this in our new experimental study. In this study, we explicitly focus on hypothetical deviant communities that are not immigrant groups and that are not characterized by any clear ideological associations.
11. In general, the authors don't really engage with any alternative hypotheses, which would have made the results much more compelling.
Author's Response: Thank you for this suggestion. We agree that engaging with alternative hypotheses is a powerful strategy for compelling empirical work. We now focus more comprehensively on the alternative hypothesis that political ideology might explain the association we observe between Binding values and acts of hate. In our revision, we adjust for ideology in our analysis of county-level hate groups. And, in our new study, we focus on manipulations that are not characterized by any clear ideological associations. We also now directly investigate how the implications of the Moralized Threat Hypothesis vary depending on moral domain. Specifically, we investigate whether the Moralized Threat Hypothesis generalizes equally to other moral domains (e.g. Individualizing values) or whether its implied effects are stronger in the domain of Binding values.
That said, we pair experimental and observational survey studies with analyses of real-world outcomes relevant to our hypothesis. In our view, the multi-methodological and interdisciplinary collection of studies we report in this work constitute a sufficiently compelling body of evidence. For instance, we would be less compelled by a series of survey studies that show the same effects and engage with a wider range of alternative hypotheses.

Reviewer #2 (Remarks to the Author): Non-anonymous review by Jon Haidt
1. This is a very impressive paper. The methods are certainly impressive, although I do not understand the advanced techniques well enough to be able to pass judgment on them; I assume the editors have found at least one reviewer who is an expert in this area.
Author's Response: Thank you for your kind words. We sought to make this paper as methodologically robust as possible and we are grateful to see this work acknowledged.

But I can comment on the central question, the role of moral motivations in acts of hate, and its links to Moral Foundations Theory. The idea that the worst human actions (from atrocities at a mass scale down to a man beating his girlfriend for suspected infidelity) cannot be understood without taking seriously the moral motives of the perpetrator is one of the most important ideas in the study of violence and hatred. It has been raised many times, and the authors offer a good list of citations for it, mostly from social and cultural psychology. In point #2 below I have some suggestions for improving the literature review.
What the present study adds is a specific operationalization of moral motives, the "binding" foundations from MFT, which can be taken prima facie as expressions of moral values and motives, along with very clever methods of testing whether those moral foundations add to our explanatory ability, whether of violent/hateful language, hate groups, or endorsement of hatred and violence in experiments they conducted themselves. MFT has been used often to study prejudice, but to my knowledge, MFT has not previously been used to study actual violence, hate groups, or endorsement of violence. I am particularly grateful that the authors show, in several studies, that the binding foundations add to prediction over and above other variables known to be relevant, such as political ideology (liberal to conservative). I was also just really impressed by the size and ambition of these four studies, and I found the overall psychological story being told consistently across the four studies to be very plausible.
Author's Response: Thank you for this careful review of the theoretical underpinnings of our work. We fully agree with your assessment of our research goals and we are glad that you find our literature review and research approach are sufficient and compelling.

The authors frame this paper as being important because of the rising trend of hate crimes and intergroup identity-based violence. I believe this frame should be dropped, for two reasons:
i. The overall trend over many decades is a massive drop in violence of almost all kinds, in the USA and in the world, along with a massive and steady drop in hostility and prejudice based on identity, at least within the USA. The authors should cite Steve Pinker's claims, at the very least, and acknowledge that the overall trends are down, and down massively. Recent Pew data shows a continued rise in average tolerance even during the Trump years.
Author's Response: Thank you for raising this issue. We agree that it is very important to contextualize recent spikes in hate crimes and hate group activity within the larger negative trend that you describe. We now cite Pinker (2012) in our introduction and explicitly note the arguments for a global decline in prejudice and violence.
ii. The authors discuss hate crime stats, but these are notoriously unreliable. Greg Lukianoff and I discovered this when writing chapter 6 of The Coddling of the American Mind. In particular, the SPLC has been exposed as a fraud, a money-making organization that does whatever it takes to perpetuate the narrative that hate is constantly on the rise. See this profile in the New Yorker: https://www.newyorker.com/news/news-desk/the-reckoning-of-morris-dees-and-the-southern-poverty-law-center; its lists and numbers are unreliable. The authors have relied on the SPLC database for Study 2; that can't be helped. But the authors should at least not refer to any claims by the SPLC in their opening framing about how hate crimes are on the rise. Greg and I concluded that hate crimes are probably rising since 2016, but that's mostly non-violent crimes, including vandalism, and the rise is not as dramatic as is often claimed. I think that only the FBI crime victimization reports are really reliable. Everything else suffers from reporting biases. So I'd recommend cutting the framing of "hate is rising!" and changing it to "hate is morphing: some forms are down, long term, but some are up, [REDACTED]."
Author's Response: Thank you for raising these excellent points and taking the time to make these suggestions for improvement.
First, we would like to clarify that we cannot find where we cite SPLC claims about hate crime rates. As evidence for recent increases in hate crime, we cite the New York Times, which reports trends from the FBI Uniform Crime Reports data, and Levin & Reitzel (2018), who conduct a city-level analysis in order to control for issues with reporting biases. However, we do cite the SPLC Hate Group list. While we acknowledge the issues you raise regarding this list, we feel the SPLC data offers the best opportunity to test our hypotheses to date. Accordingly, we feel that it is relevant and necessary to mention these data in our introduction.
We decided to focus on the county-level distribution of SPLC-identified hate groups because it is impossible to reliably estimate the county-level distribution of hate crimes with available data. As you note, estimating hate crime rates is very difficult. Currently, two main sources of hate crime data exist: the FBI's Uniform Crime Reports (UCR) and the Bureau of Justice Statistics' National Crime Victimization Survey (NCVS).
The UCR is an FBI reporting program that relies on (mostly) voluntary reports from individual policing agencies. While UCR data are widely used to estimate sub-national crime rates, the possibility of differences in reporting standards across agencies raises very serious issues for estimates of sub-national hate crime rates. This issue is particularly relevant for our work: we would expect policing agencies to place less focus on hate crimes (e.g. less training, less reporting) in areas where hate crimes might be seen as more morally acceptable. This kind of regional reporting bias could completely mask a very real association between Binding values and hate crimes. In addition to the issue of agency reporting biases, there is also the issue of victim reporting biases. Research suggests that the majority of bias-motivated crimes are not reported to police (Wilson, 2014; Pezzella, Fetzer, & Keller, 2019). Further, it seems reasonable to expect that the rate of non-reporting might vary geographically according to factors such as police receptivity to hate crime reporting and local attitudes toward the victim's targeted identity group. Again, this kind of reporting bias could completely mask a real association between Binding values and hate crime.
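The masking argument above can be made concrete with a toy simulation (all numbers invented for illustration; this is not our data): if true hate-crime rates rise with a county's Binding values while the fraction of crimes that get reported falls with Binding values, the observed (reported) rates can show no association, or even a reversed one.

```python
# Toy simulation of geographically varying reporting bias masking a
# true positive association. All functional forms and numbers invented.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx ** 0.5 * vy ** 0.5)

binding = [i / 99 for i in range(100)]            # county Binding score, 0..1
true_rate = [0.02 + 0.08 * b for b in binding]    # true rate rises with Binding
report_frac = [0.9 - 0.8 * b for b in binding]    # reporting falls with Binding
observed = [t * f for t, f in zip(true_rate, report_frac)]

r_true = pearson_r(binding, true_rate)
r_observed = pearson_r(binding, observed)
print(f"r(Binding, true rate) = {r_true:.2f}")
print(f"r(Binding, observed rate) = {r_observed:.2f}")
```

Under these invented parameters, the correlation with the true rate is perfectly positive while the correlation with the reported rate turns negative, which is exactly why we avoid agency-reported hate crime counts as an outcome.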
In contrast to the UCR, the NCVS data is collected via a random, nationally representative sample. Accordingly, it does not suffer from the same kinds of severe reporting biases that characterize the UCR. However, access to geolocated NCVS data is highly restricted and requires an extremely rigorous, multiyear review process. Indeed, while there is some work focused on small-area crime estimation using the NCVS, we are not aware of any work in the domain of hate crime. Accordingly, while an unbiased estimate of county-level hate crime rates could conceivably be estimated from the NCVS, taking this approach was not feasible.
For these reasons, we decided to focus on the county-level distribution of hate groups. As you note, there are many problems with the SPLC's list of hate groups; however, because the SPLC does not rely on agency-level reports, there is no reason to expect the kind of geographic reporting biases that characterize the UCR data.
That said, if the SPLC has an extreme liberal bias, one might expect that the locations of SPLC-identified hate groups might be correlated with regional political ideology. That is, if the SPLC is primarily oriented around the task of demonizing conservative groups in order to raise funding, one might reasonably expect the organizations they label as hate groups to be spatially correlated with conservativeness.
We agree that this is certainly an issue for our analysis, and we now include county-level 2016 Democratic Presidential vote share and county-level rural vs. urban status as covariates.
Importantly, after accounting for rural status and county-level partisanship, we still observe a robust association between county-level Binding values and the county-level rate of hate groups.
Of course, this does not address the issue of whether the SPLC is fraudulent or whether its list of hate groups is unreliable. Regarding that issue, we fully agree that there are likely serious issues with the SPLC's reporting methods. However, to our knowledge, there is no consensus on the view that the SPLC is a fraud that "does whatever it takes to perpetuate the narrative that hate is constantly on the rise." For example, see this Washington Post article (https://www.washingtonpost.com/news/magazine/wp/2018/11/08/feature/is-the-southern-povertylaw-center-judging-hate-fairly/).
But, perhaps more importantly, if the SPLC list of hate groups was extremely unreliable, what would this mean for our analysis? One reasonable expectation might be that we would observe no association between Binding values and the rate of hate groups, especially after controlling for what seem to be likely confounds (e.g. variables that a liberal-biased organization might condition on when identifying groups to target), such as rural status and county-level vote share. However, even after controlling for these factors, we still observe the hypothesized association.
Finally, while we acknowledge the many measurement issues with UCR hate crime data, it is also worth noting that recent research has found positive associations between regional variations in UCR-reported hate crime and the geographic distribution of SPLC-identified hate groups (Jendryke & McClure, 2019).
Similarly, research has also found positive geographic associations between far-right ideologically motivated homicides recorded in the U.S. Extremist Crime Database and the distribution of SPLC-identified hate groups (Adamczyk, Gruenwald, Chermak, & Freilich, 2014). Notably, these are exactly the kinds of associations one would expect if the SPLC data was at least moderately reliable.
Of course, there are many other potential issues that could be raised with regard to the SPLC data and our reliance on it to test our hypothesis. This is exactly why we paired this study with another observational study that is free from these kinds of reporting biases (Study 1) and a series of experimental and observational survey studies. While the association we observe in Study 2 could be spuriously caused by unreliable reporting from the SPLC, we think that this is unlikely, given that we find comparable effects across a range of other methodologies and data sources.
Finally, regarding the question of whether hate crime has increased in recent years, there seems to be little evidence to the contrary. As evidence for this claim, we cited multiple sources, none of which were the SPLC. However, to further bolster this claim, we now also cite Levin and Reitzel (2018), which found a 12.5% increase in hate crime in 2017, despite a small decrease in crime not motivated by bias. Finally, we would hesitate to dismiss non-violent hate crimes when considering the overall trend of hate crime. While they are certainly less extreme than violent hate crimes, they can have substantive negative effects on their victims and victims' communities.
That said, again, we want to acknowledge that we share your concern about the validity of the SPLC hate group list. Given recent news coverage and criticism of the SPLC, we agree that it is likely that the SPLC list is biased and perhaps contaminated with a higher false positive rate than would be ideal for our analysis. However, in the context of our entire program of research, we believe that this study still provides useful convergent evidence for our hypotheses.
Also, you make a very good point that the recent increase in hate crimes must be contextualized within the larger negative trend of global violence. Accordingly, we now note this larger negative trend in our introduction.
Author Response: Thank you for suggesting these citations; we have added them to our manuscript.
5. In addition, I think the paper would be much more vivid for readers if the authors could draw out one or two examples from the authors in the long list of citations. It will be so difficult for readers to grasp, intuitively, that Nazis, skinheads, and white supremacists have an intense moral worldview; readers will want to dismiss them as just overflowing with blind hatred. But the power of social psychology, like anthropology, should be to make difference intelligible. A single paragraph that began something like "For example, Fiske and Rai (2015) give the example of X, showing that the perpetrators of atrocity Y perceive that Z..." [note that it's Fiske and Rai only; Pinker just wrote a foreword]. Or "As Karen Stenner (2005) showed, authoritarians are not responding to threats to themselves or their families; they are rather responding to threats to the social order..." or something more vivid than that. My point is that the paper begins with a very short, concise, and abstract statement about moral motivations, then uses very advanced methods to test a complicated hypothesis. I think it is vital that the authors spend a little more time - I'd recommend two full paragraphs - helping the reader to understand the phenomenon. This is especially important so that readers can see the binding foundations as being aimed at some sort of higher good, beyond self-interest, beyond blind hatred of difference.
Author Response: Thank you for taking the time to note these details. We now include demographic information about the MFT respondents.

Study 2 is a nice idea in that it attempts to show a similar pattern in the geographic
8. One issue is that presenting data at the county level is misleading. A quick look at the observed and predicted results suggests that most of the hate groups are in southern California, Arizona, and Nevada. But these represent a very small number of counties. There are mathematically far more blue counties in the deep South -but the counties are quite small. Are the findings attributable to the fact that most binding values and most hate groups come from the deep south? Although not surprising, it would be good to know. Note that this wouldn't nullify the findings but would cast a different light on what is happening.
Author Response: Thank you for raising these questions. Unfortunately, we do not quite follow this argument. Further, it is not clear to us how our county-level analysis could be "misleading." A state-level analysis would have obscured substantial sub-state variation and, we would argue, ultimately would have provided a less informative perspective on the association between Binding values and hate groups.
Further, there are clusters of hate groups in both the Deep South and in the Southwest, as well as in New England. Given that we originally adjusted for the most likely confounds (education, poverty, and ethnic composition) and now include additional adjustments for county-level ideology and rural vs. urban status, it is not quite clear to us how we should interpret geographic covariation between Binding values and hate groups if not as evidence for an association.
Finally, during our revision process, we did estimate additional models that adjusted for the expected state-level rate of hate groups. In these models, the coefficient for Binding values reflected the effect of Binding values on the hate group rate after adjusting for the average rate of hate groups at the state level.

Studies 3-5 are quite clean and ask participants to read a very brief snippet about a mythical town in the Midwest where a large group of outsiders (Study 3 = mythical
Sandarians, Study 4 = Mexicans, Study 5 = Muslims) move to town and, in the high threat condition, were described as undermining the local economy and, thus, harming "native" citizens. Participants rated whether the action was morally wrong and then, in a later question, were asked to evaluate the acceptability of a hypothetical citizen, "Dave", who passed out leaflets, yelled at, or physically harmed out-group members.
Author Response: Thank you for taking the time to note these details.

It's a nice series of studies and it would be great to see the actual means of the responses to these questions across all three studies in a single table.
Author Response: Thank you for this request. We now describe the distributions (means, SDs, and medians) for each EBEP item in each study in the main manuscript. As the distributions are quite consistent across studies and our SM is already very long, we declined to add an additional table to the SM.

The one question is why are these responses threats to binding values? Doesn't this also violate a sense of fairness? How would the participants interpret the situation?
Author Response: Thank you for this question. In general, the Binding values are oriented around threats to group cohesion, security, and traditions. In studies 3-5, we focus on two general out-group behaviors: undocumented immigrants - specifically, a fictional immigrant group (Study 3) and Mexicans (Study 4) - taking jobs, and Muslim immigrants spreading "Islamic values" (Study 5).
First, we would like to note that we fully agree that undocumented immigrants taking jobs can be understood as an Individualizing (e.g. fairness) violation. Indeed, this is why we also focused on Muslims spreading "Islamic values," as this is difficult to understand as anything other than a threat to ingroup norms and traditions.
Nonetheless, in our view, it would be reductive to construe undocumented immigrants taking jobs as merely a fairness violation because it also involves violations of group norms around hierarchy and order. Consistent with this, in study 4 we find that even after controlling for individual-level political ideology, people with stronger Binding values are more likely to believe that it is morally wrong for undocumented Mexicans to take jobs, compared to people with weaker Binding values. Further, no such positive association is found for Individualizing values. This suggests that the behavior is more morally triggering for people who prioritize Binding values (and not those who prioritize Fairness values).
Finally, to more thoroughly address this issue, we conducted a new experimental study (Study 6 in our revised manuscript). In this study, we directly target Binding values through an experimental manipulation that depicts the social outgroup as engaging in taboo sexual rituals. Importantly, we show that participants' Binding values moderate the effect of experimental condition, such that people low on Binding values were, on average, far less likely to think the outgroup's behavior was immoral and that acts of hate against the group were justified. In other words, in our new study, we show that Binding values function like a treatment susceptibility factor that moderates people's responses to Binding values violations.
12. Summary
a. The idea of the project is quite interesting. With a major revision it could be publishable. But a revision should work to make the theory and its components more clear.
[REDACTED]. Finally, the last series of studies need to link more closely to the binding and internalizing values that are described earlier in the paper.
Author Response: Thank you for highlighting these issues. [REDACTED]. Regarding the survey studies, we have clarified our discussion of Moral Foundations Theory throughout the manuscript in order to more clearly link these studies to the Individualizing and Binding values. We have also added an additional study that offers a more direct test of the Moralized Threat Hypothesis. In this study, we manipulate both Binding and Individualizing values and find evidence for the following hypotheses: 1. experimentally manipulated Binding values violations cause increased beliefs in the justification of hate acts; 2. the degree to which the violations are seen as morally wrong mediates this association; 3. Binding values moderates the mediation effect.

b. One final issue. Much of the recent literature on hate groups, including this paper, appears to be guided by an implicit assumption that hateful speech and acts are only perpetrated by right-wing groups, ignoring that many of the ideas are true for all humans no matter what their political leanings might be. Consider today's ecoterrorists, Black nationalists, several virulent anti-Trump groups, and others. Just reading the editorials and online letters and comments of the New York Times and the Wall Street Journal demonstrates the hate language that both sides are currently spewing about each other, accusing the opposition of causing harm, cheating, betrayal, subversion, and degradation (all moral vices, I believe). It would be refreshing to see a more even-handed approach to the study of hate.
Author Response: Thank you for raising this point. Several readers have raised the point that our manuscript focuses on acts of hate perpetrated by right-wing groups and, consequently, neglects acts of hate perpetrated by left-wing groups. First, we would like to emphasize that the Moralized Threat Hypothesis is consistent with the belief that any group of any political ideology can engage in acts of hate under the right conditions. And we fully recognize that left-wing hatred also occupies an increasingly prominent space in American society.
Nonetheless, it is also the case that in our current socio-cultural moment, more hate crimes and terrorist attacks are perpetrated by people who subscribe to right-wing ideologies than left-wing ideologies, and far more hate groups espouse right-wing ideologies than left-wing ideologies. That is, there are no left-wing equivalents of the Ku Klux Klan, Aryan Brotherhood, or American Nazi Party. In our view, this does not mean that left-wing hate is not a current issue, but it does suggest that right-wing hate may be a more widespread issue. And one consequence of this is that it is easier to study right-wing hate because it is better documented and easier to access.
That said, the reviewer's point is well taken. As we note above, we now offer a more detailed discussion of how hate can arise on both sides of the aisle.

Reviewer #4 (Remarks to the Author):
This MS reports two observational and three questionnaire studies investigating the relationship between binding moral values (from Moral Foundations Theory) and hate speech/acts towards minority groups. In general, I think the MS is quite convincing and admirably combines multiple methods to make a compelling case. I do have some suggestions for improvement, which should be construed as ways to make a great paper even better.
Author Response: Thank you for your positive words and thoughtful suggestions.

The introduction argues that hate activity is on the rise. That might be true in the local sense (i.e. over the past 10 years) but in the larger sense doesn't seem right (e.g., compare to the 1960s in the US where the Klan was murdering civil rights activists). I don't think it
takes away from the motivation of the paper at all to put this more accurately given the historical context (e.g., hate activity is still lower now than at many points in the past, but there seems to have been a recent uptick).
Author Response: Thank you for this suggestion. We have modified the framing we use in the introduction to more accurately contextualize the recent upward trends in hate acts.

I understand why the authors use it, but I find the "EBEP" acronym to be sort of clunky (and I prefer to stay away from acronyms in general). If the authors can find a short but clear phrase to replace it, they should.
Author Response: Thank you for this suggestion. While we generally share your feelings toward acronyms, we put a fair amount of thought into the label for our construct of interest. Extreme behavioral expressions of prejudice are exactly what we aim to study and explain in this work. We also considered phrases like "acts of hate," but we feel reference to "hate" opens up questions about how to operationalize and classify "hate" and "hatred." As those questions are not central to our focus here, we felt that reference to hatred could distract from our primary aims. Accordingly, in this case we feel that it is better to suffer a little clunkiness in order to prioritize linguistic precision.

Conceptually, do the authors think of EBEPs as on a continuum with less extreme forms of prejudice, or as categorically different? If it's the former, should the psychology they describe apply to less extreme prejudice as well? One interesting wrinkle is that due to
shifting norms, what might have been "normal" prejudice 50 years ago would probably be considered hate speech now (e.g., explicit statements about the inferiority of some groups).
Author Response: This is a very interesting question. Unfortunately, our current program of research does not directly address this question, so we would hesitate to take a strong position on any possible answer. On one hand, it is easy to imagine how hatred might emerge from what is initially just mild dislike for a social group. Further, it seems likely that less extreme forms of prejudice might serve as a foundation for hatred. In this sense, it might be fair to place acts of hate on a continuum of prejudice where "mild" prejudice or dislike constitutes the opposite pole. However, we also suspect that the psychological/neurological profile of prejudice likely changes as one moves across this spectrum. For instance, hatred involves extreme affective responses that are most likely not involved in milder forms of prejudice. So, in this sense, we suspect that there may also be qualitative differences in the psychological/neurological components involved in weaker forms of prejudices vs. extreme behavioral expressions of prejudice.

In Study 2, I am not sure (because I'm not familiar with the MrsP method) whether the analyses inherently control for county-level ideology. If they do not, I am curious to what extent county-level Binding values are associated with hate group prevalence when accounting for the partisan lean of the county.
Author Response: Thank you for this comment. We now adjust for county-level Democratic vote share in our primary analysis. Notably, adjusting for this factor did not substantively change our findings.

In Studies 4 and 5, I would like to see descriptives (means and SDs) for the prejudice items.
From Figure 5, I would guess that the mean levels of endorsement are low. I don't think this undermines the authors' theoretical argument, but if it is the case it should be obvious to readers and the authors might consider discussing it.
Author Response: Thank you for this suggestion. We now include means, SDs, and medians for the EBEP items in all studies.
6. There seems to be a typo in the last paragraph on page 25; the sentence "the degree to which participants thought it was morally wrong for Muslims to 'spread Islamic values' was not positively associated with EBEP justification," should (I think) read "was not ONLY positively associated."
Author Response: Thank you for highlighting this error. We have fixed it in our revised manuscript.

7.
In the intro/discussion, the authors talk about hate crimes, but most of their data don't speak to criminal behavior, at least not in the US. I think only the "physical assault" behavior would fall into that category, and here the authors look at ratings of justification (as opposed to, say, willingness/likelihood of engaging in it), and those ratings are low (median rating of "not at all justified") in Study 3. The authors may want to revise their language and/or note this limitation explicitly.
1. I was asked to take a closer look at the spatial analysis which I am happy to do. I am no subject expert on personal values and how this may help to explain aggregate level behavioral outcomes. My expertise lies more in data analysis.
Author Response: Thank you for this clarification and for your thorough review of our paper.

Biases in the measurement due to self-selection of survey participants may be corrected by relying on MrP or MrsP, as shown by Wang et al. (2015) with their Xbox study. But there is no reason to believe that this is a silver bullet and that we can just use any kind of data. Each dataset will require us to check anew whether the results may be representative or not. Hence, the authors need to show here that they can, e.g., do a very good job of predicting Democratic vote share (without using that variable as a context-level variable in the response model). If they can show that MrP/MrsP estimates of county vote returns are fairly accurate, then we have a reason to believe that the self-selection can be corrected for by using MrP/MrsP.
Author Response: Thank you for this comment. We completely agree that MrP/MrsP is not a silver bullet and that validation is an essential component in any downstream analysis that relies on MrP/MrsP estimates. We should have included a validity check in our initial manuscript and the fact that we didn't was an oversight on our part. So, thank you for bringing this up.
Our revised manuscript now includes results from a validation study in Study 2 Supplemental Material. In this study, we estimate the county-level population proportion that identifies as conservative (i.e. slightly conservative, conservative, or very conservative) by applying MrsP to the Moral Foundations dataset we obtained from YourMorals.org.
While it would have been more conventional, with regard to the MrP literature, to estimate the county-level Republican or Democratic vote share, this was not an option for us because we do not have sufficient data on respondents' voter intentions or behavior. As such, one drawback to our validity test is that there is an unknown ceiling for the association between the county-level proportion of people who identify as at least slightly conservative and the county-level Republican vote share. That is, we know that there is not a 1:1 relationship between ideological identification and voting behavior, so even a "perfect" MrsP estimate of county-level conservativeness would not necessarily be perfectly correlated with county-level Republican vote share.
Nonetheless, left-right orientation is one of the strongest predictors of vote choice (Jou & Dalton, 2017) and the county-level proportion of people who identify as at least slightly conservative should thus be strongly associated with the county-level Republican vote share.
Accordingly, if the MrsP procedure we rely on to estimate county-level moral values sufficiently accounts for response-biases, then we should be able to detect a strong association between estimates of the county-level proportion of self-identified conservatives and the observed county-level Republican vote share.
To conduct our validity test, we estimated the county-level proportion of self-identified conservatives using the MrsP procedure that we use to estimate county-level moral values. However, one key difference is that we did not include any context-level measures of political ideology or vote share in the MrsP response model. More specifically, at the individual-level, the response model adjusted for respondent ethnicity, gender, level of education, age, and frequency of religious attendance (these variables were identical to those we include in the response models used to estimate county-level moral values). At the context-level, we included the county-level rate of protestant evangelicals, which we obtained from the 2010 Religious Census (Grammich, 2012).
Finally, to investigate the degree to which MrsP can correct for self-selection bias in the YourMorals data, we evaluated the association between county-level 2012 and 2016 Presidential election Republican vote shares and our estimates of county-level conservatism.
That is, RSE is expected to be slightly higher for counties with larger estimated proportions of conservatives and, further, the degree to which the estimated proportion of conservatives systematically underestimates the 2016 Republican vote share (e.g. the magnitude of the negative bias of our estimate) increases slightly as the estimated proportion of conservatives increases. Consistent with these findings, the correlations between the observed errors and the errors expected under assumptions of normality were 0.99 and 0.98 for election years 2012 and 2016.
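The logic of this validity check can be sketched as follows. The data below are synthetic stand-ins (the actual MrsP estimates and county vote shares are not reproduced here), so the specific numbers are illustrative only; the point is simply that a strong correlation between the estimated proportion of conservatives and the observed Republican vote share would indicate that MrsP is correcting for self-selection bias.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: observed county-level Republican vote share and a
# noisy, slightly attenuated MrsP-style estimate of the proportion of
# residents identifying as conservative.
n_counties = 3000
vote_share = rng.beta(5, 4, size=n_counties)
est_conservative = 0.8 * vote_share + rng.normal(0, 0.05, size=n_counties)

# Validity check: correlation between the estimate and the observed vote share.
r = np.corrcoef(est_conservative, vote_share)[0, 1]

# Error diagnostics analogous to those reported: signed error per county.
errors = est_conservative - vote_share
print(f"correlation = {r:.2f}, mean signed error = {errors.mean():.3f}")
```

Because ideological identification and voting behavior are not identical, even a well-calibrated estimate would show a correlation below 1, which is the "unknown ceiling" noted above.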
Perhaps this is confusing because our dataset spans several years, and this raises the question of why we do not disaggregate into county x years and conduct our analysis at that level. The problem with this approach is that it would require yearly estimates of county-level moral values, which we do not have. While our YourMorals sample does indeed span 5 years, we use the full sample to estimate time-invariant county-level moral values. If, instead, we tried to estimate county-level moral values for each year, those estimates would be based on approximately 20,000 observations each, assuming a uniform response distribution over years. Given that there are 3000+ counties in the contiguous U.S., this would induce an enormous amount of sparsity and would likely yield estimates that are unreliable and/or too severely smoothed by the MrsP model (i.e. a large portion of counties would have few to no observations).
Accordingly, while it would certainly be valuable to exploit temporal variation in our analysis, such an approach is not feasible because our moral values estimates are time invariant.
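The sparsity concern above is easy to quantify. The total sample size used here is an assumed round number (implied by the figure of roughly 20,000 observations per year over 5 years), not an exact count from the manuscript:

```python
# Back-of-the-envelope sparsity check for year-by-year MrsP estimation.
# total_respondents is an assumed round number for illustration only.
total_respondents = 100_000
n_years = 5
n_counties = 3_100          # approx. counties in the contiguous U.S.

per_year = total_respondents / n_years      # observations available per year
per_county_year = per_year / n_counties     # average observations per county-year

print(per_year, round(per_county_year, 1))  # 20000.0 6.5
```

With an average of only about six or seven respondents per county-year (and many counties with none), yearly county-level estimates would be dominated by model smoothing rather than data.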

I assume it is the latter and would like to see whether these results hold once the authors use state fixed effects and some form of controls for how rural a county is.
Author Response: Thank you for taking the time to think through our analysis and offer these suggestions for improvement. We fully agree that adjusting for rural vs. urban is a good idea and we now include that in our model.
However, we do not follow the rationale for adding state fixed effects to our model. Instead of improving our estimates of the association between county-level Binding values and hate group rates, adding state fixed effects in our model would fundamentally change the question addressed by the model.
Our goal is to estimate the association between county-level Binding values and the county-level rate of hate groups. Under our current model specification, the regression coefficient estimated for Binding values represents this association, assuming all confounding variables are also included in the model.
If we were to add state fixed effects to our model, we would no longer be estimating the association between county-level Binding values and hate group rates. Instead, we would be estimating this association after adjusting for state-level differences. In other words, a model with state fixed effects would ask the question: do county-level Binding values explain variation in the county-level rate of hate groups within a given state?
This model poses several problems. First, it does not address our question of interest. Second, its validity rests on the assumption that state boundaries encode meaningful/relevant hierarchical structure with regard to the county-level rate of hate groups. In other words, the only reason to adjust for state fixed effects is to account for unmeasured confounding factors that operate at the state level. In many research settings, this is a reasonable and often necessary approach. However, in our case, it is not at all clear why state-level differences might confound the association between county-level moral values and the county-level rate of hate groups. Without a reasonable argument for mechanism, including state fixed effects would raise the risks of overadjustment and adjustment bias (Schisterman, Cole, & Platt, 2009; Breslow, 1982).
Of course, one could argue that if there are no confounding state-level effects, then estimates of the association between Binding values and hate group rates should be robust to including state fixed effects. However, this is not necessarily true, because, again, adding state fixed effects to our model would fundamentally change the question addressed by the model. With state fixed effects, our analysis would focus on whether within-state variation in county-level Binding values explains within-state variation in the county-level rate of hate groups (i.e. stable differences between counties from different states would be ignored). Because the county-level rate of hate groups is sparsely and non-uniformly distributed across the United States, there are likely a substantial number of states with little within-state variation. Under these conditions, adding state fixed effects will increase the uncertainty (i.e. standard error) around the association between Binding values and hate group rates without adding any theoretical or interpretive value. Further, because Binding values vary regionally (including at the state level), adding state-level fixed effects will almost certainly partially mask the association between Binding values and hate groups.
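The point that state fixed effects inflate uncertainty when the predictor varies mostly between states can be illustrated with a small simulation. All quantities below (the true slope, variance components, and county counts) are invented for illustration and do not correspond to our actual data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: 49 states x 60 counties, with Binding values varying mostly
# BETWEEN states (sd 1.0) and only a little within states (sd 0.2).
n_states, n_per = 49, 60
state_means = rng.normal(0.0, 1.0, n_states)
binding = (state_means[:, None] + rng.normal(0.0, 0.2, (n_states, n_per))).ravel()
state_id = np.repeat(np.arange(n_states), n_per)

# True data-generating process: hate-group rate depends on Binding values
# with slope 0.5, plus noise. No state-level confounding at all.
rate = 0.5 * binding + rng.normal(0.0, 1.0, binding.size)

def ols_slope_se(x, y):
    """Simple-regression slope and its standard error."""
    xc, yc = x - x.mean(), y - y.mean()
    b = (xc @ yc) / (xc @ xc)
    resid = yc - b * xc
    se = np.sqrt((resid @ resid) / (len(x) - 2) / (xc @ xc))
    return b, se

# Pooled estimate vs. state fixed effects (via within-state demeaning).
b_pool, se_pool = ols_slope_se(binding, rate)
b_fe, se_fe = ols_slope_se(
    binding - np.bincount(state_id, binding)[state_id] / n_per,
    rate - np.bincount(state_id, rate)[state_id] / n_per,
)

# Both estimators target the same true slope, but the fixed-effects version
# discards the between-state variation and is far noisier.
print(f"pooled: {b_pool:.2f} (se {se_pool:.3f}); FE: {b_fe:.2f} (se {se_fe:.3f})")
```

In this toy setting the fixed-effects standard error is several times larger than the pooled one, even though there is no state-level confounding to correct for, which is the cost of discarding between-state variation in the predictor.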
Finally, it is worth noting that we could not find any precedent for including state fixed effects in an analysis such as ours. For instance, Medina et al. (2018) investigate the county-level distribution of hate groups in the U.S., yet they do not include state fixed effects. Similarly, reviews of geospatial ecological regression - which is the approach we employ in this analysis - do not suggest including upper-level fixed effects (e.g. state fixed effects) when the analytical goal is to estimate a lower-level association (e.g. between counties; Lindgren & Rue, 2015; Lawson, 2013). This all aside, we did also estimate a second model that included the standardized state-level rate of hate groups per 100k inhabitants as an additional adjustment variable.

6. I am just not sure how robust these findings are. My understanding is that the authors want to demonstrate that the aggregate outcome of hate group existence is partly explained by a specific strength of some values. I understand this to mean that average local value distribution explains the outcome of hate group existence. But to show this, we would want to see that either all people moving into these regions become more racist or that some people become radicalized and much more racist. That would constitute strong evidence for such a context-effect.
Author Response: Thank you for this comment! We fully agree that exploiting migration or relying on a number of other econometric approaches could have yielded more robust findings. Unfortunately, we do not have the data necessary for conducting such analyses. Accordingly, we see Study 2 as providing suggestive and consistent evidence regarding our hypotheses. In other words, on its own, we do not see Study 2 as providing sufficient evidence for concluding our hypotheses are supported. Indeed, this is exactly why we paired Study 2 with another observational study that relies on different data and a different level of analysis (Study 1) and with a series of controlled survey studies. Together, these studies provide convergent evidence across different levels of analysis, methods of measurement, samples, and methods of analysis. In our view, this kind of convergent evidence is exactly what is required for a finding to be robust.

Reviewers' comments:
Reviewer #3 (Remarks to the Author): The paper is much more understandable than the initial version, although it continues to be a challenging manuscript to appreciate. The improvements from the original draft, however, are extremely impressive. In many ways, I learned more from the reviewer response than from the paper itself. The radically expanded supplemental information was also quite helpful.
The authors have done a much better job in explaining MFT. [REDACTED].
Study 2 continues to be interesting and the authors have helped clarify many of the issues I was concerned about.
Studies 3-5 are solid traditional social psychology studies. Study 6, though, is compelling and fun to read. I can see the authors sitting around a table coming up with the People of the Earth vignettes. I haven't had such a good laugh in a long time. It's also a nice demonstration of very different disturbing moral challenges with similar results.
One final note. Moral Foundations Theory is not an easy framework for many people to appreciate. The foundations themselves sometimes seem arbitrary and the psychometrics are occasionally puzzling. Because the ideas underlying the theory are compelling, MFT continues to inspire many in the field. Right now, MFT is a good story but the science supporting it is still in the early stages. In some ways, the current paper is a good example. I would urge the authors to rethink the introduction a bit [REDACTED].
Reviewer #4 (Remarks to the Author): I was a reviewer on a previous version of this MS and appreciate the changes that the authors have made in response to the last round of comments. I still have a couple of concerns that I would like the authors to address: 1. Haidt's review mentions concerns about the quality of the SPLC database. I think the authors' response is fairly convincing, but the concern may occur to other readers as well. I think the authors ought to explain the potential issues and address them, as much as they can, in the text. To be clear, I think this study adds value, and no data are perfect. Nonetheless, the limitations ought to be made clear.
2. I appreciate the addition of Study 6. I'm not convinced, though, that eating pets is the prototypical harm violation. There are definitely purity elements there as well; ironically, one of Haidt's first intuitionist morality papers was subtitled "Is it wrong to eat your dog?" Of course the current scenarios are different in some ways, but nevertheless some validation that pet-eating is seen as a harm violation and not a purity violation would be helpful. Along those lines, do binding values moderate the mediation between perceived immorality and approval of anti-group behavior for the pet-eating scenario? Unless I missed something, it looks like the authors only report (non-)moderation for individualizing values.
Reviewer #5 (Remarks to the Author): I think the authors do a very good job in responding to my first two questions with respect to MrP.
But I am a little bit taken aback by their response to the third point regarding state fixed effects. If there is a relationship as they propose, we would find significant effects even after including fixed effects. If the effects go away, then we learn that states with generally higher values on X also tended to have higher values on Y, and that drove the relationship. I cannot see such an analysis being accepted at a top-5 journal in economics or in political science. Our current standards have changed, and we would require a more meticulous analysis of observational data.
The fact that the authors do not show the results with fixed effects makes me wonder whether the results would remain significant after including fixed effects. Further, separating rural and urban counties would make sense. Currently, it is entirely possible that you could show a relationship between the number of harvesters per capita in a county and acts of hate. Nothing is wrong with that partial correlation that you can estimate in a model; it is just far from being causal and hence worth exploring.
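The reviewer's logic here can be illustrated with a small toy simulation (a hypothetical Python sketch on entirely made-up data, not the study's). When the predictor and the outcome are both driven by an unobserved state-level factor, the pooled county-level estimate picks up that confound, and state fixed effects absorb it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 50 "states", 20 "counties" each, with an unobserved
# state-level factor u that drives both predictor and outcome.
n_states, n_per = 50, 20
state = np.repeat(np.arange(n_states), n_per)
u = rng.normal(size=n_states)                           # state-level confound
x = u[state] + rng.normal(size=n_states * n_per)        # county predictor
y = 2.0 * u[state] + rng.normal(size=n_states * n_per)  # no true within-state effect of x

def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Pooled model (no fixed effects): estimate inflated by the confound.
b_pooled = ols(np.column_stack([np.ones_like(x), x]), y)[1]

# State fixed effects: dummies absorb all between-state variation.
dummies = (state[:, None] == np.arange(n_states)).astype(float)
b_within = ols(np.column_stack([x, dummies]), y)[0]

print(f"pooled estimate: {b_pooled:.2f}")   # clearly positive, purely the confound
print(f"within estimate: {b_within:.2f}")   # near zero once the confound is removed
```

This is only the textbook case; as the authors argue later in the exchange, the reverse can also hold (fixed effects discarding the very between-state variation of interest), which is why the choice is contested.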
In my reading the observational studies are really important since they inform us whether a mechanism, found in an online experiment, also amounts to tangible effects in the real world.
In that sense, I feel that the observational studies need to be robust, while it is clear that they cannot rule out every problem. What the range of problems is that authors should rule out is then a question to which each discipline has a different answer, which may also change over time.
I read the authors' "defense" carefully and, as argued above, if all factors varying with state are orthogonal to their measures, we do not need to include fixed effects; but if we do, the results would not change. In all other cases one has to include them (see, e.g., Mundlak's work from the 1970s on using group means to block bias). I assume these are differences that arise from different disciplinary backgrounds, and this makes it ultimately an editorial decision. I can only repeat myself and say that in a leading economics, political science, or sociology journal such an analysis would most likely not stand. In sum, our issue with state fixed effects is that they will decrease statistical power, and it is not at all clear what unmeasured state-level mechanisms would be the real factors driving the county-level effects.

5. In my reading the observational studies are really important, since they inform us whether a mechanism, found in an online experiment, also amounts to tangible effects in the real world. In that sense, I feel that the observational studies need to be robust, while it is clear that they cannot rule out every problem. What the range of problems is that authors should rule out is then a question to which each discipline has a different answer, which may also change over time.
Author Response: Thank you for this comment. We fully agree and, indeed, this question whether effects observed in experiments can be observed in the real world played a core role in our study design process.
We also fully agree that it is key that our observational studies be robust. In both observational studies, we took steps that are well beyond the norm, such as [REDACTED] using MrsP to obtain the best estimates possible given our data, and adjusting for spatial autocorrelation in our final models. We would like to note that this issue of spatial autocorrelation has often been overlooked not only in psychology, but also in political science and economics. At every step, we have sought to make our analyses as robust as possible.
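For readers unfamiliar with the spatial autocorrelation issue raised here, Moran's I is the standard diagnostic statistic. The sketch below is a minimal illustration on made-up data (not the authors' models or their adjustment procedure): it shows how the statistic flags spatially clustered versus dispersed values given an adjacency-based weights matrix.

```python
import numpy as np

def morans_i(values, weights):
    """Moran's I for a value vector and a spatial weights matrix."""
    values = np.asarray(values, dtype=float)
    z = values - values.mean()          # deviations from the mean
    n = len(values)
    # Ratio of spatially weighted cross-products to total variation,
    # scaled by n over the sum of all weights.
    return (n / weights.sum()) * (z @ weights @ z) / (z @ z)

# Toy example: 4 regions on a line, adjacency (rook-style) weights.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

clustered = np.array([1.0, 1.0, 5.0, 5.0])    # neighbors are similar
alternating = np.array([1.0, 5.0, 1.0, 5.0])  # neighbors are dissimilar

print(morans_i(clustered, W))     # positive: spatial clustering
print(morans_i(alternating, W))   # negative: spatial dispersion
```

Positive values of Moran's I are what motivate spatially adjusted models in the first place: nearby counties are not independent observations, so naive standard errors would be too small.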
Further, in our previous revision, we sought to address as many of the issues raised by reviewers as possible. For example, per the reviewer's suggestion, we added an extensive validation study designed to evaluate the likely validity of our MrsP estimates.
Ultimately, it seems that we share many perspectives on the current work, with the one issue of contention being the necessity of state fixed effects. Due to the reasons discussed above, we simply don't believe our data are well suited for ruling out the possibility of state-level confounds.
We would also like to note that we would never seek to publish a study like our second study on its own. Indeed, we generally find cross-sectional studies suggestive at best, and it is not the case that we are expecting Study 2 to stand on its own merit. As we note in our manuscript, studying the psychological mechanisms involved in acts of hate is very difficult; to address this difficulty, we have designed and conducted a set of complementary studies, where the goal was to make up for the weaknesses of one design with the strengths of another. In total, six rigorous studies point to the same effect, and that is the package that we are aiming to publish, not Study 2 on its own.

I read the authors' "defense" carefully and, as argued above, if all factors varying with state are orthogonal to their measures, we do not need to include fixed effects; but if we do, the results would not change. In all other cases one has to include them (see, e.g., Mundlak's work from the 1970s on using group means to block bias). I assume these are differences that arise from different disciplinary backgrounds, and this makes it ultimately an editorial decision. I can only repeat myself and say that in a leading economics, political science, or sociology journal such an analysis would most likely not stand.
Author Response: We thank the reviewer, again, for engaging with this issue. We are well versed in the problem of confounding between- and within-group variance, and we agree that estimates that do not decompose or adjust for these sources of variation will be biased. However, again, fixed-effects estimators are not a silver bullet. In our previous response and above, we explain when and why fixed-effects estimators can introduce bias. We have also cited discussions of this issue that were published in a leading political science journal and written by leading econometricians. To anyone who believes that fixed-effects estimators offer a final answer or perfect solution to the problem of between- and within-group confounding, we would strongly suggest reviewing this literature. That said, we would not expect any cross-sectional analysis, whether or not it adjusts for fixed effects, to stand on its own. Across the social sciences, we have noticed a recent movement toward stronger identification strategies and, for a study to stand on its own, we believe that cross-sectional designs are no longer satisfactory. However, again, we never intended for our second study to stand on its own. We conducted the most rigorous study that we could, given the limitations imposed by data availability. To address these limitations, we conducted five additional studies that address the question of interest from a variety of perspectives. All six studies, though complementary, rely on different types of data and use different populations. Nonetheless, all our studies point to the same mechanism. We feel that it would be very strange to dismiss six studies that demonstrate convergent findings simply because one study suffered the limitation of low within-state variation. Indeed, it would be hard to see that as anything other than "throwing the baby out with the bathwater", as argued by Angrist and Pischke (2008).
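For context on the Mundlak approach invoked in this exchange: instead of estimating one dummy per group, one adds each group's mean of the predictor as a covariate, which reproduces the within (fixed-effects) estimate of the predictor's coefficient. The sketch below is a toy illustration with simulated data (not the authors' analysis):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy data: counties nested in states, with a state-level
# confound u that is correlated with the predictor x.
n_states, n_per = 50, 20
state = np.repeat(np.arange(n_states), n_per)
u = rng.normal(size=n_states)
x = u[state] + rng.normal(size=n_states * n_per)
y = 1.0 * x + 2.0 * u[state] + rng.normal(size=n_states * n_per)

# Mundlak device: add each state's mean of x as a covariate. The
# coefficient on x then matches the within (fixed-effects) estimate,
# without estimating 50 state dummies.
x_bar = np.bincount(state, weights=x) / n_per   # state means of x
X = np.column_stack([np.ones_like(x), x, x_bar[state]])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"within-state effect of x: {beta[1]:.2f}")  # near the true value of 1.0
```

The group-mean coefficient also gives a direct test of whether between-group and within-group effects differ, which is the substance of the disagreement between reviewer and authors here.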
Again, we fully agree that Study 2 should not stand on its own, no matter what the discipline. This is exactly why we have complemented our geospatial analysis with a study using state-of-the-art natural language processing and three behavioral studies, two of which use stratified representative samples. Frankly, we do not know of any study in a leading economics, political science, sociology, or psychology journal that provides such a diversification of methodologies and data to address complex socio-psychological phenomena.