Extremist ideology as a complex contagion: the spread of far-right radicalization in the United States between 2005-2017

Increasing levels of far-right extremist violence have generated public concern about the spread of radicalization in the United States. Previous research suggests that radicalized individuals are destabilized by various environmental (or endemic) factors, exposed to extremist ideology, and subsequently reinforced by members of their community. As such, the spread of radicalization may proceed through a social contagion process, in which extremist ideologies behave like complex contagions that require multiple exposures for adoption. In this study, I applied an epidemiological method called two-component spatio-temporal intensity modeling to data from 416 far-right extremists exposed in the United States between 2005 and 2017. The results indicate that patterns of far-right radicalization in the United States are consistent with a complex contagion process, in which reinforcement is required for transmission. Both social media usage and group membership enhance the spread of extremist ideology, suggesting that online and physical organizing remain primary recruitment tools of the far-right movement. Additionally, I identified several endemic factors, such as poverty, that increase the probability of radicalization in particular regions. Future research should investigate how specific interventions, such as online counter-narratives to battle propaganda, may be effectively implemented to mitigate the spread of far-right extremism in the United States.


Introduction
The far-right movement, which includes white supremacists, neo-Nazis, and sovereign citizens, is the oldest and most deadly form of domestic extremism in the United States [1,2]. Despite some ideological diversity, members of the far-right often advocate for the use of violence to bring about an "idealized future favoring a particular group, whether this group identity is racial, pseudo-national, or characterized by individualistic traits" [3]. Over the last decade, the far-right movement was responsible for 73.3% of all extremist murders in the United States. In 2018, this statistic rose to 98% [4]. The increasing severity of far-right extremist violence, as well as the associated rhetoric on social media [5,6], has generated public concern about the spread of radicalization in the United States. Former extremists have referred to it as a public health issue [7,8], an idea advocated for by some policy experts as well [9,10].
Although social media platforms relax geographic constraints on communication, evidence suggests that social media networks still exhibit spatial clustering. For example, the majority of an individual's Facebook friends live within 100 miles of them [67], the probability of information diffusion on social media decays with increasing distance [68], and online echo chambers map onto particular locations [69]. Since complex contagions require reinforcement, and the majority of online friendship ties are within a close radius, the diffusion of extremist ideologies online should still exhibit some level of geographic bias.
In order to model the spread of far-right radicalization I used a two-component spatio-temporal intensity (twinstim) model [70], an epidemiological method that treats events in space and time as resulting from self-exciting point processes [71]. In this framework, future events depend on the history of past events within a certain geographic range. Event probabilities are determined by a conditional intensity function, which is separated into endemic and epidemic components. This allows researchers to assess the combined effects of both spatio-temporal covariates and epidemic predictors. Epidemic, in this framework, refers to any level of contagion effect and does not necessarily imply uncontrollable spread. With a couple of notable exceptions [72,73], previous applications of self-exciting point process models in terrorism and mass shooting research have not simultaneously modeled diffusion over both time and space [74][75][76][77][78][79][80][81].
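For reference, the conditional intensity of a twinstim model, as it is usually written in the twinstim literature [70], has the general form below. The symbols are chosen here for illustration and do not come from this paper: ν is the endemic component (with offset o and endemic covariates z), η_j is the epidemic weight of a past event j (with event-level covariates m_j), and f and g are the spatial and temporal interaction functions.

```latex
\lambda(s, t) = \nu_{[s][t]} + \sum_{j \in I(s,t)} \eta_j \, f(\lVert s - s_j \rVert) \, g(t - t_j),
\qquad
\nu_{[s][t]} = \exp\!\left(\beta_0 + o_{[s][t]} + \beta^\top z_{[s][t]}\right),
\qquad
\eta_j = \exp\!\left(\gamma_0 + \gamma^\top m_j\right),
\qquad
I(s,t) = \{\, j : t_j < t,\; t - t_j \le \varepsilon,\; \lVert s - s_j \rVert \le \delta \,\}
```

In this notation, ε and δ are the maximum temporal and spatial interaction ranges, so only past events within those ranges contribute to the epidemic sum.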
The radicalization events in this study, which correspond to where and when a radicalized individual's extremist activity or plot was exposed, came from the Profiles of Individual Radicalization in the United States (PIRUS), an anonymized database compiled by the National Consortium for the Study of Terrorism and Responses to Terrorism (START) [3]. PIRUS is compiled from sources in the public record, and only includes individuals radicalized in the United States who were either arrested, indicted, or killed as a result of ideologically-motivated crimes, or were directly associated with a violent extremist organization. I chose to use PIRUS instead of the Terrorism and Extremist Violence in the United States (TEVUS) database because events in PIRUS are disambiguated by individual and include social variables that may influence the diffusion process.
A contagion effect in this modeling framework could result from one of two forces. The first is a copy-cat effect, in which individuals copy behaviors observed directly or in the media. Although this effect has been proposed in terrorism and mass shooting research in the past [77,82], it seems to be a more plausible contagion mechanism for specific methods of violence [83] (e.g. suicide bombings [84]) rather than radicalization more broadly. The second is linkage triggered by activism and organizing, or ideologically-charged events (e.g. elections, demonstrations, policies), in that region. To differentiate between these two forces, I included two sets of epidemic predictors in the modeling. The first two event-level variables, plot success and anticipated fatalities, might be expected to increase epidemic probability if a copy-cat effect is present. This is because successful large-scale events are probably more contagious due to increased media coverage [77]. Alternatively, the two individual-level variables, group membership and social media use, might be expected to increase epidemic probability if activism and organizing drive the linkage between events.

Data Collection
All individual-level data came from PIRUS. Only individuals with far-right ideology who were exposed during or after 2005 (the earliest year with social media data) with location data at the city-level or lower (n = 416; F: 6.0%, M: 94.0%) were included (see Figures 1 and 2). For each individual, the date and location of their exposure (usually when their activity/plot occurred), whether their plot was successful (34.9%), the anticipated fatalities of their plot (0: 69.5%, 1-20: 26.0%, >20: 2.6%, >100: 1.9%), whether they were a member of a formal or informal group of extremists (58.4%), and whether social media played a role in their radicalization (31.2%), were included. Unknown or missing values for each predictor (plot success: 0.5%, anticipated fatalities: 13.5%, group membership: 0%, social media: 54.8%) were coded as 0. To ensure that the coding procedure for missing predictor values did not introduce bias, I checked whether the results of the full model were consistent after multiple imputation with chained equations and random forest machine learning (see Table S1). The location of each exposure was geocoded from the nearest city or town using the R package ggmap [85]. Since domestic terrorists tend to commit acts in their local area [86][87][88], I assumed that exposure locations reflect where individuals were radicalized.
State-level gun ownership was estimated using a proxy measure based on suicide rates and hunting licenses [89]. Using data from 2001, 2002, and 2004 (the only three years for which state-level gun ownership data is available), Siegel et al. found that a proxy combining FS/S, the proportion of suicides that involve firearms (from the Centers for Disease Control and Prevention, or CDC), and HL, hunting licenses per capita, correlates with gun ownership with an R^2 of 0.95.
County-level demographic data was collected from the US Census using the R package censusapi [90]. This included population density, poverty rate, Gini index of income inequality, percentage of the population that is non-white, percentage of the population that has at least a high-school diploma, and unemployment rate. County-level income, race, education, and unemployment data is only available after 2009, so the 2010 data was used for 2005-2009. County-level presidential election voting records were collected from the Massachusetts Institute of Technology Election Lab, and non-election years were assigned the data from the most recent election year.
Geographic data was collected from the US Census using the R package tigris [91].

Model Specification
Twinstim modeling was conducted using the R package surveillance [70]. To convert the data to a continuous spatiotemporal point process, all tied locations and dates were shifted in a random direction up to half of the minimum spatial and temporal distance between events (1.52 km and 0.5 days, respectively) [92].
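The tie-breaking step described above can be sketched as follows. This is a minimal illustration and not the code used in the study: the function name untie_events is hypothetical, coordinates are assumed to be planar and already expressed in km, every event is jittered (not only tied ones), and the temporal shift is assumed to be symmetric around the original date.

```python
import math
import random


def untie_events(events, max_shift_km, max_shift_days, seed=0):
    """Jitter events so that no two share a location or a date.

    events: list of (x_km, y_km, t_days) tuples in planar coordinates.
    Each event is moved in a uniformly random direction by a distance
    of at most max_shift_km, and in time by at most max_shift_days.
    """
    rng = random.Random(seed)
    jittered = []
    for x, y, t in events:
        angle = rng.uniform(0.0, 2.0 * math.pi)   # random direction
        radius = rng.uniform(0.0, max_shift_km)   # random shift length
        dt = rng.uniform(-max_shift_days, max_shift_days)
        jittered.append((x + radius * math.cos(angle),
                         y + radius * math.sin(angle),
                         t + dt))
    return jittered


# Two events tied in space and time, plus one distinct event.
example = [(0.0, 0.0, 10.0), (0.0, 0.0, 10.0), (5.0, 5.0, 20.0)]
untied = untie_events(example, max_shift_km=1.52, max_shift_days=0.5, seed=1)
```

Because the shift is random, any result that changes noticeably across seeds is a sign that the model is sensitive to the tie-breaking procedure.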
Step functions were used to model both spatial and temporal interactions. Visual inspection of the pair correlation function for the point pattern indicates that the data is significantly clustered up to 400 km (see Figure S1). As such, the spatial step function was split into four 100 km intervals with 400 km as the maximum interaction radius [93]. The temporal step function was split into four six-month intervals up to two years (based on the high degree of variation in radicalization and attack planning times among domestic extremists [94][95][96]). I attempted the analysis with different combinations of power-law, Gaussian, and Student spatial functions, and exponential temporal functions, but these variations converged to unrealistically steep spatial and temporal interaction functions that approached zero around two km and two days, and appeared to be significantly influenced by the tie-breaking procedure [92].
Population density (county-level) was log-transformed and used as an offset endemic term. A centered time trend was also included to determine whether the strength of the endemic component has shifted over time. Poverty rate (county-level), Gini index of income inequality (county-level), gun ownership (state-level), percentage of the population that is non-white (county-level), percentage of the population that has at least a high-school diploma (county-level), unemployment rate (county-level), percentage of voters that vote Republican in presidential elections (county-level), violent crime rate per thousand residents (state-level), and number of hate groups per million residents (state-level) were included as dynamic endemic predictors that change annually. Plot success, anticipated fatalities, group membership, and social media radicalization were included as epidemic predictors.
All possible models with all possible combinations of predictors were run and ranked by Akaike's Information Criterion (AIC) [97]. The best fitting model with the lowest AIC was used to assess the effects of each variable on event probability. Rate ratios were calculated by applying exponential transformation to the model estimates.
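A step interaction function of the kind described above is piecewise constant within each interval and zero beyond the maximum range. The sketch below illustrates the idea only: make_step_function is a hypothetical helper, and the heights are made-up values, not estimates from this study.

```python
from bisect import bisect_right


def make_step_function(knots, heights):
    """Build a piecewise-constant interaction function.

    knots: increasing upper bounds of the intervals, e.g. [100, 200, 300, 400].
    heights: one value per interval. The function returns heights[i] on
    [knots[i-1], knots[i]) and 0.0 at or beyond the last knot.
    """
    if len(knots) != len(heights):
        raise ValueError("need exactly one height per interval")

    def step(distance):
        if distance < 0:
            raise ValueError("distance must be non-negative")
        i = bisect_right(knots, distance)  # index of the containing interval
        return heights[i] if i < len(heights) else 0.0

    return step


# Four 100 km spatial intervals up to 400 km (heights are hypothetical).
spatial = make_step_function([100, 200, 300, 400], [1.0, 0.5, 0.25, 0.1])
# Four six-month temporal intervals up to two years, in years (hypothetical).
temporal = make_step_function([0.5, 1.0, 1.5, 2.0], [1.0, 0.6, 0.3, 0.1])
```

Compared with a power-law or Gaussian kernel, this form makes no assumption about how quickly the interaction decays inside the maximum range, at the cost of one parameter per interval.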
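The all-subsets ranking and the rate ratio transformation can be sketched as below. This is an illustration under stated assumptions: rank_all_subsets and the fit callback are hypothetical stand-ins for fitting a twinstim model, and toy_fit returns made-up log-likelihoods purely so the example runs.

```python
import math
from itertools import combinations


def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 log L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def rank_all_subsets(predictors, fit):
    """Fit every subset of predictors and sort the fits by ascending AIC.

    fit(subset) must return (log_likelihood, n_params) for that subset.
    """
    results = []
    for k in range(len(predictors) + 1):
        for subset in combinations(predictors, k):
            log_lik, n_params = fit(subset)
            results.append((aic(log_lik, n_params), subset))
    results.sort(key=lambda item: item[0])
    return results


def rate_ratio(estimate):
    """Convert a log-linear coefficient into a multiplicative rate ratio."""
    return math.exp(estimate)


def toy_fit(subset):
    """Hypothetical fit: made-up log-likelihood gains per predictor."""
    gains = {"poverty": 5.0, "noise": 0.2}
    return -100.0 + sum(gains[name] for name in subset), 1 + len(subset)


ranked = rank_all_subsets(["poverty", "noise"], toy_fit)
best_aic, best_subset = ranked[0]
```

Note that the number of candidate models doubles with each added predictor, so an exhaustive search is only practical for a modest predictor set. On the rate ratio scale, a coefficient of log(0.954) corresponds to a 4.6% decrease per unit increase in the predictor.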

Permutation Test
To determine whether the spatio-temporal interaction of the epidemic component was statistically significant, I used the Monte Carlo permutation approach developed by Meyer et al. [98]. Using this approach, a twinstim model with all endemic predictors from the best fitting model and no epidemic predictors was compared to 1,000 permuted null models with randomly shuffled event times. For each permutation I estimated the reproduction number (R0), or the expected number of future events that an event triggers on average, which represents "infectivity". A p-value was calculated by comparing the observed R0 with the null distribution of the subset of permutations that converged.
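The logic of the permutation test can be sketched as follows. This is a simplified illustration, not the procedure as implemented in the R package: both function names are hypothetical, the refitting of the model to each permuted dataset is omitted, and the p-value uses the common Monte Carlo convention of counting the observed statistic as one of the samples, which is an assumption here.

```python
import random


def permute_event_times(events, rng):
    """Shuffle event times across locations, breaking space-time interaction.

    events: list of (location, time) pairs. Locations keep their order,
    and the same set of times is randomly reassigned among them.
    """
    times = [time for _, time in events]
    rng.shuffle(times)
    return [(location, time) for (location, _), time in zip(events, times)]


def permutation_p_value(observed, null_values):
    """One-sided Monte Carlo p-value for a statistic such as R0.

    null_values should contain only the statistics from permutations
    whose model fit converged.
    """
    if not null_values:
        raise ValueError("need at least one converged permutation")
    n_extreme = sum(1 for value in null_values if value >= observed)
    return (n_extreme + 1) / (len(null_values) + 1)


rng = random.Random(42)
events = [("A", 1.0), ("B", 2.0), ("C", 3.0), ("D", 4.0)]
shuffled = permute_event_times(events, rng)
```

With this convention the smallest attainable p-value is 1 / (N + 1), where N is the number of converged permutations.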
For additional support, I also ran a likelihood ratio test and a standard Knox test of spatio-temporal clustering. The Knox test was conducted with spatial and temporal radii of

Simulations
To further assess the quality of the model, I conducted simulations from the cumulative intensity function using Ogata's modified thinning algorithm according to Meyer et al. [100]. Using the parameters of the best fitting model, I conducted 1,000 simulations of the last six months of the study period and compared the results to the observed data.
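To illustrate how thinning generates events from a self-exciting process, the sketch below simulates a much simpler model than the one used in this study: a purely temporal process with a constant background rate mu and an exponentially decaying triggering kernel, in which each event triggers r0 further events on average. The function names, the kernel, and all parameter values are assumptions for illustration. The study itself simulates from the fitted spatio-temporal model with step functions.

```python
import math
import random


def intensity(t, history, mu, r0, beta):
    """Conditional intensity at time t given all events in history (<= t).

    Each past event adds r0 * beta * exp(-beta * elapsed), a kernel that
    integrates to r0 over all future time.
    """
    return mu + sum(r0 * beta * math.exp(-beta * (t - tj)) for tj in history)


def simulate_by_thinning(mu, r0, beta, horizon, seed=0):
    """Simulate event times on [0, horizon) with Ogata-style thinning."""
    rng = random.Random(seed)
    t, events = 0.0, []
    while True:
        # The intensity only decays until the next event, so its current
        # value is a valid upper bound for proposing the next candidate.
        upper = intensity(t, events, mu, r0, beta)
        t += rng.expovariate(upper)
        if t >= horizon:
            return events
        # Accept the candidate with probability intensity(t) / upper.
        if rng.random() * upper <= intensity(t, events, mu, r0, beta):
            events.append(t)


simulated = simulate_by_thinning(mu=2.0, r0=0.31, beta=1.0, horizon=50.0, seed=1)
```

Repeating the simulation with different seeds gives a distribution of event counts that can be compared against the observed count for the same period.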

Results
The results of the best fitting model (∆AIC < 2), which included seven endemic and two epidemic predictor variables, are shown in Table 1. Firstly, there is a statistically significant time trend whereby the endemic rate decreases by 4.6% each year, indicating that the strength of the epidemic component has increased over time. There appears to be a baseline increase in the endemic component between 2008-2012 which likely corresponds to the financial crisis [101], as well as a significant spike in the epidemic component around 2016 which likely corresponds to the presidential election [102,103] (Figure 3). There are also significant positive effects of poverty rates (p < 0.01) and the presence of hate groups (p < 0.0001) on radicalization probability. Interestingly, the percentage of voters that vote Republican in presidential elections (p < 0.0001), the percentage of the population that is non-white (p < 0.05), and unemployment rates (p < 0.0001) appear to have significant negative effects on radicalization probability. Gun ownership, education level, and violent crime all had no significant effect on radicalization probability. When Republican voting was replaced with the absolute percent difference between Republican and Democratic voting, a proxy measure for the competitiveness of elections, it was no longer significant. A variance inflation factor test identified no collinearity problems among the time-averaged endemic predictors (VIF < 3) [104].
Both group membership and radicalization via social media have strong and significant positive effects on epidemic probability. Exposures of individuals who belong to formal or informal extremist groups are over four times more likely to be followed by future exposures in close spatial or temporal proximity (p < 0.01). Similarly, exposures of individuals radicalized on social media are almost three times as likely to be followed by future exposures (p < 0.01). Anticipated fatalities and plot success did not appear in the best fitting model. Estimates of the decaying spatial and temporal interaction functions, as well as model diagnostics, can be seen in Figures S2 and S3, respectively. A variance inflation factor test identified no collinearity problems among the epidemic predictors (VIF < 3) [104].
Based on the permutation test, the observed R0 (0.31) is significantly higher than the null distribution of the converged permutations (N_conv = 739, p < 0.01) (Figure 4). This indicates that the spatio-temporal interaction in the epidemic model is significant. Both the likelihood ratio test of the epidemic against the endemic model (p < 0.0001) and   Figure 5), indicating that the model accurately captures the temporal dynamics in the data. Similarly, the model appears to do a good job of capturing the spatial dynamics in the data, although it is clearly weighted towards high population density areas (Figure 6).

Discussion
By applying novel epidemiological methods to data on 416 extremists exposed between 2005 and 2017, this study provides evidence that patterns of far-right radicalization in the United States are consistent with a contagion process. Firstly, the estimated reproduction number is significantly higher than those from simulated null models, indicating that endemic causes alone are not sufficient to explain the spatio-temporal clustering observed in the data. The reproduction number for radicalization (R0 = 0.31) is also lower than one, suggesting that extremist ideologies behave like complex contagions that require reinforcement for transmission. Fortunately, this means that extremist ideologies are unlikely to spread uncontrollably through populations like seasonal influenza (R0 = 1.28) [105], but outbreaks can occur under the right endemic and epidemic conditions. For example, regions with higher rates of poverty and hate group activity are more likely to experience far-right extremism, whereas regions with a larger non-white population, more Republican voting, and higher rates of unemployment are less likely to experience far-right extremism. Most importantly, radicalizations involving extremist groups or social media significantly increase the epidemic probability of future radicalizations in the same location. This suggests that clusters of radicalizations in space and time are driven by activism and organizing rather than a copy-cat effect.
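As a rough worked illustration of what a reproduction number below one implies, suppose (an assumption for this sketch, not a result of the study) that every event triggers R_0 further events on average, regardless of its covariates and ignoring edge effects. The expected size of the cluster started by a single endemic event is then a geometric series:

```latex
\mathbb{E}[\text{cluster size}] = \sum_{k=0}^{\infty} R_0^{k} = \frac{1}{1 - R_0} = \frac{1}{1 - 0.31} \approx 1.45
```

Under that simplification, each endemic radicalization would be followed by roughly 0.45 triggered events in total. The series diverges only when R_0 reaches one, which is the threshold for self-sustaining spread.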
The fact that group membership significantly increases the epidemic strength of events, and the presence of hate groups significantly increases radicalization probability, suggests that local organizing remains a potent recruitment tool of the far-right movement. This idea is reflected in recent increases in rallies across the country, such as "Unite the Right" in Charlottesville, VA in August of 2017, that have been attended by regional chapters of white nationalist and militia organizations. It also suggests that concerns about typological "lone wolves" radicalized over social media should not overshadow the persistent and expanding far-right movement in the United States. Only 10.8% of people in this study were radicalized on social media independently of an extremist group, indicating that solo actors are still the minority in the far-right movement. That being said, solo actors radicalized on social media, such as Omar Mateen (Pulse nightclub shooting in 2016) and Dylann Roof (Charleston church shooting in 2015) [19], are typically deadlier than group members in the United States [106], and should thus be the subject of much future research.
Radicalization on social media also significantly increases the epidemic strength of events, indicating that social media platforms augment physical organizing and that the diffusion of extremist ideologies online is likely geographically biased. The increasing role of social media in far-right extremism and radicalization is well established [5,19,[107][108][109]. Social media platforms like Twitter provide extremist communities with low cost access to large audiences that might not otherwise engage with far-right content [58,59]. For example, one report found that only 44% of people who follow high-profile white nationalists on Twitter overtly express similar ideologies [110]. As mainstream platforms clamp down on hate speech, extremist users have just shifted their traffic to alternative sites such as 8chan and Gab [111,112]. Given the centrality of social media in far-right organizing, future research should explore how counter-narratives [113,114] and other strategies could be used to fight the spread of extremist ideology online.
The results indicate that county-level poverty rates increase the probability of far-right radicalization. While there is little to no evidence that poverty predicts extremism at the state-level [2,36,37,40], studies at the county-level have found that poverty predicts both mass shooting rate [42] and hate groups (presence [35] not longevity [38,39,41]). This discrepancy between geographic resolutions indicates that using state-level poverty data obscures local variation. The results of this study also reveal a negative effect of unemployment rate on radicalization, adding to the remarkably contradictory evidence for links between unemployment and extremism in the United States [2,33,36,[43][44][45][46]. Although this result appears to be counter-intuitive, I hypothesize that poverty and unemployment may interact in driving radicalization. For example, individuals from regions where jobs are plentiful but poverty remains high may be the most disillusioned and susceptible to extremist ideologies. Interestingly, income inequality did not appear in the best fitting model, and had no significant effect when included. This suggests that overall deprivation, as measured by poverty rates, is more important in driving radicalization than inequality. Previous studies that have found a positive impact of income inequality on hate groups or crimes either used state-level data [46], did not account for poverty rate [32,33], or combined it with poverty rate into a single index [48]. Interestingly, both unemployment [47] and income inequality [42,49,50] appear to be strong predictors of mass shootings. Although this seems paradoxical, the majority of mass shootings are not ideologically driven [115], so the socioeconomic drivers may be different than for far-right radicalization.
Violent crime appears to have no influence on radicalization. Although one study of the Ku Klux Klan found that high levels of far-right activity can increase homicide rates in the long-term [48], there is little evidence for violent crime rates driving increases in extremist violence or radicalization [54]. Hate crime is only very weakly correlated with violent crime [53], and extremist violence is even more rare [116], so they are likely driven by different factors.
Previous studies have found strong evidence for a negative relationship between education and hate crime rates [45,53], a positive relationship between education and mass shooting rates [49,50], and no evidence for a relationship between education and hate group organizing [37,51,52]. The results of this study are consistent with the latter category, which makes sense given that the majority of the plots in the dataset were non-violent.
The negative effect of Republican voting on event probability could be because individuals on the far-right of the political spectrum who live in counties with more Democratic voters may feel more partisan hostility [117]. Interestingly, this effect does not appear to be the result of more competitive elections [38], as the absolute difference between Republican and Democratic voting did not significantly influence event probability. Alternatively, the negative effect of Republican voting may be due to the fact that many of the rural counties that lean heavily Republican have low population densities and no recent history of extremist violence. A previous study that found mixed evidence for a positive influence of Republican voting on the presence of hate groups excluded counties without hate groups from the modeling, which may have eliminated this skew effect [35].
The fact that the percentage of the population that is non-white negatively predicts far-right extremist violence is consistent with the intergroup contact hypothesis, which suggests that prolonged contact between racial groups reduces conflict under certain conditions [118]. Although other researchers have suggested that population heterogeneity increases far-right radicalization [32], the only study to find evidence of this in the United States did not explicitly account for population density [34]. Other studies controlling for population density have found that both anti-black hate crimes and hate groups appear to be more common in white dominated, racially homogeneous areas [35,53]. Despite mixed evidence for the intergroup contact hypothesis, it is widely accepted that community diversity and tolerance is key to fighting radicalization and extremist violence globally [119][120][121][122][123][124].
Gun ownership does not predict radicalization in this model, which is unsurprising since only 30.5% of people in this study planned on committing fatal attacks but interesting given the centrality of gun control in debates following mass shootings in the United States [125][126][127]. Despite strong evidence that gun ownership is linked to mass shooting rates at the national-level [56], evidence for the same pattern at the state-level remains mixed. Previous studies have found that it either positively predicts mass shootings overall [56], when combined with particular gun control laws [55], or not at all [40,47]. Unfortunately, CDC funding for research on gun ownership was restricted by Congress in 1996 after lobbying by the National Rifle Association, so potential links between extremist violence and gun ownership remain understudied [128][129][130][131].
Several limitations of this study should be highlighted. Firstly, the PIRUS database only represents a subset of radicalized individuals in the United States. The creators of the database used random sampling to maximize its representativeness over different time periods, but there remains a possibility of spatial or temporal bias in the original data due to factors like law enforcement effort. In addition, the geographic locations of events are only geocoded to the city-level, potentially enhancing the spatial clustering of the data. Furthermore, social media data were missing for a significant number of individuals (54.8%). The significance level of the estimate for social media usage is extremely low and robust to imputation, indicating that it likely reflects a real effect, but researchers should exercise caution when interpreting this result [132]. Lastly, the spatial resolution of three of the endemic predictors was limited to the state-level, which may have flattened some important local variation. One of these variables, gun ownership, was also a proxy measure. Policymakers should release historical restrictions on research funding for gun violence and hate crime research to improve data resolution for future studies.
In conclusion, far-right radicalization in the United States appears to spread through populations like a complex contagion. Both social media usage and group membership enhance the contagion process, indicating that online and physical organizing remain primary recruitment tools of the far-right movement. In addition, far-right radicalization is more likely in Democratic-majority regions with high poverty and low unemployment, fewer non-white people, and more hate group activity. While the federal government has acknowledged the threat of far-right extremism [133], funding for organizations researching or fighting the movement has decreased in recent years [134]. Based on the results of this study, I recommend that policymakers reconsider their funding priorities to address the expanding far-right extremist movement in the United States. Future research should investigate how specific interventions, such as online counter-narratives to battle propaganda, may be effectively implemented to mitigate the spread of extremist ideology.

Data & Code Availability Statement
All data used in the study are available online either publicly or upon request (PIRUS). The R scripts used in the study will be made available upon peer-reviewed publication.

Imputation Check
To ensure that the coding procedure did not bias the estimation of the epidemic predictors, the full twinstim model was re-run after multiple imputation with chained equations using the R package mice [1] and random forest machine learning using the R package missForest [2]. All four epidemic predictors (plot success, anticipated fatalities, group membership, and social media radicalization) were used in fitting and training. The maximum number of iterations was set to 10 and the number of trees was set to 100. The results of 100 rounds of both imputation methods can be seen in Table S1.
Table S1: The average rate ratios and p-values for all epidemic predictors after 100 rounds of imputation and estimation using the full model.
Since the observed estimate of social media, the only epidemic predictor with missing data in the best fitting model, is between those from the two imputation methods, and as random forest in missForest outcompetes chained equations in mice in most [2][3][4][5][6][7] (but not all [8,9]) direct comparisons, I assume that the coding method did not significantly influence the results. I assumed that data were missing at random, although missing social

Spatial Interaction
Figure S1: The pair correlation function at different pairwise distances in km (x-axis). The black line is the observed function for the data, the red line is the theoretical function assuming spatial randomness, and the grey envelope shows the upper and lower bounds of the functions from 100 simulated point patterns demonstrating spatial randomness.

Diagnostics
The residuals, or the fitted cumulative intensities over time, were calculated and transformed to fit a uniform distribution according to Ogata [10]. The cumulative density function diverges from expectations for Ui < 0.58, which appears to be the result of tie-breaking with small temporal distances (0.5 days) [11]. Increasing the tie-breaking distance to > 20 days to improve the cumulative density function and reduce serial correlation did not significantly change the predictor estimates, so I chose to use the original model.
Figure S2: Estimates of the scaled spatial (left panel) and temporal (right) step functions. The 95% Monte Carlo confidence intervals were each calculated from 100 samples.
Figure S3: (A) The empirical cumulative density function of Ui, or the standardized residuals according to Ogata [10], with 95% Kolmogorov-Smirnov confidence bands. (B) A scatterplot of Ui and Ui+1 to look for serial correlation.
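The residual transformation used in these diagnostics can be sketched as follows, assuming the standard form of Ogata's residual analysis in which increments of the fitted cumulative intensity between successive events are mapped to Ui = 1 - exp(-increment). The function names and the example values are hypothetical. If the model fits well, the Ui should look like independent draws from a uniform distribution on [0, 1].

```python
import math


def ogata_residuals(cumulative_intensities):
    """Transform fitted cumulative intensities at event times into U_i.

    cumulative_intensities: non-decreasing values, one per event, in
    chronological order. Returns one U_i per consecutive pair.
    """
    residuals = []
    pairs = zip(cumulative_intensities, cumulative_intensities[1:])
    for previous, current in pairs:
        if current < previous:
            raise ValueError("cumulative intensities must not decrease")
        residuals.append(1.0 - math.exp(-(current - previous)))
    return residuals


def ks_distance_from_uniform(values):
    """Largest gap between the empirical CDF and the uniform(0, 1) CDF."""
    ordered = sorted(values)
    n = len(ordered)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(ordered))


def serial_pairs(values):
    """Pairs (U_i, U_i+1) for a scatterplot check of serial correlation."""
    return list(zip(values, values[1:]))


# Hypothetical cumulative intensities at three consecutive events.
taus = [0.0, math.log(2.0), 2.0 * math.log(2.0)]
u_values = ogata_residuals(taus)
```

A large distance from the uniform distribution, or visible structure in the serial pairs, points to misfit in the temporal component of the model.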