Learning future terrorist targets through temporal meta-graphs

In the last 20 years, terrorism has led to hundreds of thousands of deaths and massive economic, political, and humanitarian crises in several regions of the world. Using real-world data on attacks that occurred in Afghanistan and Iraq from 2001 to 2018, we propose the use of temporal meta-graphs and deep learning to forecast future terrorist targets. Focusing on three event dimensions, i.e., employed weapons, deployed tactics, and chosen targets, meta-graphs map the connections among temporally close attacks, capturing their operational similarities and dependencies. From these temporal meta-graphs, we derive 2-day-based time series that measure the centrality of each feature within each dimension over time. Formulating the problem in the context of the strategic behavior of terrorist actors, these multivariate temporal sequences are then utilized to learn what target types are at the highest risk of being chosen. The paper makes two contributions. First, it demonstrates that engineering the feature space via temporal meta-graphs produces richer knowledge than shallow time series that rely only on the frequency of feature occurrences. Second, the performed experiments reveal that bi-directional LSTM networks achieve superior forecasting performance compared to other algorithms, calling for future research aiming at fully discovering the potential of artificial intelligence to counter terrorist violence.

The doubtterr variable is added to the dataset to flag those events for which doubt exists regarding their terrorist nature.
To avoid biases and noise in our signals, when generating the proposed meta-graphs and time series we excluded all events that were doubtful in terrorist nature according to the doubtterr variable. This led to a slight reduction in the total number of attacks: 12,120 attacks for Afghanistan and 22,773 for Iraq.
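As an illustration, the exclusion step can be sketched with a toy pandas table; the doubtterr flag follows the convention described above (1 marks a doubtful event), while the other column names and values are purely hypothetical stand-ins for the real dataset:

```python
import pandas as pd

# Toy stand-in for the event table (values are illustrative, not real data).
events = pd.DataFrame({
    "eventid": [1, 2, 3, 4],
    "country_txt": ["Afghanistan", "Afghanistan", "Iraq", "Iraq"],
    # doubtterr == 1 flags events whose terrorist nature is doubtful
    "doubtterr": [0, 1, 0, 0],
})

# Keep only events that are unambiguously terrorist in nature
clean = events[events["doubtterr"] == 0].reset_index(drop=True)
print(len(clean))  # 3
```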
The next two subsections provide an overview of the descriptive statistics of the multivariate time series derived from the temporal meta-graphs for both Afghanistan and Iraq.

S1.2 Afghanistan
The Afghanistan multivariate time series derived from our framework use two-day time units, for a total of 3,289 data points (the same holds for Iraq). In the period under consideration, 939 units recorded no attacks, i.e., 28.54% of the total. The histogram below (Figure S1) reports the count of non-zero features in the considered time series for each time unit (i.e., how many weapons, tactics, and targets have a centrality value higher than 0 at time unit u?).
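The per-unit counts summarized in Figure S1 can be computed as follows; the matrix below is a small synthetic stand-in with the same number of time units but an arbitrary number of features, not the actual Afghanistan series:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in: 3,289 two-day units x 10 features, with many zero
# centrality entries (the real series concatenates weapons, tactics, targets).
X = rng.random((3289, 10)) * (rng.random((3289, 10)) < 0.3)

# Units with no recorded attack have an all-zero centrality vector
zero_units = int((X.sum(axis=1) == 0).sum())

# Count of non-zero features per time unit (the quantity plotted in Figure S1)
nonzero_per_unit = (X > 0).sum(axis=1)
```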

S1.2.3 Targets
In the 2000-2018 period, a total of 21 target categories were hit at least once by terrorists. Table 3 shows the distribution of these targets.
Given their low prevalence, and as done for all the other dimensions (in both datasets), we excluded those features that occurred fewer than 10 times over the course of the entire 2000-2018 period. Tourists and Maritime are thus excluded from the experiments, leading to a total of 18 targets (Figure S4). The distribution is reported below (Table 4).
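The prevalence-based exclusion rule can be sketched on a toy target-by-time matrix (sizes, counts, and the threshold placement are illustrative; the 10-occurrence threshold matches the rule stated above):

```python
import numpy as np

# Toy matrix: rows are time units, columns are target types.
X = np.zeros((100, 4))
X[:50, 0] = 1.0   # frequent target -> kept
X[:20, 1] = 1.0   # frequent target -> kept
X[:5, 2] = 1.0    # fewer than 10 occurrences -> dropped
X[:3, 3] = 1.0    # fewer than 10 occurrences -> dropped

counts = (X > 0).sum(axis=0)   # occurrences of each target over the period
keep = counts >= 10            # exclusion rule applied to all dimensions
X_filtered = X[:, keep]
print(X_filtered.shape)  # (100, 2)
```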

S1.3 Iraq
Hijacking has been excluded from the multivariate time series given its extremely low prevalence over the entire history U. The distribution of the values of the features in X for Iraq is displayed below (Figure S6). Figures were generated with matplotlib (https://matplotlib.org/3.1.3/contents.html).
As done for the Afghanistan dataset, Other and Sabotage Equipment have also been removed from the set of multivariate time series in this case. The final set W for Iraq is displayed in Figure S7.

S2.1 Algorithms: Architectures' Details
This subsection provides details on the architectures of the different models, except the baseline, which did not involve any learning mechanism. All models were trained for 100 epochs with a batch size of 16, using Adam as the optimizer given its effectiveness in noisy problems involving sparse gradients (Kingma and Ba, 2017). Furthermore, we set the patience hyper-parameter to 10, monitoring the validation loss of the model. As described in the main manuscript, all models were run using different input widths, expressed in time units u, to determine the optimal length of recent history to take into account in order to obtain better forecasts: these input widths were 1 (=2 days), 5 (=10 days), 15 (=1 month), 30 (=2 months).
It will be noted that the architectures are not particularly complex, i.e., in most cases they do not involve multiple hidden layers: this is mostly due to the limited amount of data (in terms of u) at our disposal. We also performed experiments with more complex architectures made of a higher number of stacked layers (and higher numbers of units and filters), but the added complexity did not lead to increments in algorithmic performance, while leading instead to higher computational costs. Nonetheless, the outcomes of the models presented in the paper indicate that even simple learning architectures are capable of efficiently forecasting terrorist targets. This finding connects to an emerging area of research investigating the benefits of training simpler models in place of massive networks (Ba and Caruana, 2014). Rather than being a limitation, reaching good performance with simple and computationally cheap models may be considered a strength of the proposed computational framework and, particularly, of the engineering of our feature space. As a final note, in addition to the specific use of Dropout as a regularizer for some models, all architectures were trained using early stopping with a patience of 10 epochs on the Mean Squared Error to further limit the risk of overfitting.
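The patience rule applied to the validation loss can be made explicit with a small plain-Python sketch; the function name and the synthetic loss curve are illustrative only and not part of our pipeline:

```python
def early_stop_index(val_losses, patience=10):
    """Epoch at which patience-based early stopping halts training:
    stop once `patience` epochs pass without a new best validation loss."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_losses) - 1  # never triggered: train to the last epoch

# Synthetic curve: validation MSE improves for 5 epochs, then plateaus.
losses = [1.0, 0.9, 0.8, 0.7, 0.6] + [0.6] * 20
stop = early_stop_index(losses, patience=10)
print(stop)  # 14 -> training halts 10 epochs after the last improvement
```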

S2.1.1 Feedforward Neural Network (FNN)
The FNNs trained in our experiments involved an input layer, followed by a flatten layer with 0 trainable parameters. These were followed by two dense layers of 32 neurons each, with the Rectified Linear Unit (ReLU) as the activation function. Finally, the last layer involved a number of units equal to the number of targets (i.e., 18 in the Afghanistan dataset, 20 in the Iraq one) and was followed by a reshape layer with no parameters. All dense layers used Glorot uniform as the kernel initializer and included a bias vector. The remaining dense and LSTM layers share the same hyperparameters as the previously outlined dense and LSTM layers found in the other models.
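A minimal NumPy sketch of the forward pass of this architecture follows; the window width, feature count, and random weights are illustrative stand-ins (initialization and training are omitted, so this is a shape-level sketch, not the trained model):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def fnn_forward(x, params, n_targets):
    """Forward pass mirroring the FNN described above (sketch).

    x: (input_width, n_features) window of the multivariate series.
    """
    h = x.reshape(-1)                          # flatten layer (no parameters)
    h = relu(h @ params["W1"] + params["b1"])  # dense, 32 units, ReLU
    h = relu(h @ params["W2"] + params["b2"])  # dense, 32 units, ReLU
    out = h @ params["W3"] + params["b3"]      # dense, one unit per target
    return out.reshape(1, n_targets)           # reshape layer (no parameters)

rng = np.random.default_rng(0)
width, n_feat, n_targets = 5, 40, 18  # e.g., 5-unit window, Afghanistan targets
params = {
    "W1": rng.normal(size=(width * n_feat, 32)), "b1": np.zeros(32),
    "W2": rng.normal(size=(32, 32)),             "b2": np.zeros(32),
    "W3": rng.normal(size=(32, n_targets)),      "b3": np.zeros(n_targets),
}
y = fnn_forward(rng.normal(size=(width, n_feat)), params, n_targets)
print(y.shape)  # (1, 18)
```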