Introduction

Explosives detection [1,2,3,4] has been an open-ended problem since World War I (WWI) [5]. Owing to recent technological advances and the sophistication of organized terrorist groups around the globe, such groups have been able to defeat many traditional explosives detection techniques. Researchers have therefore been working on alternative explosives detection systems that can outperform conventional methods relying on sniffer dogs and X-ray machines [6]. Most of this research has focused on chemical detection, such as using chemically modified nanoelectrical device arrays in multiplexed mode for supersensitive identification and discrimination of explosives [7]. Others identified dinitrotoluene at room temperature using a reduced graphene oxide-based gas sensor modified with a peptide receptor [8]. Further work demonstrated microporous polymer networks as low-cost, easily manufactured devices for explosives detection [9]. Given the analytical power of machine learning in clustering, regression, and classification, some research has applied machine learning techniques to sensor data for explosives detection in different environments. Some researchers visualized explosives with three-dimensional voxel radar using convolutional neural networks [10]. In addition, deep learning was used to detect explosives with handheld ground penetrating radar (GPR) [11]. Multilayer perceptron (MLP) models, from the artificial neural network (ANN) family, were coupled with the pulsed fast thermal neutron activation (PFTNA) technique for detecting explosives [12]. Such models also showed an accuracy of 97% in predicting the presence of explosives and drugs when coupled with images obtained from thermal neutron tomography [13].

Nuclear-based techniques for explosives detection were introduced in 1986 [4]. Over the past 20 years, many researchers have focused on using these techniques for aviation security [14]. Prompt gamma neutron activation analysis (PGNAA) has been studied extensively because of its potential and applicability. PGNAA is a quantitative isotopic identification technique. A PGNAA system broadly consists of a neutron source, an unknown sample (the target to be investigated), and a detector array [15]. When the target is bombarded with the neutron beam, neutrons interact with the target nuclei, which emit a gamma spectrum with peaks at characteristic energies [3]. These energies are the fingerprints of the target's isotopic composition. Analyzing the heights of the peaks at each energy yields quantitative information about the sample's material composition [16]. One of the main advantages of PGNAA in explosives detection is that irradiation and detection occur simultaneously [17]. Hence, PGNAA has shown high efficiency in identifying explosives while reducing the time needed to inspect luggage in airports and at borders, which in turn shortens passenger queues [18]. PGNAA has one major drawback: shielding of the target under investigation [19,20]. Once the target is shielded, whether against neutrons or gamma rays, the shield distorts the gamma spectrum read by the detectors, as illustrated in Fig. 1. The system then cannot recognize the peak heights correctly, which results in a false prediction of the target's isotopic composition. Another drawback of using PGNAA in explosives detection is the need for a skilled operator to reach a decision based on the system's results [21].
Using machine learning regressors and classifiers, such as K-nearest neighbor (KNN) regressors and a decision tree classifier, to analyze the gamma spectra resulted in 92% accuracy in differentiating between explosive and non-explosive hydrocarbons [22].
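
To make the notion of "peak heights at characteristic energies" concrete, the following minimal Python sketch reads the largest count near one commonly quoted prompt-gamma line per element. The four energies (2223, 4439, 6130, and 10829 keV for H, C, O, and N) are standard nuclear-data values used here as an assumption; the peaks actually used in this work are those of Supplementary Table 1, which are not reproduced here. The function name, the window width, and the toy spectrum are hypothetical.

```python
# One commonly quoted prompt-gamma line per element, in keV (assumption:
# illustrative values only; the paper's own peak list is in its
# Supplementary Table 1).
CHARACTERISTIC_LINES_KEV = {"H": 2223, "C": 4439, "O": 6130, "N": 10829}


def peak_heights(energies_kev, counts, half_window_kev=50):
    """Return the largest count within +/- half_window_kev of each line."""
    heights = {}
    for element, line in CHARACTERISTIC_LINES_KEV.items():
        in_window = [
            c for e, c in zip(energies_kev, counts)
            if abs(e - line) <= half_window_kev
        ]
        # No channel inside the window means no measurable peak.
        heights[element] = max(in_window, default=0.0)
    return heights


# Toy spectrum: flat background of 1 count with four artificial peaks.
energies_kev = list(range(10, 12001, 10))  # 10 keV wide channels
counts = [1.0] * len(energies_kev)
for channel_kev, height in [(2220, 50.0), (4440, 30.0),
                            (6130, 20.0), (10830, 10.0)]:
    counts[energies_kev.index(channel_kev)] = height

print(peak_heights(energies_kev, counts))
```

A shield that attenuates the neutrons or the gamma rays would lower or reshape these heights, which is the distortion described above.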

Figure 1

Sample of PGNAA obtained gamma spectra for TNT when shielded with boron.

The shielding issue has been discussed in multiple studies. Some researchers focused on shield thickness, while others studied the effect of neutron shields on explosives detection capabilities [23,24,25,26]. In this article, we show how coupling deep neural networks with the PGNAA technique can significantly help to solve the shielding issue. This coupling results in an end-to-end automated framework that reduces the need for a skilled operator to analyze the gamma spectra read by the detector array.

In this work, the proposed end-to-end framework consisted of four regressors feeding one classifier. The initial input was the gamma energy peak heights, and the output was whether the combination of those peaks represents a hydrocarbon explosive or not, as illustrated in Fig. 2. The development methodology consisted of three main steps: (1) data generation, (2) regressor development, and (3) classifier development. The framework is a pipeline, that is, a sequence of data manipulation steps that starts with raw data and ends with predicted values with minimal error. These steps include data cleaning, feature selection, feature reduction, building the model, testing the model, tuning the model, and predicting the final outcome.
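
The data flow of four regressors feeding one classifier can be sketched in a few lines of Python. This is only a structural illustration under stated assumptions: the regressors and the classifier below are hypothetical stand-in functions, not the trained deep neural networks of this work, and the threshold rule in the toy classifier is invented for the example. The input length of 110 follows the 11 peak heights read by each of the ten detectors described under Regressors development.

```python
from typing import Callable, Dict, Sequence

ELEMENTS = ("H", "C", "N", "O")
N_PEAKS, N_DETECTORS = 11, 10  # 11 peak heights x 10 detectors = 110 inputs

Regressor = Callable[[Sequence[float]], float]
Classifier = Callable[[Sequence[float]], bool]


def run_pipeline(features: Sequence[float],
                 regressors: Dict[str, Regressor],
                 classifier: Classifier) -> bool:
    """Regress the four weight fractions, then classify them."""
    if len(features) != N_PEAKS * N_DETECTORS:
        raise ValueError("expected 110 peak-height features")
    # Every regressor sees the same 110 inputs and returns one fraction.
    fractions = [regressors[element](features) for element in ELEMENTS]
    # The classifier only sees the four regressed fractions.
    return classifier(fractions)


# Hypothetical stand-ins for trained models (roughly TNT-like fractions).
toy_regressors = {
    "H": lambda x: 0.02,
    "C": lambda x: 0.37,
    "N": lambda x: 0.19,
    "O": lambda x: 0.42,
}


def toy_classifier(fractions):
    # Made-up rule for illustration: flag samples rich in N and O.
    return fractions[2] + fractions[3] > 0.5


print(run_pipeline([0.0] * 110, toy_regressors, toy_classifier))
```

Keeping the classifier input to four numbers means the final decision depends only on the estimated elemental composition, not directly on the possibly distorted spectrum.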

Figure 2

Proposed end-to-end framework.

Data generation

Due to the sensitivity of the research topic, we used synthesized data instead of experimental data. We used the PGNAA technique to acquire information about the unknown hydrocarbon sample. Since the proposed framework is developed to differentiate between explosive and non-explosive hydrocarbons regardless of the presence of a shield, we focused on the gamma energy peaks representing hydrogen (H), carbon (C), nitrogen (N), and oxygen (O), as listed in Supplementary Table 1 [27,28]. For the data generation, we used a Monte Carlo based computational tool for radiation transport calculations (MCNP code [29]) to mimic the neutron interactions with the samples and the gamma spectra read by the detector array [30]. The simulated setup replicates the Romasha experimental setup located in the Frank Laboratory at the Joint Institute for Nuclear Research (JINR), as shown in Fig. 3 [31]. The Romasha setup consisted of an ING-27 D-T neutron generator that produces 14.1 MeV neutrons, a collimator made of six iron sheets, and ten BGO detectors arranged in a semicircle of 30 cm radius. Dimensions of the setup are listed in Supplementary Table 2 [22,31]. The gamma rays emitted due to neutron interactions with the sample travel in different directions. Hence, using a detector array in PGNAA setups is a standard procedure for detecting the emitted gamma spectrum. We modeled the neutrons emitted from the D-T neutron generator as a point isotropic source of 14.1 MeV energy. Hence, changing the orientation of the investigated sample will not affect the resulting gamma spectrum. We used our previously developed and validated MCNP model, with validation metrics listed in Supplementary Table 3 [22]. In the MCNP model used for data generation, we did not consider the natural radioactivity background. Natural radioactivity on earth usually includes gamma radiation. The neutron background radiation is insignificant.
Hence, it is usually neglected in the application design process. As for the gamma background, it is standard procedure in any setup that includes radiation detection to calibrate the detectors so that background gamma readings are removed from the relevant energy channels. In our case, the relevant channels are listed in Supplementary Table 1. Using the validated MCNP model, we generated 1,478 samples of non-shielded and shielded explosive and non-explosive hydrocarbons, with shield thicknesses varying from zero (non-shielded) to 3 cm. Twenty-two different hydrocarbon explosives were included in the data generation process; their chemical compositions (in mass fractions), densities (\(\rho \) in g/cm³), volumes (Vol in cm³), and masses (M in g) are listed in Table 1. The shields studied were boron (B), light water (H₂O), borated light water (BW), polyethylene (Poly), borated polyethylene (BP), lead (Pb), iron (Fe), and steel; the breakdown of the generated data is listed in Supplementary Table 4. We chose random weight fractions of H, C, N, and O for the non-explosive hydrocarbon samples to represent the variability of hydrocarbon material compositions found in ordinary luggage. To cross-validate the whole pipeline using the leave-one-out method, we divided the generated 1,478 samples into 11 data sets. For each data set, we set aside a final test data set by excluding two explosive hydrocarbons with all of their shield variations (50 samples) and 50 random non-explosive hydrocarbons with random shield variations. Hence, each development data set consisted of 1,378 samples. Average standard deviation scores associated with the generated data are listed in Supplementary Table 5. In this proof of concept, the obtained gamma spectrum is per source neutron and per second.
Also, the gamma spectra input features were normalized between zero and one as a pre-processing step necessary for the model development stage. The results reported here therefore correspond to irradiation per second. That being said, we believe that increasing the irradiation time and using a D-T neutron generator intensity of 10⁹ to 10¹² n/cm² s during practical deployment will provide better results. Using a prescreening device such as an X-ray machine will reduce the total screening time by diverting suspicious baggage to another conveyor that leads to the D-T neutron generator detection system [23]. In practice, assuming that 10% of the baggage is suspicious and that a high-intensity neutron generator is used, this will reduce the time of the multibarrier screening process and ensure efficient detection of both shielded and unshielded illicit materials. The X-ray prescreening device will also help to direct the neutron generator to specific locations in the parcel and reduce the time for the second screening process [23].
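
Two of the data preparation steps above lend themselves to a short Python sketch: scaling a feature to the range zero to one, and grouping the 22 explosives into 11 held-out pairs. This is a minimal illustration under assumptions. The source does not state how the two excluded explosives were chosen for each data set, so the disjoint random pairing below is hypothetical, as are the function names and the seed.

```python
import random


def min_max_scale(column):
    """Scale a list of values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def held_out_pairs(explosive_ids, seed=0):
    """Shuffle the explosives and group them into disjoint pairs.

    Assumption: each explosive is held out exactly once across the data
    sets. The paper only states that two explosives are excluded per set.
    """
    ids = list(explosive_ids)
    random.Random(seed).shuffle(ids)
    return [tuple(ids[i:i + 2]) for i in range(0, len(ids), 2)]


pairs = held_out_pairs(range(22))
print(len(pairs))                       # one held-out pair per data set
print(min_max_scale([2.0, 4.0, 6.0]))   # [0.0, 0.5, 1.0]

# Sample accounting from the text: 50 explosive and 50 non-explosive
# samples are set aside from the 1,478 generated samples.
development_size = 1478 - 50 - 50
print(development_size)                 # 1378
```

Holding out whole explosives, rather than random samples, is what lets the final test measure performance on compounds the models have never seen.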

Figure 3

(a) Photograph of the ROMASHA setup located in the Frank Laboratory at JINR; (b) MCNP model of the ROMASHA setup developed for data generation.

Table 1 Studied explosives chemical compositions.

Regressors development

A regressor is a method that generates a model capable of predicting a numerical dependent variable while minimizing the error between the predicted and actual values over the whole range of the dependent variable and the whole space of the independent variables. We developed four deep neural network regressors to predict the weight fractions of H, C, N, and O in the investigated samples, respectively. The input for each regressor was the 11 gamma energy peak heights read by each of the ten detectors in the detector array. Hence, the total number of input features was 110. The outputs of the four regressors were the H, C, N, and O weight fractions, respectively. The training and test sets represented 80% and 20% of each development data set, respectively. Hence, we cross-validated each of the developed models across five folds. In each fold, training and test samples were chosen randomly. We used the mean squared error (MSE), mean absolute error (MAE), and the coefficient of determination (R²) as the quality metrics for the developed regressors, as listed in Table 2. The R² score is a regression metric that evaluates the quality of fit; it measures the proportion of the variance in the actual values that is explained by the predictions. Although the oxygen weight fraction regressor had the highest test MAE, we considered the regressors predicting the H and C weight fractions to have the poorest quality metrics, because they had the largest differences between training and test MAEs. This indicates a higher tendency to overfit. The reason is the presence of light water and polyethylene as shields within the generated samples. The issue with these two shields is that H and C are part of the shield material composition. Besides the fact that H and C are considered excellent neutron shields, neutron interactions with H and C in those shields also add to the H and C gamma energy peak heights read by the detector array. This not only distorts the resulting gamma spectrum through neutron shielding but also produces misleading H and C peak heights that do not represent the H and C weight fractions in the investigated sample. Although O is present in light water, it is nearly transparent to neutrons and thus has a low probability of interacting with them. Accordingly, the regressor predicting the O weight fraction showed better regression quality metrics than the H and C regressors. The regressor predicting the N weight fraction showed the lowest MSE and MAE and the highest R² score. This was due to the absence of N in any of the investigated shields.
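
For reference, the three regression metrics and the training-versus-test MAE gap used above as an overfitting indicator can be computed as in the following Python sketch. It uses the standard textbook definitions; the function names are hypothetical, and the sketch is not the evaluation code used to produce Table 2.

```python
def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, and R^2 using their standard definitions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    ss_res = sum(r * r for r in residuals)
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    r2 = 1.0 - ss_res / ss_tot
    return {"mse": mse, "mae": mae, "r2": r2}


def mae_gap(train_mae, test_mae):
    """Test MAE minus training MAE; a larger gap suggests overfitting."""
    return test_mae - train_mae


# A model that always predicts the mean explains none of the variance.
print(regression_metrics([0.0, 1.0], [0.5, 0.5]))
```

With these definitions, a perfect prediction gives R² = 1, while always predicting the mean of the actual values gives R² = 0.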

Table 2 Average metrics for the four developed regressors.

Classifier development

A classifier is a method that builds a model capable of assigning data items to categories according to the set of features associated with them. In the last stage of our pipeline, we developed a classifier to differentiate between explosive and non-explosive hydrocarbons regardless of whether the investigated hydrocarbon was shielded. The input for the classifier was the output of the four regressors. The classifier's output was whether the regressed weight fractions of H, C, N, and O represent an explosive or a non-explosive hydrocarbon. As with the regressors, the training and test sets represented 80% and 20% of each development data set, respectively. The classifier was also cross-validated over five folds. We used accuracy, precision, recall, and F1 scores as the classifier's quality metrics. Accuracy measures the fraction of instances whose predicted class is correct out of the whole data set. Accuracy can be misleading when the data are unbalanced, so it is not sufficient on its own. Precision is the ratio of true positives to all positively predicted instances (true positives and false positives), while recall is the ratio of true positives to all actual positives (true positives and false negatives). Finally, the F1 score is the harmonic mean of precision and recall. Since precision and recall typically trade off against each other, a high harmonic mean implies a robust model that keeps both false positives and false negatives low. We trained the classifier on the regressed weight fraction values of H, C, N, and O rather than the original values to reduce the possibility of error propagation.
The developed classifiers scored 95% on all weighted mean quality metrics. The quality metrics of each developed classifier are detailed in Table 3.
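
The four classification metrics defined above follow directly from the confusion counts, as the Python sketch below shows for a binary explosive (1) versus non-explosive (0) labeling. The function name and the toy labels are hypothetical. The sketch computes the plain per-class definitions for the positive class and does not reproduce the weighted means reported in Table 3.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for the positive class (1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels: 4 explosives (one missed) and 6 non-explosives (one flagged).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(classification_metrics(y_true, y_pred))
```

In the toy case, accuracy is 0.8 while precision, recall, and F1 are all 0.75, which illustrates how accuracy alone can look better than the positive-class metrics when negatives outnumber positives.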

Table 3 Classification metrics for the 11 developed classifiers.

Pipeline performance

In summary, we developed 11 pipelines. Each pipeline consisted of four regressors that predict the weight fractions of H, C, N, and O, respectively, and a classifier that determines whether the investigated sample is an explosive hydrocarbon. Each pipeline was developed on one of the 11 development data sets. Finally, we tested each pipeline twice: once on the development data set, and once on the corresponding final test data set (data that were included in neither the training nor the testing portion of the development data set). As expected, testing the pipelines on the development data sets resulted in the same weighted mean accuracy, precision, recall, and F1 scores as the classifiers' development scores (95%). When we tested the pipelines on the final test data sets, the weighted mean scores dropped to 80%, 79%, 85%, and 80% for accuracy, precision, recall, and F1, respectively. These scores represent the pipeline's ability to generalize to unseen explosives and non-explosives, whether shielded or not. Detailed classification metrics for the development data set test and the final test data set test are listed in Tables 4 and 5, respectively. In the tests on the final test data sets, the average false alarm rate was 2%.
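
The text does not define how the false alarm rate was computed. A common definition, assumed in the Python sketch below, is the fraction of non-explosive samples that are flagged as explosive, averaged over the pipelines. The function names and the toy labels are hypothetical and are not taken from Table 5.

```python
def false_alarm_rate(y_true, y_pred):
    """Fraction of non-explosive samples (0) predicted as explosive (1).

    Assumption: FAR = FP / (FP + TN); the paper does not state its formula.
    """
    pairs = list(zip(y_true, y_pred))
    false_pos = sum(1 for t, p in pairs if not t and p)
    true_neg = sum(1 for t, p in pairs if not t and not p)
    negatives = false_pos + true_neg
    if negatives == 0:
        raise ValueError("no non-explosive samples in y_true")
    return false_pos / negatives


def mean_rate(rates):
    """Average a per-pipeline rate over all pipelines."""
    return sum(rates) / len(rates)


# Toy final test set: 50 non-explosives (one flagged) and 50 explosives.
y_true = [0] * 50 + [1] * 50
y_pred = [1] + [0] * 49 + [1] * 50
print(false_alarm_rate(y_true, y_pred))  # 0.02
```

Under this definition the rate depends only on the non-explosive samples, so it is unaffected by how many explosives are missed.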

Table 4 Development data set test classification metrics for the 11 developed pipelines.
Table 5 Final test data set test classification metrics for the 11 developed pipelines.

Conclusions

From the above discussion, we conclude that the developed end-to-end framework scored higher classification metrics for explosives that were included in the training process. Given the nature of security problems, and since there is a finite number of explosives, it is possible to include all known explosives in the training of the regressors and classifiers. Nevertheless, some of the developed pipelines were capable of detecting 920 g of trinitro-azetidine (TNAZ) with an accuracy of 84%; 800 g of nitroglycerin (NG) and 825 g of trinitrotoluene (TNT) with an accuracy of 88%; 880 g of picric acid, 865 g of trinitro-phenylmethyl nitramine (tetryl), and 795 g of urea nitrate (UN) with an accuracy of 92%; and 885 g of pentaerythritol tetranitrate (PETN) with an accuracy of 100%. These explosives were included in neither the training nor the test portions of their corresponding development data sets. We also tested the minimum detectable mass of PETN behind 3 cm of each studied shield. The pipeline was capable of detecting a minimum mass of 708 g of PETN for the water, borated water, iron, lead, and boron shields. It was also capable of detecting 177, 354, and 531 g of PETN when shielded with 3 cm of borated polyethylene, polyethylene, and steel, respectively. Thus, coupling deep neural networks with the PGNAA technique showed great potential in overcoming the neutron and gamma shielding drawback of the PGNAA technique in explosives detection and security applications. We believe that larger data sets that include experimental data with more parameters can significantly improve the efficiency of the proposed pipeline in explosives detection. Future work may include studying actual luggage with various hydrocarbon compounds placed around the shielded sample (whether explosive or not).
We used the polyethylene and borated polyethylene shields because they are considered neutron absorbers that can be used to shield the investigated sample. They also provide insight into extending this work to samples surrounded by items usually found in ordinary luggage.