Pilot study of eruption forecasting with muography using convolutional neural network

Muography is a novel method of visualizing the internal structures of active volcanoes by using high-energy near-horizontally arriving cosmic muons. The purpose of this study is to show the feasibility of muography to forecast the eruption event with the aid of the convolutional neural network (CNN). In this study, seven daily consecutive muographic images were fed into the CNN to compute the probability of eruptions on the eighth day, and our CNN model was trained by hyperparameter tuning with the Bayesian optimization algorithm. By using the data acquired in Sakurajima volcano, Japan, as an example, the forecasting performance achieved a value of 0.726 for the area under the receiver operating characteristic curve, showing the reasonable correlation between the muographic images and eruption events. Our result suggests that muography has the potential for eruption forecasting of volcanoes.


Results
Muography observation system at Sakurajima volcano. The muography observation system (MOS) 4 was installed at Sakurajima Muography Observatory (SMO) 22 . Figure 1(A-D) respectively show the location of the measurement site, a topographic map of the measurement site, and a cross-sectional view of Sakurajima volcano. A more detailed description of this MOS can be found elsewhere 4 ; thus, the MOS will be briefly introduced here. The system consists of five 10-cm-thick lead plates and six layers of scintillation position-sensitive planes. Each position-sensitive plane consists of N x = 15 and N y = 15 adjacent scintillator strips, which together form a segmented plane with 15 × 15 segments; thus, the total active area for collecting muons is 2.25 m 2 . The observation system (red star in Fig. 1(C,D)) was installed in the southwest direction at distances of 2.8 km, 2.7 km, and 2.6 km from the Showa crater, the Minamidake A crater, and the B crater, respectively 6 . The three craters were located within the field of view of the MOS. The elevation of the measurement site was ~150 m above sea level (ASL). The angular resolution of the muography observation system was 33 milliradians (mrad). eruption forecasting using convolutional neural network. We investigated the effectiveness of a convolutional neural network (CNN) for eruption forecasting at the Showa crater of Sakurajima volcano based on muograms. The CNN, which is one of the prominent deep learning models, has shown success in image classification and object detection 23 . Figure 2(A) shows the relationship between the input data for the CNN model and the prediction term for an eruption. The muograms used in this study were plots of the daily muon count (observation period: 00:00:00 to 23:59:59). We inputted muograms obtained for seven consecutive days into the CNN to compute the probability of eruptions for the eighth day, called the "prediction day. " The number of muograms was determined experimentally. If at least one eruption occurred at Showa crater during the prediction day, the day was labeled as an eruption. The eruption times were based on the data from the website of Kagoshima Meteorological Office, Japan (https://www.jma-net.go.jp/kagoshima/vol/kazan_top.html, in Japanese). The total number of eruptions during the prediction day was not referred to. From the above, the CNN model predicts whether an eruption will occur at Showa crater on the eighth day using muograms obtained for seven consecutive days.
We used 464 sets of seven consecutive daily muograms and a label of the prediction day, which were obtained between 7 November 2014 and 12 May 2016. We excluded seven consecutive daily muograms that include unobserved periods due to maintenance of the MOS. We split the dataset into three subsets: a training set, a validation set, and a test set. The training set was used to train the model and included 382 sets of muographic data (prediction days, 20 November 2014-28 January 2016; number of eruption days, 191). The validation set was used to calculate the evaluation criterion of the hyperparameter tuning and includes 40 sets of muographic data (prediction days, 8 February 2016-18 March 2016; number of eruption days, 20). The test set was used to evaluate the best model throughout the hyperparameter tuning, which includes 42 sets of muographic data (prediction days,  Figure 2(B) shows the configuration of our network. Our network consisted of one to four convolutional layers, one fully connected layer, and one output layer with two units with softmax activation. We employed a rectified linear unit (ReLU) function 24 as the activation function for all layers except the output layer. Batch normalization 25 was performed before each ReLU function. The dropout strategy 26 was adopted for all layers except the output layer to avoid overfitting. We utilized the Adam method 27 to optimize the network weights. The training procedure of the CNN model is described in Methods.
We compared the following four input regions: • Showa crater region: 165 mrad < θ <297 mrad, −66 mrad < φ <66 mrad (5 × 5 segments) • Minamidake crater region: 198 mrad < θ <330 mrad, −330 mrad < φ <99 mrad (5 × 8 segments) • surface region: 33 mrad < θ <198 mrad, 132 mrad < φ <264 mrad (5 × 5 segments) • all segments: 33 mrad < θ <330 mrad, −462 mrad < φ <462 mrad (10 × 29 segments) where θ and φ indicate the elevation angle and azimuth angle, respectively. The Minamidake crater region included both the Minamidake craters A and B. The surface region included 5 × 5 segments of the mountain surface excluding the three craters as a baseline. The values of the segments outside the volcano were set to 0. For standardization, each segment value was multiplied by 0.001. By assuming the average rock thickness of 1 km in these regions, ~50 and ~90 muon events/day were respectively expected in the Showa and Minamidake crater regions. Figure 3 shows an example of three consecutive daily muogram images used for the training set. Figure 4 shows the relative muon counts averaged over 15 events for the training set that were selected to satisfy our condition: there is no eruption at least two days before the eruption, where the "relative muon count" is obtained by dividing by the average of daily muon count acquired during the period of no eruption (from 1 November 2015 to 30 November 2015). The four plots in this figure correspond to the data acquired from the aforementioned four regions ((A) Showa crater region, (B) Minamidake crater region, (C) surface region, and (D) all segments). From Figs. 3,4(A), the muon count of the Showa crater region tended to decrease on the day before the eruption.
For comparison, we also evaluated three types of classifiers: a simple rule-based method, a support vector machine (SVM) model using the radial basis function (RBF) kernel 28 , and a neural network (NN) model. The simple rule-based method classifies an eruption occurring on at least four out of seven consecutive days as an eruption. This threshold was selected to maximize the accuracy of the training data. The SVM model and the NN model were applied only to the Showa crater region. For the feature values of the two models, we compared 175 segment values (25 segment values × 7 days) and seven summed values (the summation of 25 segment values for each day). We selected the seven summed values as the feature values with the best performance. The training procedures of the SVM model and the NN model are described in Methods.
We evaluated our method using receiver operating characteristic (ROC) analysis 29,30 , and the area under the curve (AUC) was calculated, which ranges between 0.0 and 1.0, with values of 0.5 for random classification and 1.0 for perfect classification. If the AUC is less than 0.5, the classification results are meaningless. Figure 5 shows the ROC curves for the test set for each input data. As shown in Fig. 5, the AUC values were 0.726 for the Showa crater region, 0.678 for the Minamidake crater region, 0.444 for the surface region, and 0.544 for all segments. Figure 6 shows the ROC curves for the test set for the SVM and NN models of the Showa crater. As shown in Fig. 6, the AUC values were 0.569 for the SVM model and 0.499 for the NN model.  Table 1 shows prediction performance at the optimal cutoff point for each input data. The optimal cutoff point of the ROC curve was chosen using Youden's index 31 . The CNN model of the Showa crater showed the highest accuracy and specificity. In contrast, the SVM model of the Showa crater showed the highest sensitivity.

Discussion
We have shown that our method may achieve moderate performance for day-level eruption forecasting at the Showa crater of Sakurajima volcano. The AUC, accuracy, and specificity were highest when the input to the CNN model was limited to the segments of the Showa crater region. The sensitivity was highest when the input to the CNN model was the all segments. In addition, the muon count of the Showa crater region tends to decrease before the eruption, which was considered to be due to the plugging of a magma pathway by magma deposits on the crater floor. These results suggest that the muographic data of the segments around Showa crater contributed to the forecasting performance. By contrast, the AUC was less than 0.5 when the input to the CNN model was from the segments of the surface region. Consequently, muograms have potential use for eruption forecasting of a volcano by analyzing temporal changes in its internal structures.
The CNN model of the Showa crater region was superior to the SVM model and the NN model in the AUC, accuracy, and specificity. In the CNN model, the convolutional layer works as the feature extractor, extracting local features of each layer. In addition, a deeper convolutional layer can detect more complex features. These roles of the CNN model make it more promising than other machine learning models.
The Showa crater region was limited to 5 × 5 segments since the angular resolution of the muography observation system was 33 mrad per segment. To overcome this problem of low resolution, a high-definition muography observation system was developed, with which muograms have acquired since January 2017 6 . The angular resolution of the high-definition muography observation system is 2.7 mrad per segment, and it can be expected to image more detailed internal structures of the volcano. We plan to investigate the use of high-resolution muograms when a sufficient number of muograms are collected. We also plan to investigate the eruption forecasting of Minamidake craters, because most of the eruptions have occurred from these craters since November 2017.
The consecutive muographic images are time series data. Recurrent neural networks (RNNs), especially long short-term memory (LSTM) 32 , are an effective model for processing time series data 33 . A combination of CNNs and RNNs is expected to improve the forecasting performance. We plan to investigate their combination in our future work.
At Sakurajima volcano, other observation data, such as volcanic earthquake, volcanic tremor, tilt, and GPS data, are reported on the website of Japan Meteorological Agency (http://www.data.jma.go.jp/svd/vois/data/ www.nature.com/scientificreports www.nature.com/scientificreports/ tokyo/STOCK/bulletin/index_vcatalog.html, in Japanese). However, these data are only available until April 2016 at the time of writing. The forecasting performance may be further improved by adding these observations to the input of the CNN.
There are two limitations to this study. First, in the labeling of the eruption day, we have excluded ten eruptions that occurred at the Minamidake craters during the period of analysis. This is because during the training data period, only two eruptions occurred at the Minamidake crater, both of which were erupted at the Showa crater on the same day. Second, an estimation of the uncertainty or confidence in the output decisions of our model was not carried out. For practical eruption forecasting using our method, it is crucial to estimate the uncertainty or confidence of the model 34 .
Methods training of cnn. In the training of the CNN, the optimization of numerous hyperparameters has a strong effect on the performance of the CNN model. Strategies for hyperparameter optimization include a grid search, random search 35 , and the Bayesian optimization (BO) algorithm 36 . BO is a framework for the optimization of black-box functions whose derivatives and convexity properties are unknown. BO is expected to optimize hyperparameters more efficiently than a random search.   www.nature.com/scientificreports www.nature.com/scientificreports/ unit (GPU) with 12 GB memory. Table 2 shows the selected sets of hyperparameters of the CNN model for each input data.
training of SVM. In the training of the SVM model using the RBF kernel, we also carried out 200 trials of hyperparameter tuning with the BO algorithm. The tuned hyperparameters of the SVM model using the RBF kernel were the regularization parameter (C = 2 −5 -2 15 ) and the kernel parameter (γ = 2 −15 -2 3 ) 28 . We utilized the AUC of the ROC curve as an evaluation criterion for hyperparameter tuning.
The SVM model was implemented using scikit-learn version 0.21.3. The selected hyperparameters of the SVM model were C = 0.0659 and γ = 7.237. training of nn. In the training of the NN model, we also carried out 200 trials of hyperparameter tuning with the BO algorithm. The tuned hyperparameters of the NN model were the number of hidden layers (1-3), the number of units of each hidden layer (2 h , h = 2-8), the batch size, two parameters of the Adam method (α = 10 −3 -10 −6 , β 1 = 0.9-0.99), and the ratio of dropout (0-0.5). We utilized the AUC of the ROC curve as an evaluation criterion for hyperparameter tuning. The maximum number of epochs and the patience of early stopping were set to 100 and 10, respectively.
The  Table 2. Selected sets of hyperparameters of the CNN model for each input data.