Introduction

Muography is a recently developed imaging technique that uses high-energy, near-horizontally arriving cosmic-ray muons to visualize the internal structures of large objects. Muography produces a projection image (hereafter, muogram) of a large body by mapping the number of muons transmitted through it. Muograms were first used in 1970 by Alvarez et al. to search for hidden chambers in Chephren's Second Pyramid1. Almost forty years later, muograms were first used to explore the internal structure of volcanoes2. Muograms depicting the internal structure of a volcano are of particular importance because they may be used in the study of eruption dynamics3. The first such experimental evidence obtained with muograms came from a study of the summit of Mount Asama, Japan2. That observation, together with the imaging of a low-density magma pathway underneath a solidified magma deposit, confirmed that muograms could resolve volcano structure with more precision than preexisting geophysical techniques. Since this first experimental study of the internal structure of volcanoes in 2007, similar experiments have been carried out not only in Japan3,4,5,6 but also in the US7 and Europe8,9,10,11.

Eruption forecasting is one of the most critical tasks in modern volcanology12,13. Many forecasting methods based on statistical algorithms or machine learning have been reported14,15,16,17,18,19; these methods use data such as seismic activity, ground deformation, and gas emission. However, to the best of our knowledge, there has been no literature on eruption forecasting using muograms. Muography is conceptually similar to standard X-ray radiography, and in the field of medical image analysis, including the analysis of X-ray radiographs, deep learning has achieved remarkable results20,21. We therefore expect that volcanic eruptions can also be forecast by applying deep learning to muograms.

The purpose of this study is to demonstrate the feasibility of eruption forecasting from muograms using deep learning. We focused on muographic data acquired at Sakurajima volcano, Japan, between 2014 and 2016, when the volcano was at its most active during the last eruption episode (2009–2017).

Results

Muography observation system at Sakurajima volcano

The muography observation system (MOS)4 was installed at the Sakurajima Muography Observatory (SMO)22. Figure 1(A,B) show the location of Sakurajima volcano, Fig. 1(C) shows a topographic map of the measurement site, and Fig. 1(D) shows a cross-sectional view of Sakurajima volcano. A more detailed description of this MOS can be found elsewhere4; thus, the MOS is only briefly introduced here. The system consists of five 10-cm-thick lead plates and six layers of position-sensitive scintillation planes. Each position-sensitive plane consists of Nx = 15 and Ny = 15 adjacent scintillator strips, which together form a segmented plane with 15 × 15 segments; thus, the total active area for collecting muons is 2.25 m2. The observation system (red star in Fig. 1(C,D)) was installed to the southwest of the craters at distances of 2.8 km, 2.7 km, and 2.6 km from the Showa crater, the Minamidake A crater, and the Minamidake B crater, respectively6. The three craters were located within the field of view of the MOS. The elevation of the measurement site was ~150 m above sea level (ASL). The angular resolution of the muography observation system was 33 milliradians (mrad).

Figure 1

Schematic drawing of our muography experiment. (A,B) Location of Sakurajima volcano in Kyushu, Japan (https://maps-for-free.com/). (C) Elevation map and schematic of the measurement site. The muography observation system (MOS, red star) was installed to the southwest of the craters at distances of 2.8 km, 2.7 km, and 2.6 km from the Showa crater, the Minamidake A crater, and the Minamidake B crater, respectively. The elevation map was created from data of the Geospatial Information Authority of Japan (http://www.gsi.go.jp/) and edited by the authors. (D) Cross-sectional view of Sakurajima volcano along line c-d in (C). (Source: Oláh, L. et al., Sci. Rep. 8, 3207, pp. 26, CC BY 4.0. We have changed “mMOS” to “MOS” and “Crater Showa” to “Showa”.)

Eruption forecasting using convolutional neural network

We investigated the effectiveness of a convolutional neural network (CNN) for eruption forecasting at the Showa crater of Sakurajima volcano based on muograms. The CNN, one of the most prominent deep learning models, has shown success in image classification and object detection23.

Figure 2(A) shows the relationship between the input data for the CNN model and the prediction term for an eruption. The muograms used in this study were maps of the daily muon count (observation period: 00:00:00 to 23:59:59). We input the muograms from seven consecutive days into the CNN to compute the probability of an eruption on the eighth day, called the “prediction day.” The number of input days (seven) was determined experimentally. If at least one eruption occurred at the Showa crater during the prediction day, that day was labeled as an eruption day. The eruption times were taken from the website of the Kagoshima Meteorological Office, Japan (https://www.jma-net.go.jp/kagoshima/vol/kazan_top.html, in Japanese); only whether any eruption occurred was used, not the total number of eruptions on the prediction day. In summary, the CNN model predicts whether an eruption will occur at the Showa crater on the eighth day from the muograms of the preceding seven consecutive days.
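As a rough illustration of this windowing and labeling scheme, the following sketch builds seven-day input stacks and binary labels from daily muograms. The helper names (daily_muograms, eruption_days) and array shapes are assumptions for illustration, not part of the published pipeline.

import numpy as np

# Hypothetical inputs: daily_muograms[d] is the muogram (2D count map) for day d,
# and eruption_days is the set of dates on which at least one eruption occurred
# at the Showa crater (from the Kagoshima Meteorological Office listings).
def build_windows(daily_muograms, dates, eruption_days, n_days=7):
    """Stack muograms of n_days consecutive days; label with the (n_days+1)-th day."""
    inputs, labels, prediction_days = [], [], []
    for i in range(len(dates) - n_days):
        window_dates = dates[i:i + n_days]
        prediction_day = dates[i + n_days]
        # Skip windows that contain unobserved days (e.g., detector maintenance).
        if any(daily_muograms.get(d) is None for d in window_dates):
            continue
        x = np.stack([daily_muograms[d] for d in window_dates], axis=-1)  # (H, W, 7)
        y = 1 if prediction_day in eruption_days else 0  # eruption / no eruption
        inputs.append(x)
        labels.append(y)
        prediction_days.append(prediction_day)
    return np.array(inputs), np.array(labels), prediction_days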

Figure 2

(A) Relationship between input data for CNN model and prediction term for eruption, (B) configuration of CNN model. Muograms are plotted in a color scale.

We used 464 sets, each consisting of seven consecutive daily muograms and the label of the prediction day, obtained between 7 November 2014 and 12 May 2016. Sets whose seven consecutive days included unobserved periods due to maintenance of the MOS were excluded. We split the dataset into three subsets: a training set, a validation set, and a test set. The training set was used to train the model and included 382 sets of muographic data (prediction days, 20 November 2014–28 January 2016; number of eruption days, 191). The validation set was used to calculate the evaluation criterion for hyperparameter tuning and included 40 sets of muographic data (prediction days, 8 February 2016–18 March 2016; number of eruption days, 20). The test set was used to evaluate the best model found through hyperparameter tuning and included 42 sets of muographic data (prediction days, 25 March 2016–12 May 2016; number of eruption days, 21). During the period covered by the three datasets, 1,439 and 10 eruptions occurred at the Showa and Minamidake craters, respectively.
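Because the three subsets are defined by prediction-day date ranges rather than by random sampling, the chronological split could be sketched as follows; the date boundaries are taken from the text, while the function and variable names are illustrative.

from datetime import date

# Last prediction days of the training and validation sets, as stated in the text.
TRAIN_END = date(2016, 1, 28)
VAL_END = date(2016, 3, 18)

def assign_subset(prediction_day):
    """Assign a sample to train/validation/test by its prediction day (chronological split)."""
    if prediction_day <= TRAIN_END:
        return "train"
    elif prediction_day <= VAL_END:
        return "validation"
    else:
        return "test"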

Figure 2(B) shows the configuration of our network. Our network consisted of one to four convolutional layers, one fully connected layer, and one output layer with two units and softmax activation. We employed the rectified linear unit (ReLU) function24 as the activation function for all layers except the output layer. Batch normalization25 was performed before each ReLU function. The dropout strategy26 was adopted for all layers except the output layer to avoid overfitting. We utilized the Adam method27 to optimize the network weights. The training procedure of the CNN model is described in Methods.
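A minimal Keras sketch of such a network is given below, here with two convolutional layers, a 3 × 3 kernel, and the 5 × 5 × 7 input shape of the Showa crater region; the kernel size, filter counts, and layer count shown are placeholders, since the tuned values are those listed in Table 2.

from keras.models import Sequential
from keras.layers import Conv2D, BatchNormalization, Activation, Dropout, Flatten, Dense
from keras.optimizers import Adam

def build_cnn(input_shape=(5, 5, 7), n_conv=2, n_filters=16, n_fc=32, dropout=0.25,
              lr=1e-4, beta_1=0.9):
    """CNN with batch normalization before each ReLU, dropout, and a 2-unit softmax output."""
    model = Sequential()
    for i in range(n_conv):
        if i == 0:
            model.add(Conv2D(n_filters, (3, 3), padding='same', input_shape=input_shape))
        else:
            model.add(Conv2D(n_filters, (3, 3), padding='same'))
        model.add(BatchNormalization())   # batch normalization before the ReLU
        model.add(Activation('relu'))
        model.add(Dropout(dropout))
    model.add(Flatten())
    model.add(Dense(n_fc))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(Dropout(dropout))
    model.add(Dense(2, activation='softmax'))  # eruption / no eruption
    model.compile(optimizer=Adam(lr=lr, beta_1=beta_1),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    return model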

We compared the following four input regions:

  • Showa crater region: 165 mrad < θ < 297 mrad, −66 mrad < ϕ < 66 mrad (5 × 5 segments)

  • Minamidake crater region: 198 mrad < θ < 330 mrad, −330 mrad < ϕ < 99 mrad (5 × 8 segments)

  • surface region: 33 mrad < θ < 198 mrad, 132 mrad < ϕ < 264 mrad (5 × 5 segments)

  • all segments: 33 mrad < θ < 330 mrad, −462 mrad < ϕ < 462 mrad (10 × 29 segments)

where θ and ϕ denote the elevation angle and azimuth angle, respectively. The Minamidake crater region included both the Minamidake A and B craters. The surface region included 5 × 5 segments of the mountain surface, excluding the three craters, as a baseline. The values of the segments outside the volcano were set to 0. For standardization, each segment value was multiplied by 0.001. Assuming an average rock thickness of 1 km in these regions, ~50 and ~90 muon events/day were expected in the Showa and Minamidake crater regions, respectively.
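A sketch of this preprocessing, assuming the daily muogram is indexed by (elevation, azimuth) segment and that a boolean mask marks segments outside the volcano (the mask and the region slice are illustrative), might look like this:

import numpy as np

def preprocess_muogram(muogram, outside_volcano_mask, region_slice):
    """Zero segments outside the volcano, scale counts by 0.001, and crop the input region."""
    x = muogram.astype(np.float32).copy()
    x[outside_volcano_mask] = 0.0   # segments outside the volcano are set to 0
    x *= 0.001                      # standardization: multiply each segment value by 0.001
    return x[region_slice]          # e.g. a 5 x 5 crop around the Showa crater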

Figure 3 shows an example of three consecutive daily muogram images from the training set. Figure 4 shows the relative muon counts averaged over the 15 eruption events in the training set that satisfied the condition that no eruption occurred for at least two days before the eruption; here, the “relative muon count” is the daily muon count divided by the average daily muon count acquired during a period with no eruptions (1 November 2015 to 30 November 2015). The four plots in this figure correspond to the data acquired from the aforementioned four regions ((A) Showa crater region, (B) Minamidake crater region, (C) surface region, and (D) all segments). As shown in Figs. 3 and 4(A), the muon count of the Showa crater region tended to decrease on the day before an eruption.
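The relative muon count used in Fig. 4 can be computed as follows; the reference period and the division by its mean daily count follow the description above, while the variable names are illustrative.

import numpy as np

def relative_muon_counts(daily_region_counts, reference_dates):
    """Divide each day's summed muon count in a region by the mean daily count
    over a reference period with no eruptions (1-30 November 2015 in this study)."""
    reference_mean = np.mean([daily_region_counts[d] for d in reference_dates])
    return {d: c / reference_mean for d, c in daily_region_counts.items()}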

Figure 3

Example of three consecutive daily muograms (observed from (A) 25 July 2015 00:00:00 to 23:59:59, (B) 26 July 2015 00:00:00 to 23:59:59, and (C) 27 July 2015 00:00:00 to 23:59:59). The angular resolution was 33 mrad per segment, and the data show 14 × 29 segments. Muograms are plotted in a color scale (range: 0–10). The segments outside the volcano were set to 0. White dotted squares indicate the Showa crater region. During the observation period, an eruption occurred at 27 July 2015 01:53:00.

Figure 4

Averaged relative muon counts for each input region ((A) Showa crater region, (B) Minamidake crater region, (C) surface region, (D) all segments). Error bars represent standard deviations.

For comparison, we also evaluated three types of classifiers: a simple rule-based method, a support vector machine (SVM) model using the radial basis function (RBF) kernel28, and a neural network (NN) model. The simple rule-based method predicts an eruption on the prediction day if eruptions occurred on at least four of the seven preceding consecutive days; this threshold was selected to maximize the accuracy on the training data. The SVM model and the NN model were applied only to the Showa crater region. As the feature values of these two models, we compared 175 segment values (25 segment values × 7 days) with seven summed values (the sum of the 25 segment values for each day), and selected the seven summed values because they gave the best performance. The training procedures of the SVM model and the NN model are described in Methods.
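A minimal scikit-learn sketch of the SVM baseline on the seven summed features is shown below; the feature construction follows the description above, while the probability-based scoring for the ROC analysis is an assumption for illustration.

import numpy as np
from sklearn.svm import SVC

def summed_features(windows):
    """windows: array of shape (n_samples, 5, 5, 7) for the Showa crater region.
    Returns one summed value per day, i.e. seven features per sample."""
    return windows.reshape(len(windows), -1, windows.shape[-1]).sum(axis=1)

# RBF-kernel SVM; C and gamma were tuned with Bayesian optimization (see Methods).
svm = SVC(kernel='rbf', C=0.0659, gamma=7.237, probability=True)
# svm.fit(summed_features(X_train), y_train)
# scores = svm.predict_proba(summed_features(X_test))[:, 1]  # eruption probability for ROC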

We evaluated our method using receiver operating characteristic (ROC) analysis29,30 and calculated the area under the curve (AUC), which ranges between 0.0 and 1.0, with 0.5 corresponding to random classification and 1.0 to perfect classification; an AUC below 0.5 indicates performance worse than random classification. Figure 5 shows the ROC curves for the test set for each input region. As shown in Fig. 5, the AUC values were 0.726 for the Showa crater region, 0.678 for the Minamidake crater region, 0.444 for the surface region, and 0.544 for all segments. Figure 6 shows the ROC curves for the test set for the SVM and NN models of the Showa crater region. As shown in Fig. 6, the AUC values were 0.569 for the SVM model and 0.499 for the NN model.
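The ROC curves and AUC values can be computed from the predicted eruption probabilities with scikit-learn; the argument names below are placeholders.

from sklearn.metrics import roc_curve, roc_auc_score

def evaluate_roc(y_true, scores):
    """y_true: 1 = eruption day, 0 = no eruption; scores: predicted eruption probabilities."""
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    return fpr, tpr, thresholds, roc_auc_score(y_true, scores)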

Figure 5

ROC curves for test set for each input data ((A) Showa crater region, (B) Minamidake crater region, (C) surface region, (D) all segments). The circle indicates the cutoff point chosen using Youden’s index. AUC is the area under the curve.

Figure 6

ROC curves for test set for each machine learning model of the Showa crater region ((A) SVM, (B) NN). The circle indicates the cutoff point chosen using Youden’s index. AUC is the area under the curve.

Table 1 shows the prediction performance at the optimal cutoff point for each input. The optimal cutoff point on the ROC curve was chosen using Youden’s index31. The CNN model for the Showa crater region showed the highest accuracy and specificity, whereas the SVM model for the Showa crater region showed the highest sensitivity.
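Youden’s index selects the cutoff that maximizes sensitivity + specificity − 1, i.e. TPR − FPR along the ROC curve; a small sketch using the roc_curve outputs from the previous snippet:

import numpy as np

def youden_cutoff(fpr, tpr, thresholds):
    """Return the threshold maximizing Youden's J = sensitivity + specificity - 1 = TPR - FPR."""
    j = tpr - fpr
    best = np.argmax(j)
    return thresholds[best], tpr[best], 1.0 - fpr[best]  # cutoff, sensitivity, specificity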

Table 1 Prediction performance at the optimal cutoff point for each input data.

Discussion

We have shown that our method may achieve moderate performance for day-level eruption forecasting at the Showa crater of Sakurajima volcano. The AUC, accuracy, and specificity were highest when the input to the CNN model was limited to the segments of the Showa crater region, whereas the sensitivity was highest when the input comprised all segments. In addition, the muon count of the Showa crater region tended to decrease before an eruption, which we consider to be due to the plugging of a magma pathway by magma deposits on the crater floor. These results suggest that the muographic data of the segments around the Showa crater contributed to the forecasting performance. By contrast, the AUC was less than 0.5 when the input to the CNN model came from the segments of the surface region. Consequently, muograms have potential use for the eruption forecasting of a volcano through the analysis of temporal changes in its internal structure.

The CNN model of the Showa crater region was superior to the SVM and NN models in terms of AUC, accuracy, and specificity. In the CNN model, the convolutional layers work as feature extractors, extracting local features at each layer, and deeper convolutional layers can detect more complex features. These properties make the CNN model more promising than the other machine learning models.

The Showa crater region was limited to 5 × 5 segments because the angular resolution of the muography observation system was 33 mrad per segment. To overcome this low resolution, a high-definition muography observation system was developed, with which muograms have been acquired since January 20176. The angular resolution of the high-definition muography observation system is 2.7 mrad per segment, so it can be expected to image the internal structures of the volcano in more detail. We plan to investigate the use of high-resolution muograms once a sufficient number of them have been collected. We also plan to investigate eruption forecasting for the Minamidake craters, because most eruptions have occurred from these craters since November 2017.

The consecutive muographic images are time series data. Recurrent neural networks (RNNs), especially long short-term memory (LSTM)32, are effective models for processing time series data33. A combination of CNNs and RNNs is expected to improve the forecasting performance, and we plan to investigate their combination in future work.

At Sakurajima volcano, other observation data, such as volcanic earthquake, volcanic tremor, tilt, and GPS data, are reported on the website of the Japan Meteorological Agency (http://www.data.jma.go.jp/svd/vois/data/tokyo/STOCK/bulletin/index_vcatalog.html, in Japanese). However, at the time of writing, these data are only available up to April 2016. The forecasting performance may be further improved by adding these observations to the input of the CNN.

There are two limitations to this study. First, in labeling the eruption days, we excluded the ten eruptions that occurred at the Minamidake craters during the analysis period. This is because only two eruptions occurred at the Minamidake craters during the training period, and on both of those days an eruption also occurred at the Showa crater. Second, the uncertainty or confidence of the output decisions of our model was not estimated. For practical eruption forecasting using our method, it is crucial to estimate the uncertainty or confidence of the model34.

Methods

Training of CNN

In the training of the CNN, the optimization of numerous hyperparameters has a strong effect on the performance of the CNN model. Strategies for hyperparameter optimization include grid search, random search35, and the Bayesian optimization (BO) algorithm36. BO is a framework for the optimization of black-box functions whose derivatives and convexity properties are unknown, and it is expected to optimize hyperparameters more efficiently than a random search.

In this study, we carried out 200 trials of hyperparameter tuning with the BO algorithm. The tuned hyperparameters of the CNN were the number of filters in each convolution layer (2^c, c = 2–6), the number of units in the fully connected layer (2^f, f = 1–7), the batch size, two parameters of the Adam method (α = 10^−3–10^−6, β1 = 0.9–0.99), and the dropout ratio (0–0.5). We utilized the AUC of the ROC curve as the evaluation criterion for hyperparameter tuning. The maximum number of epochs and the patience of early stopping37 were set to 200 and 20, respectively.
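The text does not state which BO implementation was used; as one possible sketch, a 200-trial study over the search space described above could be set up with a BO library such as Optuna. The train_and_evaluate helper is hypothetical: it would build the CNN with the sampled hyperparameters, train it with early stopping, and return the validation AUC.

import optuna

def objective(trial):
    """Sample one hyperparameter set, train the CNN, and return the validation AUC."""
    params = {
        'n_filters': 2 ** trial.suggest_int('c', 2, 6),   # filters per convolution layer
        'n_fc': 2 ** trial.suggest_int('f', 1, 7),         # units in the fully connected layer
        'lr': trial.suggest_float('alpha', 1e-6, 1e-3, log=True),
        'beta_1': trial.suggest_float('beta1', 0.9, 0.99),
        'dropout': trial.suggest_float('dropout', 0.0, 0.5),
    }
    # Hypothetical helper: builds and trains the CNN, returns validation AUC.
    return train_and_evaluate(params)

study = optuna.create_study(direction='maximize')  # maximize the validation AUC
study.optimize(objective, n_trials=200)            # 200 trials, as in this study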

The CNN model was implemented using Keras38 version 2.2.4 with TensorFlow39 version 1.10.0 backend. We trained the network on a GeForce GTX TITAN X (NVIDIA Corporation, Santa Clara, CA) graphics processing unit (GPU) with 12 GB memory. Table 2 shows the selected sets of hyperparameters of the CNN model for each input data.

Table 2 Selected sets of hyperparameters of the CNN model for each input data.

Training of SVM

In the training of the SVM model using the RBF kernel, we also carried out 200 trials of hyperparameter tuning with the BO algorithm. The tuned hyperparameters of the SVM model using the RBF kernel were the regularization parameter (C = 2^−5–2^15) and the kernel parameter (γ = 2^−15–2^3)28. We utilized the AUC of the ROC curve as the evaluation criterion for hyperparameter tuning.

The SVM model was implemented using scikit-learn version 0.21.3. The selected hyperparameters of the SVM model were C = 0.0659 and γ = 7.237.

Training of NN

In the training of the NN model, we also carried out 200 trials of hyperparameter tuning with the BO algorithm. The tuned hyperparameters of the NN model were the number of hidden layers (1–3), the number of units in each hidden layer (2^h, h = 2–8), the batch size, two parameters of the Adam method (α = 10^−3–10^−6, β1 = 0.9–0.99), and the dropout ratio (0–0.5). We utilized the AUC of the ROC curve as the evaluation criterion for hyperparameter tuning. The maximum number of epochs and the patience of early stopping were set to 100 and 10, respectively.

The NN model was implemented using Keras version 2.2.4 with TensorFlow version 1.10.0 backend. We trained the network on a GeForce GTX TITAN X GPU with 12 GB memory. The selected hyperparameters of the NN model were as follows: number of hidden layers, 3; number of units of first hidden layer, 32; number of units of second hidden layer, 128; number of units of third hidden layer, 16; batch size, 32; α of Adam method, 0.000287; β1 of Adam method, 0.983763; ratio of dropout, 0.442019.