Earthquake source characterization by machine learning algorithms applied to acoustic signals

Underwater seismic events generate acoustic radiation (such as acoustic-gravity waves) that carries information about the source and can travel long distances before dissipating. Effective early warning, emergency response, and information dissemination for earthquakes and tsunamis require a rapid characterisation of the fault properties: geometry and dynamics. In this work, we analysed hydrophone recordings of 201 earthquakes, located in the Pacific and Indian Oceans, by employing acoustic signal processing and classification methods. The analysis identifies the type of earthquake (i.e. slip type and magnitude) and provides near real-time estimates of the effective properties of the fault dynamics and geometry. The results were compared against values reported by the Harvard Global Centroid Moment Tensor catalog (gCMT), revealing a statistically significant relation between the acoustic properties used to feed the machine learning algorithms and the predicted slip type and magnitude values.

Slip type classification. The primary objective of the classification is to identify the existence of significant vertical motion components in the studied tectonic events. In addition, as a secondary objective, we study the characterization of the type of vertical motion related to the studied earthquakes. Two classification approaches were taken to identify the slip type associated with the tectonic events that generated the acoustic signals composing the dataset: binary and multi-class.
Binary classification. The dataset is divided into two classes. The first class is composed of signals related to tectonic events with vertical motion components (mostly dip-slip), whereas the second is composed of events with relatively small or no vertical motion components (mostly strike-slip). The differentiation between vertical and horizontal events is essential when applying the inverse problem model developed by Gomez and Kadri 3 , which is designed to work for vertical fault displacements. The division of the dataset was made based on source faulting solutions provided by the global CMT catalog 28,29 , resulting in a set distribution of 86 strike-slip earthquakes, which are considered to have a mainly horizontal motion component (42.79%), and 115 earthquakes with a relevant vertical motion component (57.21%). In this sense, we ensure that the dataset is balanced between events with significant vertical motion components and mainly horizontal slip events.
In order to characterize the acoustic signals, we applied four different methodologies for feature extraction that are described in detail in the "Signal vectorization (feature extraction)" subsection in the "Methods" section. In addition, two classification algorithms were applied along with each feature extraction methodology used on the dataset: Random Forest Classifier (RFC) and Support Vector Machines (SVM). In order to test the ML algorithms, we used a 10-fold validation technique and a 5-fold hyper-parameter grid search; for more details see the 'k-fold and grid search' subsection in the "Sensitivity analysis" section in the Supplementary materials. Results obtained for every fold were averaged to provide final accuracy and standard deviation estimates, see Table 1. Moreover, confusion matrices were computed in order to provide the classification accuracy for each considered slip type, see Fig. 2. In all cases, the ML algorithms performed better at identifying 'vertical earthquakes' ('1') than 'horizontal earthquakes' ('0'), see Fig. 2. This behaviour could be associated with the higher availability of 'vertical earthquakes' in the studied dataset. The overall accuracy of both ML algorithms applied to each feature set is over 70% for every considered scenario, and both algorithms show similar accuracy for binary classification of the dataset.
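As an illustration of this workflow, the following minimal sketch (in Python with scikit-learn, assumed here; the feature matrix X, labels y and parameter grids are placeholders) shows a 10-fold outer validation with a 5-fold grid search nested inside each fold:

```python
# Minimal sketch of the binary slip-type classification set-up: a 10-fold outer
# cross-validation with a 5-fold hyper-parameter grid search nested inside each fold.
# The feature matrix X (201 x n_features) and labels y (0 = horizontal, 1 = vertical)
# are assumed to be precomputed; the parameter grids below are illustrative only.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def nested_cv_accuracy(estimator, param_grid, X, y):
    """Mean and standard deviation of the outer 10-fold accuracy."""
    inner = GridSearchCV(estimator, param_grid, cv=5)           # 5-fold grid search
    outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    scores = cross_val_score(inner, X, y, cv=outer)             # 10-fold validation
    return scores.mean(), scores.std()

# SVM requires feature normalisation (see 'Machine learning (ML) algorithms' below);
# pipeline parameter names are prefixed with the step name 'svc'.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
svm_grid = {"svc__C": [1, 10, 100], "svc__gamma": ["scale", 0.01, 0.1]}

rfc = RandomForestClassifier(random_state=0)
rfc_grid = {"n_estimators": [100, 300], "max_depth": [None, 10, 20]}

# acc_svm, sd_svm = nested_cv_accuracy(svm, svm_grid, X, y)
# acc_rfc, sd_rfc = nested_cv_accuracy(rfc, rfc_grid, X, y)
```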
Multi-class classification. Secondary results are presented in this subsection, where the relations between earthquake slip types and their associated acoustic signals are further studied. Here, the dataset was split into three classes 4 based on the source faulting solutions provided by the global CMT catalog 28,29 . The classes are strike-slip ('0'), thrust or reverse ('1') and normal ('2'). Thus, the 'vertical' class, defined in the previous subsection, was further subdivided into two classes: thrust and normal. This classification led to a dataset distribution composed of 86 strike-slip (42.8%), 62 reverse (30.8%) and 53 normal earthquakes (26.4%). The signal vectorization techniques used in the binary classification were also utilised in this section. RFC and SVM were applied with a 'one-versus-all' 30 technique, generating as many binary classifiers as label types and testing every class against the rest. The resulting classification accuracies, standard deviations and confusion matrices are provided in the Supplementary materials, 'Multi-class classification' section. The application of RFC to the feature set '3', where only features obtained by applying the DWT are considered, led to the highest observed accuracy, with an average 10-fold accuracy of 64.14% and a standard deviation of 8.99%, see the 'Multi-class classification' section in the Supplementary materials. The normalized confusion matrices suggest that the algorithms classified the majority of the strike-slip and thrust earthquakes accurately (> 70%), though they failed to identify most of the normal earthquakes (< 30%). The low classification accuracy of normal slip type events is potentially influenced by the low presence of acoustic signals related to this type of earthquake in the dataset, leading to an imbalanced dataset 31 . Nevertheless, the overall classification accuracy (> 60%) for most of the cases reveals statistical significance between the features and the slip types.
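A corresponding sketch of the 'one-versus-all' set-up, again assuming scikit-learn, with placeholder three-class labels y3 and illustrative hyper-parameters, could read:

```python
# Hedged sketch of the 'one-versus-all' multi-class set-up: one binary classifier per
# slip type (0 = strike-slip, 1 = thrust, 2 = normal). The three-class labels y3 and
# the SVC hyper-parameters are placeholders.
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

ovr = OneVsRestClassifier(make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10)))
# y_pred = cross_val_predict(ovr, X, y3, cv=10)           # out-of-fold predictions
# cm = confusion_matrix(y3, y_pred, normalize="true")     # per-class accuracy on the diagonal
```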
Magnitude regression. In this section, we estimate the magnitude of tectonic events by analysing acoustic signals and applying ML algorithms. Support Vector Regression (SVR) and Random Forest Regression (RFR) were applied to each of the four different feature sets extracted from the recorded signals that compose the dataset. Additionally, we used a 10-fold validation technique and a 5-fold hyper-parameter grid search. The magnitudes (Mw) of the events were obtained from the global CMT catalog 28,29 , see Fig. 3. Note that, since the frequency of occurrence of an earthquake drops exponentially with the magnitude, having a perfectly balanced dataset is rather challenging. The magnitude distribution for each type of earthquake is analysed graphically in the Supplementary materials, 'Multi-class classification' section.
The sum of squared errors (SSE) associated with the results delivered by the ML regression algorithms is calculated by

SSE = Σ_{i=1}^{n} (y_i − f(x_i))²,   (1)

where n is the size of the test set, y_i are the actual values in the test set and f(x_i) are the predictions made by the ML algorithms.
The calculated SSE values for each fold were averaged to provide final estimates of the performance of the regression algorithms, see Table 2. Evaluating Eq. (1) with the average of the training set as the predictor f(x_i) results in an SSE of 13.61, which is about twice the SSE values obtained using the ML model predictions presented in Table 2. This indicates that the models deliver more accurate predictions than the mean value of the training set. Additionally, the R-squared (R²) estimator, which represents the part of the variance of the dependent variable that is explained by the independent variables in a regression algorithm, is computed by

R² = 1 − S_res / S_tot,

where S_res is the sum of squares of the residual errors and S_tot is the total sum of squares. The lowest errors were obtained by applying SVR to the feature set '2', which is composed of a combination of temporal, statistical and cepstral features. The observed R² scores lie around 0.5, suggesting that the regression algorithms, in combination with the extracted features, were able to explain part of the relation between the extracted signal features and the corresponding tectonic event magnitudes, see Table 2. The estimated and actual magnitudes of the tectonic events are compared graphically in the 'Regression accuracy distribution' subsection in the 'Sensitivity analysis' section in the 'Supplementary materials'. It was observed that the estimations at the magnitude extremes of the dataset (Mw < 6 and Mw > 8) have higher associated errors.
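For illustration, the fold-wise SSE of Eq. (1) and the R² score could be computed as in the following sketch, assuming a precomputed feature matrix X and gCMT magnitudes mw (feature scaling for SVR is omitted for brevity):

```python
# Illustrative computation of the fold-wise SSE of Eq. (1) and the R-squared score for
# the magnitude regression, assuming a precomputed feature matrix X and gCMT magnitudes
# mw; feature scaling for SVR is omitted for brevity.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold
from sklearn.svm import SVR

def fold_scores(regressor, X, mw, n_splits=10):
    sse, r2 = [], []
    for train, test in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(X):
        f = regressor.fit(X[train], mw[train]).predict(X[test])
        s_res = np.sum((mw[test] - f) ** 2)                   # SSE, Eq. (1)
        s_tot = np.sum((mw[test] - mw[test].mean()) ** 2)     # total sum of squares
        sse.append(s_res)
        r2.append(1.0 - s_res / s_tot)                        # coefficient of determination
    return np.mean(sse), np.mean(r2)

# sse_svr, r2_svr = fold_scores(SVR(kernel="rbf"), X, mw)
# sse_rfr, r2_rfr = fold_scores(RandomForestRegressor(random_state=0), X, mw)
```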
Earthquake case studies. To demonstrate the applicability of the developed methodology, we have chosen five real case scenarios independent of the dataset used to train the ML algorithms. The selected earthquakes have a wide range of magnitudes (6.7 to 8.1 Mw). Stronger earthquakes were not considered due to the small number of available recordings associated with them. The earthquakes studied in this section are located in the Pacific Ocean and their estimated properties are compared against data extracted from the gCMT 28,29 . Additionally, reports released by the National Oceanic and Atmospheric Administration (NOAA) show that the selected earthquakes triggered tsunamis.
The acoustic signals emitted by the earthquakes were recorded by the IMS hydrophone station 'HA11', situated in the middle of the Pacific Ocean and deployed by the CTBTO. These earthquake scenarios were chosen because there are no abrupt bathymetry changes in the paths between their epicentres and the 'HA11' hydrophone station, which reduces the inverse problem uncertainties. The process of retrieving the slip type, magnitude and effective properties of a tectonic event is detailed in this section for the event of 16/02/2015. The results obtained for the remaining four earthquakes are provided in Tables 5 and 6.
The gCMT 28,29 reported an earthquake of magnitude 6.7 Mw at coordinates Lat = 39.78, Lon = 143.22 (offshore, near the east coast of Honshu), which occurred on 16/02/2015 at 23:06 UTC. The epicentre lies under about 2 km of water, with the hypocentre situated 22.2 km below the ground-water interface. The half duration of the event is 5.7 s, and it was catalogued as a thrust earthquake with slip angles: strike = 182° and 26°; dip = 18° and 73°; slip = 68° and 97°. Moreover, the studied event triggered a small tsunami with a wave height of 20 cm, recorded along the coast of Iwate (https://www.ngdc.noaa.gov/hazel/view/hazards/tsunami/event-search).
The distance between the recording hydrophone and the epicentre of the tectonic event is approximately 3200 km, resulting in a calculated travel time of about 35 min for the acoustic waves to reach 'H11N1'. The acoustic disturbance was isolated and the four proposed sets of features were extracted. The dataset composed of 201 acoustic signal recordings associated with tectonic events was used to train the classification and regression algorithms, which estimated the slip type of the studied event with 100% accuracy and the magnitude with an associated error lower than 5%, see Tables 3 and 4.
The earthquake's type of motion and magnitude estimations delivered by the ML algorithms, reported in Tables 3 and 4, are important input parameters for the inverse problem model, see the 'Inverse problem model' section. When the earthquake is classified as vertical (binary classification), the inverse problem model can be applied. The potential ranges for the effective fault size and dynamics are calculated using empirical relations of the form log₁₀(SRL) = a + b·Mw, and analogously for RA and MD, where SRL stands for surface rupture length, RA for rupture area (Table 2A in Wells and Coppersmith 20 ) and MD for maximum displacement (Table 2B in Wells and Coppersmith 20 ). The length of the effective displacement 2L was estimated from Fig. 9 in Wells and Coppersmith 20 and the rupture area from their Fig. 16, allowing the further calculation of the width 2b. The maximum vertical displacement was estimated from Fig. 11 in Wells and Coppersmith 20 .
Table 3. Classification results for the study case using four different feature sets. For the 3-type classification, '0' stands for horizontal, '1' for thrust and '2' for normal. For the binary classification, '0' represents horizontal and '1' vertical.
The applied parameterizations provide a single value for each effective fault property, which is divided by 2 and multiplied by 2.5 to define the lower and upper limits of the input ranges for the inverse model. These coefficient values were set after visual inspection of the data published by Wells and Coppersmith 20 in order to reproduce the scatter found in the data 3 .
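A schematic of this parameterization is sketched below. The regression coefficients in the usage example are the approximate 'all slip types' values attributed to Wells and Coppersmith 20 and should be verified against their Tables 2A and 2B before use; the 0.5 and 2.5 factors follow the text above.

```python
# Sketch of the magnitude-to-range parameterization: an empirical scaling law of the
# form log10(Q) = a + b*Mw gives a central value for each effective fault property,
# which is then divided by 2 and multiplied by 2.5 to define the input range of the
# inverse model. The coefficients in the usage example are approximate 'all slip types'
# regressions and should be checked against Tables 2A and 2B of Wells and Coppersmith 20.

def empirical_range(mw, a, b, lower=0.5, upper=2.5):
    """Central value from log10(Q) = a + b*Mw and its (lower, upper) input range."""
    q = 10.0 ** (a + b * mw)
    return q * lower, q, q * upper

# Surface rupture length (km), rupture area (km^2) and maximum displacement (m):
# srl_lo, srl, srl_hi = empirical_range(6.7, a=-3.22, b=0.69)
# ra_lo,  ra,  ra_hi  = empirical_range(6.7, a=-3.49, b=0.91)
# md_lo,  md,  md_hi  = empirical_range(6.7, a=-5.46, b=0.82)
# Effective half-length L = srl / 2 and half-width b = (ra / srl) / 2.
```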
In the spectrogram associated with the pressure signal, we identified maximum and minimum limits for the potential values of the frequency distribution of the first acoustic mode at 0.8 and 10 Hz, respectively. With this information, the algorithm identified four potential first-mode frequency distributions that lie within the established frequency range. Then, the relevant part of the acoustic disturbance was selected by inspection of the short-time energy distribution. Finally, the model was run, producing a set of 40 solutions (a set of 10 solutions was calculated for each potential orientation of the fault) for each property of the effective ground uplift associated with the tectonic event: L, b, T and uplift speed W₀. Both approaches for the input of the model led to sets of results that lie in the same order of magnitude. The calculated effective half-length of the fault, in the case where the magnitude of the event is not taken into account as input, is significantly larger than in the case where the magnitude is used as input. For the remaining four earthquakes, only the '1st' and '2nd' sets of features are used along with RFR and SVR, since they led to the best regression accuracy, and the obtained magnitude results are averaged to provide a single final value. The '2nd' and '3rd' sets of features led to the highest classification accuracy; thus, they are applied and the resulting classifications compared, see Tables 5 and 6. Note that only the binary classification result is used as input for the inverse problem model. The average absolute binary classification error and the absolute mean regression error are 0% and 2.382%, respectively.
Synthetic signal analysis. In this section, we report on the performance of the classification algorithms when applied to synthetic pressure signals generated with the analytical solution of Mei and Kadri (2018) 1 . The applied ML algorithms were trained with real earthquake acoustic signals. The bottom pressure signals induced by the acoustic waves in the far field are described by Eq. (4), where ρ is the water density, A is the two-dimensional envelope 1 , k is the wave number and ω is the frequency. Note that only the pressure induced by the first acoustic mode is taken into account, as it carries most of the energy and information about the source 1 . The analytical solution used to generate the synthetic signals makes simplifying assumptions, such as a constant speed of sound or a rigid and flat seabed, which are discussed in other studies 1,3 . The slender fault properties and relative coordinates used to calculate the pressure signals through Eq. (4) were randomly generated within the ranges observed by Wells and Coppersmith (1994) 20 . Here, 20 synthetic signals were generated and windows of 300 seconds were extracted after the arrival of each pressure disturbance at the virtual hydrophone. Then, the four types of signal vectorization considered in this study were performed and binary classification was applied with a 5-fold hyper-parameter grid search. As a result, the RFC and SVM algorithms identified the synthetic signals as coming from vertical-motion earthquakes ('1') with an accuracy of 100%, as expected.
We studied the effects of adding and shuffling different numbers of synthetic signals into the dataset (composed only of real earthquake recordings) used to train the classification algorithms; for more details see the 'Machine learning application on synthetic signals analysis' subsection in the 'Sensitivity analysis' section in the 'Supplementary materials'. The overall accuracy of the classification algorithms increased with the addition of synthetic signals to the training dataset. However, when the number of added synthetic signals reached ≈ 10% of the dataset size (20 synthetic signals in the studied case), a trend of increasing bias was noticed, leading to more incorrect classifications of the horizontal slip events ('0'). Thus, it is recommended to include a number of synthetic signals smaller than 10% of the total dataset size in order to optimise the classification accuracy for both tectonic event slip types.

Discussion
We applied a set of techniques capable of analysing acoustic pressure signals induced by underwater earthquakes and calculating the effective fault size and dynamics in almost real time. To fulfil this goal, we studied a dataset composed of 201 earthquake signals recorded by the IMS hydro-acoustic network. Furthermore, we compared four different methodologies to extract relevant features from acoustic signals generated by submarine earthquakes, based on statistical moments, time series analysis, power spectrum analysis, wavelet transform coefficient analysis and cepstral coefficients. Along with the vectorization methodologies, we applied two classification ML algorithms (RFC and SVM), which were able to discriminate vertical-motion events with over 70% classification accuracy. Amongst the tested methodologies, the wavelet transform feature extraction technique in combination with SVM led to the highest classification accuracy for both the binary and multi-class scenarios.
Regarding the three-type classification, included as a secondary result, there is a low classification accuracy for 'normal' events, which can be caused by an unbalanced dataset 31 : the balance between 'horizontal' and 'vertical' events was prioritised, leaving around half of the set to be split into the two new classes (thrust and normal). The inclusion of more signals induced by 'normal' earthquakes or the use of penalised ML algorithms could yield higher accuracy. Furthermore, the application of feature selection algorithms, instead of using feature sets based on previous studies, has the potential to improve the accuracy of the ML algorithms.
Additionally, regression ML algorithms were applied to the vectorized signal dataset to estimate the magnitudes of the associated tectonic events. The ML algorithms delivered better predictions than the mean value of the dataset, which was confirmed by the SSE values. It is remarkable that, once the vectorized dataset is precomputed, the ML algorithms take less than one second on a standard desktop machine to deliver the source magnitude and slip type estimations. Finally, the magnitude and slip type retrieved by the ML algorithms can be used to feed an inverse problem model to perform real-time calculations of the effective fault size and dynamics. In this study, the depth dependence of the classification accuracy has not been analysed (only shallow earthquakes were selected to reduce uncertainties); we intend to address this in the future in order to expand the studied dataset and potentially improve the accuracy of the model.
The size of the dataset is a limitation: only earthquakes that meet specific conditions are considered, which significantly reduces the number of cases that serve our purpose. Thus, this study is presented as a proof of concept to show the applicability of a combination of ML algorithms and semi-analytical solutions to infer the properties of submarine tectonic events from acoustic radiation. Another issue that we would like to study in the future is the relation between the magnitude prediction results and the type of studied earthquake, where possible patterns might shed light on the underlying processes.

Methods
Data and instrumentation. Instrumentation. We used data from three hydrophone stations deployed by CTBTO: HA01 (Cape Leeuwin), HA08 (Diego Garcia) and HA11 (Wake Island). Each station consists of two triplets except HA01 which has a single triplet 27 .
The hydrophones are suspended at a depth corresponding to the SOFAR channel axis and anchored to the seabed via a riser cable, which is kept under tension by a sub-surface buoy 32 . The recordings used in this study are extracted from the International Monitoring System (IMS) database, which collects the data originally recorded by the hydrophone stations described above.
Dataset. Long distances from the hypocentre to the epicentre can induce distortion and attenuation of the seismic waves, leading to higher uncertainties in the measurements. In order to minimise these effects, we considered only acoustic signals associated with shallow earthquakes. Different values for the maximum hypocentre depth below the seabed are used to define shallow earthquakes in the literature, such as 60 km 5,33 or 100 km 7 . We only included earthquakes with hypocentres located less than 60 km below the seabed. The dataset was built with 201 acoustic recordings induced by 'shallow' tectonic events, listed in the 'List of earthquakes' section in the 'Supplementary materials'. The data was provided by the CTBTO.
To minimise diffraction effects, only earthquakes whose shortest transects to the hydrophone stations do not cross land were analysed, see Fig. 6. In addition to the previously mentioned constraints, only underwater earthquakes are studied, which significantly narrows down the search.
The earthquakes' source slip types and magnitudes were labelled based on data reported by the gCMT 28,29 . The slip type labelling of the dataset depends on the slip (rake) angles of the fault planes: for slip angles of 0° or 180° (± 20°) the faults were classified as pure strike-slip, whereas for 90° and −90° (± 20°) they were classified as pure dip-slip reverse and pure dip-slip normal, respectively.
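The labelling rule can be summarised by the following sketch; the function name and the handling of events outside the ±20° windows are illustrative assumptions:

```python
# Minimal sketch of the slip-type labelling rule based on the gCMT slip (rake) angles,
# with the +/- 20 degree tolerance quoted above. The handling of events outside these
# windows is an illustrative assumption.

def slip_label(rake_deg, tol=20.0):
    """Return 'strike-slip', 'reverse', 'normal' or None for a given rake angle."""
    r = ((rake_deg + 180.0) % 360.0) - 180.0            # wrap angle to (-180, 180]
    if abs(r) <= tol or abs(abs(r) - 180.0) <= tol:
        return "strike-slip"                            # rake near 0 or 180 degrees
    if abs(r - 90.0) <= tol:
        return "reverse"                                # thrust: vertical, hanging wall up
    if abs(r + 90.0) <= tol:
        return "normal"                                 # normal: vertical, hanging wall down
    return None
```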

Digital signal processing methods. Signal vectorization (feature extraction).
Signal vectorization is the process of reducing the size of the acoustic dataset by determining category variables that define the sound type or the identity of the sound source. It is important to decrease the signal dimensionality, as the required training dataset grows exponentially with the number of variables in the feature vector 8 .
Previous studies on ML approached the classification of CTBTO acoustic signals by operating on features automatically extracted by the organization 10,25 . However, the mentioned studies had to handle missing values for some features 9 . To overcome this difficulty, raw acoustic data were analysed and a signal processing algorithm for signal vectorization was developed, ensuring that there were no missing values in the feature vectors. It has been reported that a single feature cannot train the classifiers efficiently 8 . Thus, we considered and compared five types of features: temporal (obtained directly from the time series) 8-10 , spectral (obtained from the power spectrum) 8-10 , cepstral 8-10,12,34 , statistical (statistical moments applied to the time series) 9,10 and wavelet transform type 35 .
Inspired by previous studies 8-10,35 , four different sets of features were built and tested along with the considered ML algorithms. The first and second sets consist of a combination of four types of features (temporal, spectral, statistical and cepstral); the third set is composed only of wavelet-transform features; and the fourth set consists of cepstral features only, as shown in Table 7. We set N = 20,000 samples (160 s) as the maximum considered half-window length for feature extraction. This choice was made after observing that it exceeds the duration of most signals in the dataset. Several values of N were tested, ranging from 2500 to 20,000 samples; for more details see the 'k-fold and grid search' subsection in the 'Sensitivity analysis' section in the 'Supplementary materials'. The minimum tested half-window length was 2500 samples, since we intended to analyse the frequency behaviour of the signals only down to 0.1 Hz.
Frequency behaviour analysis and time series features. In order to decompose the signals into different frequency bands and study their behaviour at different parts of the frequency spectrum, we applied Butterworth band-pass filters, which have been reported to perform well for classification purposes 37 . The extracted features are statistical moments applied to the filtered time series, such as the standard deviation, kurtosis, skewness, maximum amplitude and zero-crossing rate. Additionally, the short-time energy distribution was computed for each band, with a five-second window width, and the maximum and total short-time energy were calculated. Several potential divisions of the frequency spectrum (see Fig. 7) were analysed, tested and compared; for more details see the 'k-fold and grid search' subsection in the 'Sensitivity analysis' section in the 'Supplementary materials'.
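A minimal sketch of this band-wise feature extraction is given below, assuming SciPy and a 125 Hz sampling rate (inferred from '20,000 samples (160 s)' above); the band edges are placeholders for one of the tested spectrum divisions:

```python
# Minimal sketch of the band-wise feature extraction: zero-phase Butterworth band-pass
# filtering followed by statistical moments, zero-crossing rate and short-time energy.
# The 125 Hz sampling rate is inferred from '20,000 samples (160 s)' above; the band
# edges are placeholders for one of the tested spectrum divisions.
import numpy as np
from scipy.signal import butter, sosfiltfilt
from scipy.stats import kurtosis, skew

def band_features(x, fs=125.0, band=(0.1, 2.0), ste_win_s=5.0):
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    xb = sosfiltfilt(sos, x)                                     # filtered time series
    zcr = np.mean(np.abs(np.diff(np.sign(xb))) > 0)              # zero-crossing rate
    win = int(ste_win_s * fs)
    ste = np.array([np.sum(xb[i:i + win] ** 2)                   # short-time energy
                    for i in range(0, len(xb) - win + 1, win)])
    return {"std": xb.std(), "kurtosis": kurtosis(xb), "skewness": skew(xb),
            "max_amp": np.abs(xb).max(), "zcr": zcr,
            "ste_max": ste.max(), "ste_total": ste.sum()}
```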
Based on the fact that earthquakes with vertical motion components excite lower frequencies due to the compression of the water 1 , we performed more subdivisions at lower frequencies than at higher frequencies. Moreover, slow events show lower frequency excitation, in the range of 0.1 to 10 Hz 7 . Some of the tested frequency subdivision approaches taken in this study were inspired by previously published studies 10,14 , see Fig. 7.
We analysed the proposed subdivisions of the relevant part of the frequency spectrum and the optimal window size (2N) by running the classification models SVM and RFC on the 2-class labelled dataset. In this analysis, only the feature set '1' was used, along with every combination of the considered window sizes (2N) and spectrum division approaches. The standard deviations lie below 15% for SVM and 12% for RFC in the tested cases. In addition, the variations in accuracy across the different considered scenarios are lower than 3%. Finally, we selected 15,000 samples as the half-window size (N) and the eight-band division of the spectrum for the final set-up. This choice was based on the results of the sensitivity analysis; for more details see the 'k-fold and grid search' subsection in the 'Sensitivity analysis' section in the 'Supplementary materials'.
Spectral and cepstral features extraction. The signals were further processed to extract spectral and cepstral features. The cepstrum is calculated by taking the inverse Fourier transform of the natural logarithm of the Fourier transform of a signal. The Fourier transform maps convolution in the time domain to multiplication in the frequency domain, and the logarithm turns that multiplication into addition.
Table 7. Studied feature sets, tested on the classification and regression algorithms.
While the optimum window length for framing speech signals is 16-32 ms, the appropriate window length for acoustic sounds in the ocean may be different. Thus, the signal was sliced into frames of ten seconds duration (in order to be able to capture frequencies down to 0.1 Hz). We set a 50% overlap 12,34 amongst the frames and a Hamming window was applied 38 . Then, the Fourier transform was computed and the power spectrum of each frame x_i was calculated by

P_i = |FFT(x_i)|² / NFFT,

where NFFT is the transform length in samples used by the fast Fourier transform, which in this case is 4096 in order to provide enough frequency resolution. Because of the nature of the FFT algorithm, NFFT was chosen to be a power of two to make the algorithm more efficient.
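The framing and power-spectrum step can be sketched as follows, again assuming a 125 Hz sampling rate:

```python
# Sketch of the framing and power-spectrum step: 10 s frames with 50% overlap, a
# Hamming window and the power spectrum |FFT|^2 / NFFT of each frame (NFFT = 4096).
# The 125 Hz sampling rate is inferred from '20,000 samples (160 s)' above.
import numpy as np

def frame_power_spectra(x, fs=125.0, frame_s=10.0, overlap=0.5, nfft=4096):
    frame = int(frame_s * fs)
    hop = int(frame * (1.0 - overlap))
    window = np.hamming(frame)
    spectra = []
    for start in range(0, len(x) - frame + 1, hop):
        xf = x[start:start + frame] * window
        X = np.fft.rfft(xf, n=nfft)                  # one-sided FFT, zero-padded to NFFT
        spectra.append((np.abs(X) ** 2) / nfft)      # power spectrum of the frame
    return np.array(spectra)                         # shape: (n_frames, nfft // 2 + 1)
```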
The next step was to compute filter banks using triangular filters, which were applied to the power spectrum on a scale inspired by the Mel scale in order to extract frequency bands. The Mel scale aims at mimicking the non-linear human ear perception of sound. Cepstral coefficients have been used in the literature for the classification of sound sources in underwater environments due to their robustness to noise 8-10,34 . Note that the Mel scale was designed for higher frequencies than the bands we are interested in, i.e. 0.1-10 Hz. Therefore, we modified the constants in the conversion formula between frequency in Hertz (f) and Mel frequency (m),

m = C log₁₀(1 + f/f₀),   (7)

to obtain an almost linear mapping from Hertz to Mel frequency in the range of 0.1 to 10 Hz and a logarithmic mapping over 10 Hz. The new constants were obtained by choosing f₀ = 7 Hz and fixing the behaviour of the mapping at f = 10 Hz; thus, below 10 Hz the relation between Mel frequency and frequency in Hz is almost linear.
The applied triangular filters are highly correlated due to their overlap; thus, we used the discrete cosine transform (DCT) to decorrelate the filter coefficients 12 . It is shown in the 'Cepstral coefficients analysis' subsection in the 'Sensitivity analysis' section in the 'Supplementary materials' that 12 cepstral coefficients capture most of the information carried by the signal. Finally, statistical moments (mean, maximum, kurtosis, skewness and variance) were applied to each coefficient band to reduce the number of features. The spectral features, listed in Fig. 7, were extracted from the one-sided power spectrum computed on the calibrated original signals.
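A sketch of the filter-bank and DCT steps is given below; the constant C in the modified Hz-Mel conversion and the number of filters are illustrative assumptions, only f₀ = 7 Hz being fixed by the text:

```python
# Sketch of the filter-bank and DCT steps: triangular filters spaced on the modified
# Mel scale, log filter-bank energies and a DCT keeping the first 12 coefficients.
# The constant C and the number of filters are illustrative assumptions; only
# f0 = 7 Hz is fixed by the text.
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f, f0=7.0, C=20.0):
    return C * np.log10(1.0 + f / f0)                 # nearly linear below ~10 Hz

def mel_to_hz(m, f0=7.0, C=20.0):
    return f0 * (10.0 ** (m / C) - 1.0)

def cepstral_coefficients(power_spectra, fs=125.0, nfft=4096, n_filters=26, n_ceps=12):
    # Filter edges equally spaced on the modified Mel scale between 0.1 Hz and fs / 2
    mel_pts = np.linspace(hz_to_mel(0.1), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, left:centre] = (np.arange(left, centre) - left) / max(centre - left, 1)
        fbank[i - 1, centre:right] = (right - np.arange(centre, right)) / max(right - centre, 1)
    energies = np.log(power_spectra @ fbank.T + 1e-12)            # log filter-bank energies
    return dct(energies, type=2, axis=1, norm="ortho")[:, :n_ceps]
```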
Wavelet transform parameters analysis. The power distribution of the signals over different frequency bands can be analysed by applying the discrete wavelet transform (DWT) 39 , of the form

W(a, b) = (1/√|a|) ∫ x(t) ψ*((t − b)/a) dt,

where a is the scaling parameter, b is the translational parameter and ψ is the mother wavelet. Here, a given signal is projected onto a space defined by a set of wavelets, which are functions of frequency and time 40 . Several studies have approached feature extraction by wavelet transform algorithms for different purposes, such as earthquake magnitude prediction from seismic wave analysis 18,41 or source-type classification of acoustic signals 24,35,39,42 . Even though numerous wavelet bases exist, we tested and compared only two discrete wavelets, Daubechies and Symlet, two of the most popular wavelets in signal processing. Furthermore, the order of the wavelets was also analysed, which indicates the number of vanishing moments and is related to the approximation order and smoothness of the wavelet. After applying the DWT, n sets of detail coefficients and one set of approximation coefficients are produced (each feature is therefore computed n + 1 times). Note that n = 6 levels has been used in previous studies 35 . In order to analyse the sensitivity of the ML algorithms to variations in the DWT parameters, we tested different numbers of levels (4 to 8) and wavelet orders (2 to 8), see 'Sensitivity analysis' in the 'Supplementary materials'. We found that the accuracy amongst the considered scenarios deviates by less than 5%. Nevertheless, we decided to use the Symlet wavelet of order eight with six levels of coefficients as the final set-up, since it provides a good balance between computational efficiency and accuracy.
Finally, after applying the DWT, statistical moments were calculated for each extracted coefficient band or level (n + 1 in total): standard deviation, average short-time energy, power ratio between the first coefficient band and every other band, and zero-crossing rate 42 , see Table 7.
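For illustration, the wavelet feature extraction could be implemented as follows, assuming the PyWavelets (pywt) package; the feature names are placeholders:

```python
# Illustrative wavelet feature extraction with the PyWavelets package: a 6-level DWT
# with a Symlet-8 wavelet followed by per-band statistics; the power-ratio feature uses
# the first (approximation) band as reference. Feature names are placeholders.
import numpy as np
import pywt

def dwt_features(x, wavelet="sym8", level=6):
    coeffs = pywt.wavedec(x, wavelet, level=level)     # [cA6, cD6, cD5, ..., cD1]
    ref_power = np.mean(coeffs[0] ** 2)
    feats = {}
    for i, c in enumerate(coeffs):
        feats[f"band{i}_std"] = c.std()
        feats[f"band{i}_avg_energy"] = np.mean(c ** 2)
        feats[f"band{i}_power_ratio"] = np.mean(c ** 2) / ref_power
        feats[f"band{i}_zcr"] = np.mean(np.abs(np.diff(np.sign(c))) > 0)
    return feats
```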
Machine learning (ML) algorithms. In essence, ML algorithms learn from data using probability theory and can be grouped into two main categories: supervised learning (labelled dataset) and unsupervised learning (unlabelled dataset). In this study, we applied supervised learning, which can be further subdivided into classification and regression algorithms based on whether the target outputs are categorical (classification for slip type) or quantitative (regression for magnitude) 43 . In particular, two ML algorithms were explored, SVM and RFC 8-10,12 .
The first algorithm, SVM, is a classification technique that can also be used for regression 44 . It uses a convex cost function, so it always reaches the global minimum 12 , and it is able to perform reliable classifications even with small datasets 45 . SVMs differentiate classes by finding the hyper-plane that separates them while maximising the margin between the closest points of each class and the hyper-plane. Each signal is vectorized and described by a point in an n-dimensional space, and the cost function is fully specified by the subset of training examples called support vectors. The output of the SVM is a set of weights, which in combination predict the value of the outcome. Because of the non-linear nature of the studied process, a non-linear kernel was selected, the 'Radial Basis Function' (RBF), a type of Gaussian kernel that has been proven to provide good accuracy and efficiency 9 . SVMs are commonly found in the literature for classifying acoustic signals in the ocean 8-12,44,46 .
The second studied algorithm, RFC (introduced in 1995) 47 , is a technique based on decision trees that operate as an ensemble. It can perform classification and regression 48 by splitting a dataset into smaller data subsets, while an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. Decision nodes have two or more branches and leaf nodes represent a classification or decision. Each individual tree in the random forest delivers a class prediction and the class with the most votes becomes the model's prediction.
It is important to remark that regularization (normalization) of the features was carried out before the application of SVM, due to significant differences in the order of magnitude of the feature values; this is not necessary for the application of RFC. In addition, we used a k-fold validation technique 12 (10-fold) in combination with a grid search (5-fold) to identify the best model hyper-parameters and test the accuracy of each algorithm; for more details see the 'k-fold and grid search' subsection in the 'Sensitivity analysis' section in the 'Supplementary materials'. Finally, the accuracy over all folds was averaged, and precision and standard deviation results were calculated.

Inverse problem model.
The propagation of acoustic waves induced by the vertical motion of a slender fault is governed by the wave equation

∂²φ/∂t² = c² ∇²φ,

where φ is the velocity potential, the velocity field is defined by u = ∇φ, and c is the speed of sound in water. The problem is closed with standard boundary conditions: at the surface the pressure is assumed to be zero and uniform, whereas on the seabed (z = 0) a piston model simulates the vertical displacement of the fault,

∂φ/∂z = W₀ τ(t) for |x| < b, |y| < L, and 0 elsewhere,   with   τ(t) = 1 for −T < t < T, and 0 for |t| > T.

The effective earthquake is assumed to have a rectangular slender shape, with a length of 2L and a width of 2b, where b/L ≪ 1 is the slenderness parameter. Due to the slender-body assumption, multiple-scales theory can be applied and an analytical solution is reached [Eq. (6.1) in Mei and Kadri 1 ].

The derived solution 1 comprises a summation of a countable infinity of acoustic modes. However, for brevity, only the bottom pressure due to the leading acoustic mode is considered, which is the most energetic 50 . The solution provides a relation for the bottom pressure induced by the vertical motion of the slender fault in the far field.
Note that the slender fault geometry and dynamics considered here represent an effective vertical motion caused by the more complex earthquake rupture dynamics. The fault is assumed to move vertically upwards with a constant speed W₀ for a time duration 2T 1 . Thus, inferring the presence of vertical motion components in the studied tectonic events is of major importance for the application of the model.
An array with the potential combinations of x₀ and y₀ (orientation of the fault) can be defined, considering that the detected acoustic radiation induced by the earthquake can be triangulated by the hydrophone triplet to infer the distance to the source. The frequency distribution of the first acoustic mode is then calculated, which is a function of time t (relative to the eruption time t₀), depth h and each potential orientation of the fault (x₀ and y₀). Only the obtained frequency distributions that lie within the range defined by visual inspection of the spectrogram are considered; these define the number of sets of solutions provided by the model.
To identify the beginning and end of the disturbance in the recordings, the short-time energy distribution is analysed. Then, pressure points from the acoustic signal need to be chosen to retrieve the source properties 3 . Envelope-tracking algorithms are applied, since working with points close to the envelope reduces the associated uncertainties, and the obtained pressure points are associated with the calculated potential first-mode frequency distributions.
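A simplified sketch of this selection step is given below, assuming SciPy; the energy threshold and window length are illustrative assumptions:

```python
# Simplified sketch of the disturbance selection: the short-time energy locates the
# start and end of the event, and a Hilbert envelope provides the pressure points
# passed to the inverse model. The threshold and window length are assumptions.
import numpy as np
from scipy.signal import hilbert

def select_disturbance(p, fs=125.0, win_s=5.0, rel_threshold=0.1):
    win = int(win_s * fs)
    ste = np.array([np.sum(p[i:i + win] ** 2)                     # short-time energy
                    for i in range(0, len(p) - win + 1, win)])
    active = np.where(ste > rel_threshold * ste.max())[0]
    start, end = active[0] * win, (active[-1] + 1) * win          # event boundaries (samples)
    envelope = np.abs(hilbert(p[start:end]))                      # pressure envelope
    return start, end, envelope
```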
The magnitude estimated by the ML algorithms is used to generate the ranges that feed the inverse problem model and confine the potential solutions for the effective geometry and dynamics of the tectonic event. Finally, the steps described in section 3 of Gomez and Kadri 3 are taken for each of the selected frequency distributions to retrieve b, L, T and W₀, see Fig. 8. Note that ten solutions are calculated for each earthquake and potential orientation of the fault, as described above.