Sharpen data-driven prediction rules of individual large earthquakes with aid of Fourier and Gauss

Predicting individual large earthquakes' (EQs') locations, magnitudes, and timing remains unreachable. The author's prior study shows that individual large EQs have unique signatures obtained from multi-layered data transformations. Via spatio-temporal convolutions, decades-long EQ catalog data are transformed into pseudo physics quantities (e.g., energy, power, vorticity, and Laplacian), which turn into surface-like information via Gauss curvatures. Using these new features, a rule-learning machine learning approach unravels promising prediction rules. This paper suggests a further data transformation via the Fourier transform (FT). Results show that the FT-based new features can help sharpen the prediction rules. Feasibility tests on large EQs (M ≥ 6.5) over the past 40 years in the western U.S. show promise, shedding light on data-driven prediction of individual large EQs. The handshake among ML methods, Fourier, and Gauss may help answer the long-standing enigma of seismogenesis.

The first data transformation (Figs. 1B,C) converts the raw EQ catalog data (USGS 18) into new scalar features, denoted as the spatio-temporal information index (II_ST). The reference volume is defined as a discretized volume of the Earth's lithosphere with increments of (longitude, latitude, depth), (Δλ, Δφ, Δh) = (0.1°, 0.1°, 5 km). For the convolution process, the geodetic coordinates (λ, φ, −h)_i^(t) are transformed into the Earth-centered rectilinear coordinates (x, y, z)_i^(t) 17. Each observed EQ's moment magnitude M_i^(t) ∈ ℝ[0, 10) is assumed to reside at x_i^(t) = (x, y, z)_i^(t), following the "point source" concept. Physically, II_ST(ξ_j)^(t) quantifies the accumulated influences of the adjacent EQs close to the jth reference volume center and of the past EQs up to the present time (t). The details of the data transformation from raw EQ catalog data into the spatio-temporal information index are presented in "Methods".
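The geodetic-to-rectilinear step can be sketched as the standard geodetic-to-ECEF conversion. The WGS-84 ellipsoid constants below are an assumption, since the paper does not state which reference ellipsoid is used:

```python
import math

# WGS-84-style ellipsoid constants (assumed; the paper does not state the
# reference ellipsoid -- a spherical Earth model would also be plausible).
A = 6378137.0            # semi-major axis (m)
E2 = 6.69437999014e-3    # first eccentricity squared

def geodetic_to_ecef(lon_deg, lat_deg, depth_km):
    """Convert (longitude, latitude, depth) into Earth-centered (x, y, z) in km."""
    lam, phi = math.radians(lon_deg), math.radians(lat_deg)
    h = -depth_km * 1000.0                               # depth -> negative height (m)
    n = A / math.sqrt(1.0 - E2 * math.sin(phi) ** 2)     # prime vertical radius
    x = (n + h) * math.cos(phi) * math.cos(lam)
    y = (n + h) * math.cos(phi) * math.sin(lam)
    z = (n * (1.0 - E2) + h) * math.sin(phi)
    return x / 1000.0, y / 1000.0, z / 1000.0
```

For a catalog entry at the equator and prime meridian, this places the point source on the x-axis at the Earth's surface radius minus its depth.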
The second data transformation converts the spatio-temporal IIs into the pseudo physics quantities. Amongst many physics quantities, the best-so-far set of pseudo physics quantities is identified as {released energy, power, vorticity, Laplacian} 17. To be purely data-driven, no pre-defined statistical or empirical laws are used. Instead, a flexible function, called the "link function (LF)", is used to learn expressions of the pseudo physics quantities. The best-so-far form of the pseudo released energy E_r^(t)(ξ_j) is identified as

E_r^(t)(ξ_j) = Σ_{k=1}^{n_L} Σ_{l=1}^{n_T} L^(k,l)( II_ST^(t)(ξ_j; L_k, T_l); θ^(k,l) ),

where θ^(k,l) are the best-so-far free parameters of the associated LF L^(k,l). Here, L^(k,l) takes II_ST^(t)(ξ_j; L_k, T_l) as input and produces a smooth, nonlinear output.
L_k (k = 1, ..., n_L) is the spatial influence range, meaning the spatial proximity-dependent importance in the spatial convolution process. Similarly, T_l (l = 1, ..., n_T) is the temporal influence range, meaning the temporal proximity-dependent importance in the temporal convolution process (see "Methods" for details).
Any mathematical form can be used as the LF, and a simple yet general exponential form works well 17 for the pseudo released energy. The pseudo "power" corresponds to the time derivative ∂E_r^(t)(ξ_j)/∂t, and the pseudo "Laplacian" is calculated as ∇_g² E_r^(t)(ξ_j), which includes the term ∂²E_r^(t)/∂h², where ∇_g(·) means the spatial gradient with respect to the geodetic coordinate system (λ, φ, h). As shown in Ref. 17, amongst many pseudo physics quantities and their combinations, ML selected four quantities: the released energy, the power, the first vorticity term, and the first Laplacian term (∂²E_r^(t)/∂h²), at least for the western U.S. region. Again, this selection is purely data-driven, since ML simply seeks the combination that outperforms the other cases without any prejudice.

The third data transformation converts the pseudo physics quantities into Gauss curvatures. At each depth, the distributions of the pseudo physics quantities constitute smooth yet complex surfaces. To effectively inform ML with surface-like information, this transformation focuses on Gauss curvatures 19, consisting of the two principal curvatures κ_1 and κ_2 (detailed calculation procedures are presented in Ref. 17). Using the Gauss curvatures near EQs, it is easy to quantify the shapes of the pseudo physics quantities' distributions. As proposed in Ref. 17, these Gauss curvature-based coordinates may serve as a unique signature of individual extreme EQs. The coordinate vector K at time t and reference volume ξ_j consists of the principal Gauss curvatures (κ_1, κ_2) of the four pseudo physics quantities,

K^(t)(ξ_j) = [κ_1^(E), κ_2^(E), κ_1^(P), κ_2^(P), κ_1^(V), κ_2^(V), κ_1^(L), κ_2^(L)],

where E, P, V, and L stand for the pseudo released energy, the pseudo power, the pseudo vorticity's first term, and the pseudo Laplacian's first term, respectively, all calculated at time t and the reference volume ξ_j.
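Since each pseudo physics quantity at a fixed depth forms a surface over (longitude, latitude), its principal curvatures can be sketched with the standard Monge-patch formulas for a height field on a regular grid. This is an illustrative reconstruction, not the authors' exact procedure (Ref. 17 details that); the regular grid spacing is an assumption:

```python
import numpy as np

def principal_curvatures(z, dx=1.0, dy=1.0):
    """Principal curvatures (k1 >= k2) of a height field z[y, x] on a regular grid.

    Uses the Monge-patch formulas for Gaussian (K) and mean (H) curvature,
    then k1,2 = H +/- sqrt(H^2 - K).
    """
    zy, zx = np.gradient(z, dy, dx)          # first derivatives
    zxy, zxx = np.gradient(zx, dy, dx)       # second derivatives of zx
    zyy, _ = np.gradient(zy, dy, dx)
    denom = 1.0 + zx**2 + zy**2
    K = (zxx * zyy - zxy**2) / denom**2                                   # Gaussian
    H = (zxx * (1 + zy**2) - 2 * zx * zy * zxy + zyy * (1 + zx**2)) \
        / (2.0 * denom**1.5)                                              # mean
    disc = np.sqrt(np.maximum(H**2 - K, 0.0))
    return H + disc, H - disc                # k1, k2
```

On a paraboloid z = (x² + y²)/2, both principal curvatures at the apex evaluate to 1, as expected.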
The fourth data transformation converts the time histories of Gauss curvatures into Fourier transform (FT)-based features (Fig. 1D). The FT-based new features are included for two reasons. First, FT-based features can easily convey the temporal information of other features in terms of amplitudes and frequencies. Second, they leverage the strength of the Fourier series in representing general, complex functions. As demonstrated in this paper, the inclusion of trigonometric functions (inspired by the Fourier series) in the prediction rules appears to improve and sharpen the prediction accuracy substantially.
Table 1 summarizes the key procedures of the FT-based new feature generation using the time history of Gauss curvatures of the pseudo physics quantities. The set of Gauss curvature-based coordinates at ξ_j up to the present time t_n = n × Δt is given by

K(n; ξ_j) = [K^(t_1)(ξ_j); ...; K^(t_n)(ξ_j)] ∈ ℝ^(n×8),

where Δt is the sampling interval, one day in this paper.
Regarding K(n; ξ_j) ∈ ℝ^(n×8) as a matrix, the mth column, denoted k^(m)(n; ξ_j) (m = 1, ..., 8), corresponds to the time series of a principal Gauss curvature of a pseudo physics quantity. For instance, k^(1)(n; ξ_j), the 1st column of K(n; ξ_j), is the time series of κ_1 of the pseudo released energy up to the present time t_n at reference volume ξ_j, whereas the 8th column, k^(8)(n; ξ_j), is the time series of κ_2 of the pseudo Laplacian's first term.

To generate the Fourier transform-based new features, this study performed the fast Fourier transform (FFT). The well-established library FFTW 20,21 is used to carry out the FFT of each discrete time series of the Gauss curvature-based features of the pseudo physics quantities, i.e., each column of K(n; ξ_j). The FFT generates the resultant set consisting of the power spectral densities p_PSD^(m) ∈ ℝ^n and the associated frequencies f^(m) ∈ ℝ^n. In short, K(n; ξ_j) → F(n; ξ_j) := {p_PSD^(1), f^(1), ..., p_PSD^(8), f^(8)}. Then, for each column, the DC component is removed and the column vectors are sorted in descending order with respect to the magnitude of the PSD, giving the sorted set F̄(n; ξ_j) := {p̄_PSD^(1), f̄^(1), ..., p̄_PSD^(8), f̄^(8)}, where the ith entry of p̄_PSD^(m) is the ith largest value of p_PSD^(m) and f̄^(m) is the frequency vector sorted according to p̄_PSD^(m). To generate practically meaningful features, amongst the many peaks in the power spectra, this paper extracted the top 10 amplitudes and the associated frequencies; this FT-based new feature set is denoted F_top(n; ξ_j) := {p̄_top^(1), f̄_top^(1), ..., p̄_top^(8), f̄_top^(8)}. There is no strict restriction on how many top amplitudes are selected. This paper adopts the top 10 since they encompass a sufficient portion of the total energy of the input signal. For instance, Fig. 2A shows that the top 10 peaks carry sufficiently large energy and that, beyond the ten peaks, the energy level decreases to 20% of that of the largest peak. Figure 2B shows exceptions in which the 10th peak's energy level does not decrease below 50% of that of the largest peak. The results of this paper support that including the top 10 amplitudes and their frequencies in the FT-based features is successful in distinguishing and learning the hidden rules of imminent extreme EQs. Including more peaks (thus more energy) would be straightforward, and investigating their impact shall be a future research topic.

Table 1. Algorithm for Fourier transform (FT)-based new feature generation, looping over all reference volumes ξ_j ∈ V (Step-I: time history of Gauss curvatures of pseudo physics quantities).
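The Table 1 pipeline can be sketched for a single column of K(n; ξ_j). The paper uses FFTW; here numpy's real FFT stands in for it, and the PSD normalization is an assumption:

```python
import numpy as np

def ft_features(series, dt=1.0, n_top=10):
    """Top-N spectral features of one Gauss-curvature time series.

    A minimal numpy sketch of the paper's procedure (the paper uses FFTW;
    np.fft stands in here). Returns the n_top largest PSD values and their
    frequencies, with the DC component removed, sorted in descending PSD order.
    """
    x = np.asarray(series, dtype=float)
    spec = np.fft.rfft(x)                     # one-sided spectrum
    psd = np.abs(spec) ** 2 / len(x)          # power spectral density (assumed norm.)
    freq = np.fft.rfftfreq(len(x), d=dt)      # cycles per day for dt = 1 day
    psd, freq = psd[1:], freq[1:]             # remove the DC component
    order = np.argsort(psd)[::-1][:n_top]     # sort descending, keep the top N
    return psd[order], freq[order]
```

Running this on each of the 8 columns of K(n; ξ_j) yields the 8 pairs (p̄_top^(m), f̄_top^(m)) that make up F_top(n; ξ_j).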

Data-driven prediction rules for individual large EQs
This paper pursues purely data-driven prediction rules that are customized for individual large EQs, independent of existing magnitude prediction models [22-25] or earthquake forecasting methods [26-33]. The overall architecture of the adopted hidden rule-learning ML algorithm is illustrated in Fig. 1A(V). The generality of the hidden rule-learning approach shown in Fig. 1 has been demonstrated with complex physics phenomena at diverse scales, from nano 34 to micro 35, to composite structures 36,37, and to the Earth's lithosphere 17. Prediction rules are unraveled by the glass-box rule-learning algorithm that uses the multi-layered data (Fig. 1A(V)). The role of the "Scientist in the Loop" is to monitor the rule-based predictions and help decide whether to recommend appending the identified best-so-far rule to the storage (i.e., global memory for future inheritance and predictions). For instance, some predictions may have good fitness scores numerically, yet their prediction plots may not satisfy the domain expert's knowledge. Then, the "Scientist in the Loop" may queue additional rule-learning by changing ML-control parameters or expanding search spaces. Since each large EQ has its own prediction rule in the proposed approach, such re-learning can be done separately and multiple times, specifically for that EQ. This scientist's role augments the data-driven rule-learning process to better comply with domain science.
In the author's previous work 17, the best-so-far prediction rule was identified as a multiplicative combination of cubic regression spline (CRS)-based LFs of (i) the pseudo released energy (its LF denoted L_E), (ii) the pseudo power (L_P), (iii) the pseudo vorticity (L_ω), and (iv) the pseudo Laplacian (L_L). The CRS-based LF leverages its high flexibility 38,39, and its general form is given in "Methods". CRS-based LFs can embrace constant shifts, linear curves, and nonlinear curves 38,39. Thus, the best-so-far data-driven prediction rule without the Fourier transform-based features (Eq. 4) takes the multiplicative form Sg(L_E × L_P × L_ω × L_L), where E*_r^(t)(ξ_j), the input to L_E, is the best-so-far pseudo released energy at epoch t and at the reference volume ξ_j. The free parameters associated with the best-so-far CRS LFs L_E, L_P, L_ω, and L_L are denoted by θ_E, θ_P, θ_ω, and θ_L, respectively, and Sg(·) stands for a typical sigmoid function. The detailed rationales for the data-driven rules are presented in Ref. 17.
In this study, the Fourier bases are used to sharpen the prediction rule. The top 10 frequencies of the principal Gauss curvatures are used in the Fourier bases. The best-so-far data-driven prediction rule with the Fourier transform-based features in conjunction with the CRS LFs (Eq. 5) augments each CRS LF with Fourier-based LFs, denoted L_FT, where L_E, L_P, L_ω, and L_L are from Eq. (4); all the Fourier-based LFs are given in the Supplemental Material. Here, the sorted top frequencies f̄_top enter the trigonometric bases of L_FT. In particular, the Fourier frequencies of (κ_1, κ_2)_E are used to augment L_E, as shown in the first line of Eq. (5). The FT-based LFs appear to add considerable flexibility to the prior best-so-far prediction rule (Eq. 4) and thus improve the accuracy of the prediction rules.
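The exact Fourier-based LF expressions live in the Supplemental Material. As a purely hypothetical illustration of the general shape described above, a truncated trigonometric series over the top sorted frequencies, with coefficients found by the rule-learning ML, might look like:

```python
import numpy as np

def fourier_link(t, freqs, a, b, c=0.0):
    """Hypothetical Fourier-based link function L_FT (illustration only).

    Assumed shape: a truncated trigonometric series over the top sorted
    frequencies `freqs`, with free parameters (a, b, c) that the rule-learning
    ML would fit. The authors' actual expression is in their Supplemental
    Material and may differ.
    """
    freqs, a, b = map(np.asarray, (freqs, a, b))
    return c + np.sum(a * np.cos(2 * np.pi * freqs * t)
                      + b * np.sin(2 * np.pi * freqs * t))
```

With up to 10 harmonics per curvature channel, such terms add the oscillatory flexibility that the smooth CRS bases lack.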
It should be noted that all the LFs are customized for individual large EQs in this paper. Rather than a single set of LFs describing all EQs, each large EQ has its own best-so-far LFs (i.e., specialized prediction rules). This customized approach will be helpful for a future expansion to a reinforcement learning-based evolution of this framework, in which unsupervised ML methods can continue learning and improving the best prediction rules (i.e., LFs) for many different large EQs without human intervention.
The ML-identified rule uses the observed 10-year data, 30 days before the event, without any physics mechanisms or statistical laws. The best-so-far rule appears successful in reproducing the next-month earthquake's location and magnitude 30 days before the event, as shown in Fig. 3. The ML-identified rule reproduces the global peak event noticeably well. In addition to Fig. 3, all other ML-driven reproductions of 8 large events of magnitude M_w ≥ 7.0 in the western U.S. region are shown in Figs. S2 and S3. Also, Figs. S4-S6 present ML-driven reproductions of 9 large events of magnitude M_w ∈ [6.5, 7.0) in the western U.S. region. In some cases, the ML-identified rules reasonably reproduce the global peak's location and magnitude but with a few false small peaks (e.g., Figs. S2A-D). Such false peak reproductions appear more noticeable for events of magnitude M_w ∈ [6.5, 7.0) than for events of magnitude M_w ≥ 7.0. For instance, Figs. S5C-F appear to show wrong reproductions of false peaks. This is related to the limit of the best-so-far ML-identified rules, which shall be improved in a future extension. Still, the overall performance is promising since the largest peaks' locations and magnitudes are reasonably reproduced by the customized data-driven model. Table 2 summarizes the prediction results of the individual 8 large earthquakes of magnitude M_w ≥ 7.0 and 9 large earthquakes of M_w ∈ [6.5, 7.0) using the best-so-far data-driven prediction model. For the 8 large earthquakes of magnitude M_w ≥ 7.0, the mean differences in (latitude, longitude, depth, magnitude) between the real peak and the ML-reproduced peak are (0.12°, 0.15°, 4.21 km, 0.18). The mean differences increase to (0.28°, 0.51°, 5.4 km, 0.22) for the 9 large earthquakes of magnitude M_w ∈ [6.5, 7.0). These differences underpin the overall accuracy of the best-so-far ML-identified rules in reproducing three-dimensional locations and magnitudes 30 days before the event. However, they also imply that the rule's accuracy appears to deteriorate for the second largest group, M_w ∈ [6.5, 7.0). Uncertainty also increases for this second group. For the 8 large earthquakes of magnitude M_w ≥ 7.0, the standard deviations of the differences in (latitude, longitude, depth, magnitude) between the real peak and the ML-reproduced peak are (0.1°, 0.17°, 3.19 km, 0.12). The standard deviations increase to (0.22°, 0.52°, 4.52 km, 0.15) for the 9 large earthquakes of magnitude M_w ∈ [6.5, 7.0). Improvement of the accuracy and of the underlying uncertainty shall be a natural future extension topic.

The preserved interpretability is noteworthy. The best-so-far prediction rules are remembered by storing all the free parameters of the LFs. By retrieving the parameters and plugging them into the corresponding LFs' expressions (e.g., Eqs. 6 or 11), one can investigate and interpret individual physical terms and their behavior (e.g., Fig. S8 of the author's prior work 17). During the rule-learning process, poorly performing combinations of features and their LFs are rejected. By doing so, this approach can help improve the physical interpretation of the ML-identified rules. In particular, the best-performing prediction rule (Eq. 5) turns out to select the pseudo vorticity's first term ω and the pseudo Laplacian's first term ∂²E*_r^(t)/∂h² out of many other candidate feature terms. Physically, ω may describe the slow rotational motion about the longitudinal axis, and the directions of the western U.S. region's plate motions and the known major faults are roughly parallel or normal to the longitudinal axis. Therefore, any accurate data-driven prediction rules, if properly unraveled, should be able to highlight certain salient physical terms, and such favorable capabilities appear to be confirmed. Purely based on data, this paper's best prediction rules pinpoint salient feature terms, underpinning the preservation of physical interpretability.

Table 2. Individual large earthquake reproductions using the best-so-far customized data-driven models with FT-based new features: 8 events of M_w ≥ 7.0 and 9 events of M_w ∈ [6.5, 7.0). ΔM_w and Δξ are the absolute differences in magnitude and location between the real observation and the predicted peak, respectively. Some common names of EQs are included. The coordinates (λ, φ, h) of all the observed real peaks in this table are the same as those in the USGS catalog database. All other comparison plots between predicted and observed peaks in this paper present the center coordinates of the reference volume (Δλ = Δφ = 0.1°; Δh = 5 km) that contains the peak.

Conclusion
The inclusion of FT-based features in the prediction rules appears effective in improving the accuracy of a large EQ's location and magnitude 30 days before the event. Figure 4 compares predictions with and without the FT-based features and shows their positive impact. The FT-based features also appear to help the ML-identified best-so-far rules sharpen the predicted magnitude distributions and remove incorrect peaks. For instance, these positive roles of the FT-based features can be clearly seen by comparing Fig. 4A with C and Fig. 4D with F.
The improvement can also be confirmed by the sharpened distribution of the absolute errors between real and predicted magnitudes. For instance, Fig. S7 compares the absolute magnitude errors (|ΔM_w|) from predictions without and with the FT-based features in the best-so-far rules.
To some extent, this is an anticipated result for two reasons. First, the FT-based features' many LFs (L_FT) and the associated free parameters can offer additional fitting power to the prediction, like a deep learning model with more neurons. Second, the many L_FT's can be regarded as many higher-order terms of a Fourier series (here, up to 10 harmonics), which contribute smooth fitting strength.
To incorporate the FT-based features, the ML-identified rules should have additional Fourier bases, as in Eq. (6). When the prediction rules use only CRS-based LFs (Eq. 4), the incorrect peaks and over-smoothing issues remain (see Fig. 4C,F). In contrast, the combination of the smooth CRS-based LFs L_i, i = (E, P, ω, L), and the Fourier series-based LFs L_FT in Eq. (5) can offer enhanced accuracy in reproducing large rare peaks without incorrect peaks or over-smoothing (see Fig. 4A,D). The Gibbs phenomenon (i.e., over/undershoot near a jump discontinuity) does not appear to affect the present prediction rules using the Fourier series-based LFs. This may be attributed to the fact that the large EQs in this framework are regarded as "point sources", not lines, and thus do not necessarily lead to sudden discontinuities of the target distributions. In future extensions, when EQs are regarded as 2D line sources, the Gibbs phenomenon may negatively affect the predictions, which shall be addressed then.
Also, in future investigations, comprehensive validations of the ML-identified prediction rules should be done to confirm general applicability to a wide range of EQ sizes. For instance, Fig. S8 shows preliminary test predictions of a "quiet" period without large EQs (M_w < 5.5 during the period) using the best-so-far rules. Fig. S8A shows the prediction 33 days before the Ridgecrest EQ (2019/7/6; M_w = 7.1); Fig. S8B, 32 days before. No false alarms with spurious large EQ predictions are detected. Since the proposed approach trained and unraveled all the prediction rules with large-EQ data (M_w ≥ 6.5), this preliminary test result appears promising. But, to draw a concrete conclusion about the proposed approach and to become a practically meaningful tool (like 28,32), future extensions should conduct comprehensive tests over broad ranges of EQs.
The outcomes of this study add a new dimension to research on predicting individual large EQs. Gauss curvature-based unique signatures of large EQs may be remembered and distinguished by unsupervised ML methods, while the data-driven prediction rules can be further customized for individual large EQs with new data. The overall processes can be managed and evolved by another global ML scheme such as reinforcement learning, thereby shedding light on purely data- and ML-driven predictions of large EQs. The handshake among ML methods, Fourier, and Gauss may help answer the long-standing enigma of seismogenesis.

Data preparation
This study collected and processed the raw earthquake catalog data available in Ref. 18 from January 1980 through October 2019. Without any prejudice, all recorded earthquakes within the past 40 years are included; the total number of earthquakes is 1,895,190. According to the calendar date, all earthquakes within one day are stored in one data file. The day-based earthquake catalog data files are named 1,000,000 for January 1st, 1980, 1,000,001 for January 2nd, 1980, and so on. Each file contains the number of data points in the file, followed by the longitude, latitude, depth, and magnitude of each earthquake. As illustrated in Fig. S1, one epoch is defined as a 30-day time range. All earthquakes within the 30-day window are considered to belong to the same epoch. A frame of epochs consists of many consecutive epochs and serves as the training base for the rule-learning glass-box machine learning algorithm. Within one frame of epochs, the last epoch is used as the target, while all the previous epochs are used for training of hidden rules. As illustrated in Fig. S1A, the target epoch is completely disjoint from the epochs used for training and rule-learning. As explained in Fig. S1, this study sets a one-day interval between consecutive frames of epochs. By marching frames of epochs with a one-day increment, this paper dramatically increases the total number of frames of epochs to 14,600. For interested researchers, all the processed data sets of the refined epochs with one-day intervals are publicly available at Ref. 40. This paper focuses on prediction rule-learning for a large target EQ (M_w ≥ 6.5) positioned at the last day of the target epoch (Fig. S1A). Thus, all ML-identified rules of this paper are specifically trained to predict a future large event 30 days before the target EQ (i.e., the D-30 case in Fig. S1B). In a future extension, shorter time-window predictions (e.g., a few days ahead) shall be possible by placing the target EQ at earlier positions of the target epoch (the D-2 or D-1 cases in Fig. S1B). Conversely, by defining wider target epochs, longer time-window predictions (e.g., months or years ahead) shall also be possible, which will be meaningful for complementing the existing long-term EQ forecasting methods.
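The epoch/frame bookkeeping described above can be sketched as index arithmetic. `n_epochs_per_frame = 120` is an assumption, corresponding roughly to the 10 years of observed 30-day epochs the rules use:

```python
import numpy as np

def build_frames(n_days, epoch_len=30, n_epochs_per_frame=120, stride=1):
    """Index bookkeeping for epochs and frames (a sketch of Fig. S1's scheme).

    Days are grouped into 30-day epochs; a frame is a run of consecutive
    epochs whose last epoch serves as the prediction target, and frames
    march forward one day at a time (stride=1).
    """
    frame_len = epoch_len * n_epochs_per_frame
    frames = []
    for start in range(0, n_days - frame_len + 1, stride):
        days = np.arange(start, start + frame_len)
        epochs = days.reshape(n_epochs_per_frame, epoch_len)
        frames.append({"train": epochs[:-1],   # all epochs but the last
                       "target": epochs[-1]})  # disjoint target epoch
    return frames
```

With roughly 40 years of daily data and a one-day stride, this marching yields the order of 10^4 frames, consistent with the 14,600 frames reported above.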

Data transformation from raw EQ catalog into spatio-temporal information index
Temporal convolution is carried out after spatial convolution is done, as

II_ST^(t)(ξ_j; L_k, T_l) = Σ_{t_past ≤ t} ω(τ; T_l) II_S^(t_past)(ξ_j; L_k),

where the one-dimensional (1D) Gaussian weight ω(τ; T_l) = (T_l(2π)^(1/2))^(−1) exp(−τ²/(2T_l²)), with τ = |t − t_past|, t ≥ t_past, meaning the time gap between the current and the past time. The spatial convolution is done by

II_S^(t)(ξ_j; L_k) = Σ_{x_i^(t) ∈ V} ω(ξ_j, x_i^(t); L_k) M_i^(t),

where ω(ξ_j, x; L_k) is the 3D Gaussian weight and V means the entire lithosphere domain under consideration. Here, T_l (l = 1, ..., n_T) and L_k (k = 1, ..., n_L) are the influence ranges in time and 3D space, respectively. Therefore, there can be at most n_L × n_T spatio-temporal IIs at one reference volume and one time. Following the preliminary investigations in Ref. 17, this study adopts n_L = 2 with L_1 = 10 km and L_2 = 25 km, and n_T = 2 with T_1 = 3 months and T_2 = 6 months. This combination appears to lead to the best-so-far prediction performance since it embraces the dual impacts of close and far EQs in both the three-dimensional spatial domain and the one-dimensional temporal domain.
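A minimal sketch of the two convolutions, assuming the spatial step is a 3D isotropic Gaussian-weighted sum of event magnitudes and the temporal step applies the 1D Gaussian weight to the daily history of spatial IIs (a reading of the description above, not the paper's verbatim formulas):

```python
import numpy as np

def spatial_ii(centers, quake_xyz, quake_mag, L_k):
    """Spatial information index at each reference-volume center.

    Assumes a 3D isotropic Gaussian-weighted sum of event magnitudes.
    centers: (J, 3) km; quake_xyz: (N, 3) km; quake_mag: (N,).
    """
    d2 = ((centers[:, None, :] - quake_xyz[None, :, :]) ** 2).sum(-1)   # (J, N)
    w = np.exp(-d2 / (2.0 * L_k ** 2)) / (L_k * np.sqrt(2 * np.pi)) ** 3
    return w @ quake_mag                                                # (J,)

def temporal_ii(ii_s_history, T_l):
    """Temporal convolution of past spatial IIs up to the present day.

    ii_s_history: (n_days, J), oldest first; tau = days before present.
    """
    n = ii_s_history.shape[0]
    tau = np.arange(n - 1, -1, -1, dtype=float)                 # time gaps
    w = np.exp(-tau ** 2 / (2.0 * T_l ** 2)) / (T_l * np.sqrt(2 * np.pi))
    return w @ ii_s_history                                     # (J,)
```

Running `spatial_ii` per day and then `temporal_ii` over the resulting history, once per (L_k, T_l) pair, yields the at-most n_L × n_T spatio-temporal IIs per reference volume.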

Flexible and expressive link functions
Pursuing interpretability, this paper adopts an expressive link function (LF) using transparent, flexible bases that can describe a mathematical expression between the input features and the hidden rules as output. An LF is denoted as L(x; θ), where θ is the set of free parameters prescribing the LF. The cubic regression spline (CRS) curves consist of a few cubic polynomials connected at knots so that the curves are continuous up to the second derivative 38. For practical cubic spline bases 39 (denoted b_i), the LFs take the form

L(x; θ) = θ_0 + Σ_i θ_i b_i(x).
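One common CRS construction, assumed here for illustration (Refs. 38 and 39 define the bases actually used): a truncated-power cubic basis whose curves are continuous up to the second derivative at the knots:

```python
import numpy as np

def crs_basis(x, knots):
    """Cubic-spline basis matrix using the truncated power basis.

    Columns 1, x, x^2, x^3 plus one truncated cubic (x - k)_+^3 per interior
    knot; the resulting curves are C^2-continuous at the knots. This is one
    standard choice, not necessarily the authors' exact bases.
    """
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x, x ** 2, x ** 3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def crs_link(x, knots, theta):
    """Evaluate a CRS link function L(x; theta) = B(x) @ theta."""
    return crs_basis(x, knots) @ np.asarray(theta, dtype=float)
```

Fitting θ by least squares against any target readily recovers constant shifts, linear trends, and nonlinear curves, the behaviors the text attributes to CRS-based LFs.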

Figure 1. Overall architecture and conceptual illustrations of the central steps of the proposed approach: (A) multi-layered data transformation from raw EQ catalog data to new features in terms of pseudo physics quantities, Gauss curvatures, and Fourier bases (I-IV), and the transparent rule-learning machine learning (ML) method, denoted "glass-box" ML (V), to unravel the prediction rules' expressions for individual large EQs; (B) 3D convolution for generating the spatial information index; (C) temporal convolution for the spatio-temporal information index; (D) fast Fourier transform (FFT) to generate the FT-based new features that quantify time-varying information about the "fluctuating surfaces" of the pseudo physics and Gauss curvature-based features.

Figure 2. Normalized power spectra of the top 10 amplitudes and their frequencies after sorting the FFT results. (A) Rapidly decreasing relative energy levels of the top 10 peaks from the FFT of Gauss curvatures (denoted K1 and K2). (B) Slowly decreasing relative energy levels. Er: the pseudo released energy; Vort: the pseudo vorticity; Lapl: the pseudo Laplacian; Pwr: the pseudo power.