Real-time determination of earthquake focal mechanism via deep learning

An immediate, fully automated report of the source focal mechanism after a destructive earthquake is crucial for timely characterization of the faulting geometry, evaluation of the stress perturbation, and assessment of the aftershock patterns. Advanced technologies such as Artificial Intelligence (AI) have been introduced to solve various problems in real-time seismology, but real-time determination of the source focal mechanism remains a challenge. Here we propose a novel deep learning method, named the Focal Mechanism Network (FMNet), to address this problem. The FMNet, trained with 787,320 synthetic samples, successfully estimates the focal mechanisms of four 2019 Ridgecrest earthquakes with magnitudes larger than Mw 5.4. The network learns the global waveform characteristics from theoretical data, thereby allowing extensive application of the proposed method to regions of potential seismic hazard with or without historical earthquake data. After receiving data, the network takes less than two hundred milliseconds to reliably predict the source focal mechanism on a single CPU.


3) Advantage of synthetic data
The argument that a model trained on synthetic data is better in 'scenarios without enough historical source focal mechanisms for training the neural network model, especially for those regions with limited seismicity but having the potential seismic hazards' only holds if the performance of models trained on real data generalizes poorly to regions outside of the training area. This does not seem to be the case: Hara et al. (2019) show that a model trained to estimate P-wave first motions transfers well to other regions, even without fine-tuning. In general, CNNs built for picking tend to generalize very well, as the task is relatively simple.

4) Lack of test set on synthetic data
The performance of the model is only shown for training and validation data (Figures 2 and 3 of the Supplementary). There does not seem to be any test set of synthetic data. It would be useful to report the model's performance on a real test set instead.

5) Diversity in the examples of real earthquakes
Given that the four examples analyzed have nearly identical focal mechanisms, it is difficult to assess whether this approach would work well in general. Specifically, many damaging earthquakes occur in subduction zones, where the mechanisms are not strike-slip like those analyzed here and where seismic stations can be farther from the epicenter (as many earthquakes occur offshore). It is unclear whether the approach would work in such cases.
2) Figure 2: When you say that the model 'output[s] the earthquake focal mechanism directly', it would be useful to show that this output corresponds to distributions of strike, dip, and rake. This figure is not very clear.

Reviewer #1 (Remarks to the Author):
This paper proposed an exciting approach using deep learning to determine the earthquake focal mechanism in near real-time. Using simulation-generated synthetic waveforms as the training dataset for a fixed network in a region, the authors trained a CNN model to estimate the strike, dip, and rake as Gaussian distributions. The test results on the real Ridgecrest earthquakes look great. The first part of the trained model (the compression part) serves as an encoder that compresses the waveforms into a sparse representation. The authors also tested the hypothesis that similarity in the feature domain is equivalent to similarity in the data domain. This opens the door to large database queries using the compressed features. Overall, this paper is well written, and the proposed methods and tests are promising for future use. The key point of this paper is to use the synthetics as the training data; in a sense, this encodes the geophysical knowledge into the deep learning model and turns the geophysics problem into a pattern-matching problem. By doing this, it can determine the focal mechanism faster without human intervention. It provides a nice proof of concept and applies to the real cases successfully. I only have some minor comments, and I would recommend publication of this paper after minor revision. It will be a good contribution to the community. Qingkai Kong - Berkeley Seismology Lab.
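[Editor's note] As a reader's aid for the label design mentioned above (strike, dip, and rake each encoded as a Gaussian distribution), here is a minimal sketch of one way to build such labels. The bin counts, standard deviation, and peak normalization are illustrative assumptions, not the paper's actual settings:

```python
import numpy as np

def gaussian_label(true_angle, lo, hi, n_bins=360, sigma=5.0):
    """Encode a scalar angle as a discretized Gaussian probability vector
    with peak value 1.0 at the true angle (wrap-around near the range
    edges is ignored in this sketch)."""
    centers = np.linspace(lo, hi, n_bins)
    return np.exp(-0.5 * ((centers - true_angle) / sigma) ** 2)

# one label vector per source parameter, over its natural range
strike_label = gaussian_label(140.0, 0.0, 360.0)
dip_label = gaussian_label(75.0, 0.0, 90.0)
rake_label = gaussian_label(-170.0, -180.0, 180.0)
```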

Comment 1:
There are also some disadvantages of this approach that may need to be expanded on a little in the paper, such as: the network is fixed (not so flexible); what is the effect of the velocity model (if no good velocity model is available in a region); and it is only usable for larger earthquakes (because of the frequency band used). I hope the authors can expand the discussion in the paper to make this clear or provide some workarounds. Therefore, I suggest the authors do some quick tests to show the stability of the method if some stations are missing (to make the approach a little more flexible), such as changing the number of stations. For example, if some stations have problems during normal operation and the recordings are not reliable, how do the results change (assuming one or two stations have no data, replacing the waveforms with zeros, etc.)? Also, generate some examples using a different velocity model and monitor the changes in the errors (also, a better way to quantify the mismatch of the focal mechanisms would help).
Authors: Thank you very much for your helpful comments and suggestions.
Following the comments and suggestions, we have added several tests to report the performance of our model in the revised manuscript:

1. In the first test, we generate a new test dataset of 1,000 unseen synthetic samples with a diversity of focal mechanisms (Supplementary Fig. S4) to test and report the model performance. This new test dataset is generated at a variety of random locations within the study area. We also add realistic noises from real recordings and picking errors into this new test data. After prediction on this new test dataset, we define a successful prediction only if the maximum value of the predicted Gaussian probability is larger than 0.7 for each test sample. With this threshold, 91.04% of the test samples can be successfully recalled. Also, we adopt the Kagan angle analysis (Kagan 1997; 2001), in which each Kagan angle quantitatively characterizes the difference in rotation angle between the true and the predicted focal mechanisms, to evaluate the estimation errors. The Kagan angle distribution results (Supplementary Fig. S5) show that our model can successfully predict a diversity of focal mechanisms with reasonable estimation errors (mostly <10˚ and a maximum of 25˚). This new test validates the generalization ability of our model in predicting a diversity of focal mechanisms and also quantitatively shows the estimation errors.

2. In the second test, we assume that two stations have recording problems and their waveform signals are missing (zero amplitudes, as shown in Supplementary Fig. S13). Then we predict the focal mechanism using our network model. From the test results (Supplementary Fig. S14), we find our model can produce Gaussian probability distributions similar to the true distributions. This indicates that missing data at a few stations does not affect the prediction results very much. We report this test in the Discussion section: "To further verify our model on the cases with outliers, we test the scenario that some of the recording stations have data issues and waveforms are missing, but the azimuthal coverage is still good (Supplementary Fig. S13). We find that the predicted probability distributions can match well with the true distributions in terms of their shape and maximum values when partial data are missing (Supplementary Fig. S14)."

3. In the third test, we generate a new test example using a different velocity model (Supplementary Fig. S7). We perturb the true velocity model by a maximum of 10 percent in each layer to generate the perturbed velocity model. From the prediction results (Supplementary Fig. S8), we find that the inaccurate velocity model increases the estimation errors. We think this is because we train the neural network with a particular velocity model, so the network is somewhat model-dependent. We report this test in the Discussion section: "In the Supplementary materials, we present a numerical study using a velocity model with perturbations (Supplementary Fig. S7) [...]"

4. In the fourth test, we halve the available stations and put them on one side of the event (Supplementary Fig. S11). From the test results (Supplementary Fig. S12), we find that the predicted probability distributions differ from the true labels in terms of both shape and maximum values: two secondary local peaks in strike appear, and the maximum values are lower. This test shows that a poor azimuthal coverage of stations will increase the estimation errors compared to a good azimuthal coverage. This is because the azimuthal coverage, which constrains the source radiation pattern on the focal sphere, directly affects the constraints on the focal mechanism. We report this test in the Discussion section: "[...] (Supplementary Fig. S11). The event is assumed to occur in an area with training data available. From the test results, we find that both strike and dip are well resolved, but the rake angle is off by nearly 30˚, and the prediction probability of rake is significantly lower (about 0.5) (Supplementary Fig. S12). Therefore, it is important to evaluate the prediction probabilities."

5. In the fifth test, we design an event that occurs out of the study area (Supplementary Fig. S15). From the test result (Supplementary Fig. S16), we find that the predicted Gaussian probability distributions tend to have smaller maximum values (about 0.6) than the true distributions. Although this test shows only one example, we can use the predicted maximum probability to evaluate the reliability of the predicted results. We report this test in the Discussion section: "We also test a case where an event occurs out of the study area (Supplementary Fig. S15). The test results show that the predicted probability is much smaller (about 0.6), which can help quantify the reliability of the predicted results (Supplementary Fig. S16)."

We gratefully thank you for all these test suggestions. They greatly help improve our manuscript!
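[Editor's note] For readers who want to reproduce the Kagan angle analysis used in test 1, the sketch below is an illustrative implementation built from the standard Aki & Richards double-couple formulas; it is not the authors' code, and libraries such as pyrocko ship a ready-made kagan_angle function. The Kagan angle is the minimum rotation between the two double-couple orientations, minimized over the four equivalent orientations of a double couple:

```python
import numpy as np

def sdr_to_mt(strike, dip, rake):
    """Unit double-couple moment tensor (x=north, y=east, z=down; Aki &
    Richards convention) from strike/dip/rake given in degrees."""
    s, d, r = np.radians([strike, dip, rake])
    mxx = -(np.sin(d) * np.cos(r) * np.sin(2 * s)
            + np.sin(2 * d) * np.sin(r) * np.sin(s) ** 2)
    myy = (np.sin(d) * np.cos(r) * np.sin(2 * s)
           - np.sin(2 * d) * np.sin(r) * np.cos(s) ** 2)
    mzz = np.sin(2 * d) * np.sin(r)
    mxy = (np.sin(d) * np.cos(r) * np.cos(2 * s)
           + 0.5 * np.sin(2 * d) * np.sin(r) * np.sin(2 * s))
    mxz = -(np.cos(d) * np.cos(r) * np.cos(s)
            + np.cos(2 * d) * np.sin(r) * np.sin(s))
    myz = -(np.cos(d) * np.cos(r) * np.sin(s)
            - np.cos(2 * d) * np.sin(r) * np.cos(s))
    return np.array([[mxx, mxy, mxz], [mxy, myy, myz], [mxz, myz, mzz]])

def kagan_angle(sdr1, sdr2):
    """Minimum rotation angle (degrees) between two double couples."""
    frames = []
    for sdr in (sdr1, sdr2):
        _, v = np.linalg.eigh(sdr_to_mt(*sdr))  # columns: P, B, T axes
        if np.linalg.det(v) < 0:                # make the frame right-handed
            v[:, 0] = -v[:, 0]
        frames.append(v)
    best = 180.0
    # a double couple is unchanged by 180-degree flips about its own axes,
    # so take the smallest rotation over the four equivalent orientations
    for sx, sy in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
        rot = frames[0] @ np.diag([sx, sy, sx * sy]) @ frames[1].T
        cos_t = np.clip((np.trace(rot) - 1.0) / 2.0, -1.0, 1.0)
        best = min(best, float(np.degrees(np.arccos(cos_t))))
    return best

print(kagan_angle((0, 90, 0), (10, 90, 0)))  # ~10 degrees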

Comment 3:
Maybe this is some future work: in Ridgecrest, there are many smaller earthquakes for which focal mechanisms have been determined to date (hundreds of M3.5+ events), but in this paper the authors only tested the 4 large events. I think it is worth testing these smaller events as well, to see the limits of the model at different frequency bands.
Authors: Thank you very much for this insightful comment. Yes, we agree that the extension to even smaller earthquakes (M3.5+) would be very meaningful and useful, and we will consider this in a future study. We have tried to process the real waveforms (alignment, bandpass filtering, and normalization) of several smaller earthquakes (M4), but we find that the real waveforms are dominated by noise within the selected frequency band, and therefore the results are not promising. Increasing the frequency band would require a finer 3D velocity model. In our future efforts, we will work on smaller earthquakes (M3.5+) with a high-resolution 3D velocity model and an efficient waveform modeling tool. Following your suggestions, we have added more discussion of this in the Discussion section: "Since we use synthetics associated with a 1-D velocity model to create a dataset for training and testing, it limits the application to low-frequency data, which are generally available from moderate and large events." And: "Further development efforts are needed to combine the P-wave first motions and waveform data to handle smaller events. Generating a 3-D velocity model with great detail could help model the high-frequency data as well." We will include these efforts in our future research plans, and we gratefully thank you again for this very insightful comment!
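[Editor's note] A minimal ObsPy sketch of the preprocessing chain named in this response (alignment on the picked P onset, bandpass filtering, amplitude normalization). The file name, onset time, corner frequencies, and window length are placeholders, not the paper's actual values, and a real pipeline would use a per-trace pick rather than one onset for the whole stream:

```python
import numpy as np
from obspy import read, UTCDateTime

st = read("event_waveforms.mseed")            # hypothetical input file
p_onset = UTCDateTime("2019-07-06T03:19:53")  # picked P onset (placeholder)

st.detrend("demean")
st.taper(max_percentage=0.05)
st.filter("bandpass", freqmin=0.02, freqmax=0.2, corners=4, zerophase=True)

# align on the picked onset by cutting a fixed-length window from it
st.trim(starttime=p_onset, endtime=p_onset + 120.0)

# normalize the whole sample by its maximum absolute amplitude
peak = max(np.max(np.abs(tr.data)) for tr in st)
for tr in st:
    tr.data = tr.data.astype(float) / peak
```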

Comment 4:
When you generated the synthetics for training purposes, did you use a range of magnitudes for events on different grid points? I cannot find this information in the paper; please specify it so that readers can see what you did.

Authors:
We do not consider the magnitude information, which is eliminated in the normalization step for each data sample. To specify this, we have added the following text to the Result section: "Since we normalize the waveforms of each synthetic sample by the maximum amplitude, we choose a fixed magnitude for all events when modeling the synthetic waveforms."
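[Editor's note] A one-line numerical check of this argument, with a random stand-in waveform: per-sample max-amplitude normalization removes any positive scalar moment factor, which is why a fixed magnitude suffices under this label design.

```python
import numpy as np

rng = np.random.default_rng(3)
w = rng.standard_normal(1000)                 # stand-in synthetic waveform

norm = lambda x: x / np.max(np.abs(x))
assert np.allclose(norm(w), norm(123.4 * w))  # the moment scale drops out
```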

Comment 5:
In the paper, line 279, it says: "the FMNet does not require the pre-knowledge of the location or depth of a real earthquake as long as it is within the monitoring area".
But during training and testing, you aligned these waveforms based on the theoretical P arrival; therefore, I think this statement is not accurate. In a real application, how do you align these waveforms? If you use the theoretical P, then you do need the location information. I guess you could use the trigger onset instead of the theoretical P; because the stations are in a fixed order when you form the input data matrix, this automatically encodes some information about the travel times of the later phases. But please make this clear.
Authors: Yes, we use the picked onset of P-waves to align the waveforms. To make this clear, we have added the following to the Result section: "For real data, we take the picked onset time of each trace to carry out the waveform alignment."

Comment 6:
In preparing the training data, how did you add the realistic noise? Please make this clearer in the Methods section. Also, the paper says a random 10 s shift error is added: is this applied to all the waveforms, or something else?
Authors: To clarify, we have added the following information to the Result section: "The realistic noises are extracted from the real recordings at each seismic station. The random time shifts are added to each trace of the training samples to account for the picking errors."
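[Editor's note] A minimal sketch of the two augmentation steps described in this response, adding station-specific real noise and a random time shift to a synthetic trace. The function and variable names, and the ±10 s shift range, are assumptions for illustration; the wrap-around roll is a simplification (zero-padding would also work):

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(synthetic, station_noise, dt, max_shift_s=10.0):
    """synthetic, station_noise: 1-D arrays; dt: sample interval (s)."""
    n = synthetic.size
    # splice in a random segment of real noise recorded at the same station
    start = rng.integers(0, station_noise.size - n)
    noisy = synthetic + station_noise[start:start + n]
    # random time shift, in samples, standing in for a P-pick error
    shift = int(rng.uniform(-max_shift_s, max_shift_s) / dt)
    return np.roll(noisy, shift)
```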

Comment 7:
Please report the training time for this particular training and the specifications of the GPU (if used); the community is usually interested in this information.

We gratefully thank the very helpful comments and suggestions from reviewer #2. Following the comments and suggestions, we have made substantial efforts to address all the comments by adding more test results, figures, and descriptions. These comments and suggestions greatly help improve our manuscript. Also, we have prepared point-by-point responses. (-From authors)

Reviewer #2 (Remarks to the Author):
The authors propose a deep learning approach for earthquake focal mechanism determination. Estimating the focal mechanism of an earthquake is of interest in order to understand its physical characteristics, in particular regarding local stress redistribution and future aftershock locations. P-wave first motion estimates by deep learning are likely to be extremely fast.
Furthermore, their major advantage is that these detections are not region dependent: the same algorithm can be applied anywhere, as it is usually based on a single-station analysis. These approaches have also been found to work well on smaller events.
Therefore they appear simpler and more suitable for the determination of focal mechanisms than the methodology proposed here by the authors. Given that i) the [...]

Authors: We also conducted more tests and would like to further address some of the concerns. Using the P-wave first motions to invert for the source focal mechanism is a classical approach, which includes two steps: 1) the first-motion estimation; and 2) the focal mechanism inversion using the estimated first motions. The first step (first-motion estimation) has been greatly improved by the recent deep learning approaches. We revised the Introduction as follows: "Several recent studies first apply deep learning to estimate the P-wave first-motion polarities 9,39-41, and then apply the first motions to carry out focal mechanism inversion using programs such as HASH 42. One such effort leads to improved focal mechanisms in California compared to existing catalogs 9. Several seismological studies also suggest that utilizing waveform data can provide better constraints for deriving the focal mechanism than using the P-wave first-motion polarities 37,38,43. Our objective is to develop a seamless real-time solution for obtaining the focal mechanism in an automated fashion. Directly obtaining the focal mechanism of an event from waveform data with as little processing effort as possible is more appealing." We revised the Discussion as follows: "Different from the approach using the P-wave first motions, which requires sufficient azimuthal station coverage (Supplementary Fig. S9 and Fig. S10) [...] 4-7." In addition to the differences in data usage, please also note that our objective in this study is to develop a seamless real-time method for obtaining the source focal mechanism in an automated fashion. Conducting numerical inversions often requires fine-tuning parameters and quality control, which may be challenging in real-time. On the other hand, the FMNet approach involves more effort in the training data preparation and testing phase, but is straightforward when dealing with a new event.
We hope our revisions, new comparison test, and explanations help clarify this concern.
Comment 2: Timeliness of estimates
The authors report the computation time of the focal mechanism estimates (about 200 milliseconds). This is not what matters for applications in early warning. Indeed, estimating an earthquake's focal mechanism requires that i) the waves reach the seismic stations, and ii) the data is processed. Therefore in real scenarios, the time to get a focal mechanism estimate after the occurrence of an earthquake will be much larger, likely of the order of several tens of seconds. This is not analyzed at all in the paper.

Authors: [...] In addition, generating synthetics using a velocity model is not a complicated effort. For different regions and recording networks, the approach can be repeated easily using the same programs, and once the data preparation is completed, the process is straightforward for processing any new event.
Reviewer #1 (Remarks to the Author):
Thanks to the authors for addressing my comments and adding more tests to improve the paper.
Overall, the authors answered all my comments with more tests and discussions in the paper; I only have a few follow-up comments based on this and hope the authors can answer and test them.
* Regarding the answers to my comment 1, since the authors have already done the tests with dropping stations, are these dropped stations randomly selected? If yes, please specify in the paper.
* Also, the authors showed that when perturbing the velocity model, or in the out-of-network events test, the performance of the model does degrade; it is better to clearly specify these limitations in the discussion instead of just listing the results, unless this can be addressed.
* Based on the answer to my comment 4, the authors used a fixed magnitude for generating the training samples, which may introduce problems. Though the authors normalized the waveforms, which reduces the effect of amplitude, more factors change when the magnitude varies, such as the duration of the waveform and the SNR. For a waveform-based method, rather than one using only the first-motion polarity, I think this will have an effect, and it is better to study it well. Especially since the real test results are only shown for a few big events, it is hard to evaluate these aspects. My concern is that the trained model is only tuned to estimate the results well on a very limited range of events, but in reality there are various magnitudes that make the situation more complicated.
* Regarding the application of the model, I do agree with the other reviewer that the model is currently still limited, i.e., it depends on the region and network, and only works on large earthquakes. I hope the authors can continue to improve the model in the future.
Reviewer #2 (Remarks to the Author): Many thanks to the authors for their comments and detailed additions, which helped a lot to improve and clarify the manuscript. The modifications are very precise and well explained, with several additional paragraphs and figures in the Supplementary to illustrate and quantify the tests conducted. However, I have some concerns regarding the results in testing. The previous version of the manuscript did not include any measure of model performance, and I am puzzled by those now added in the test analysis.
In what follows, comments from the authors are in brackets, and responses are not.
Comment 1: Comparison with P-wave first motion estimates. « We revised the Introduction as follows [...] » Thank you very much for the added literature references, which are useful to put the proposed methodology in perspective compared to other existing methods. The paragraph added to the introduction conducts a detailed analysis of the literature relative to automated P-wave first motion estimates, and the description of waveform-based versus first-motion estimates of focal mechanisms is also very helpful in the context of this paper. « We revised the Discussion as follows [...] » Many thanks for the added paragraphs.
« We have also designed a new test to compare the performance between the P-wave first motion method and the FMNet method. » Many thanks for the additional tests, which are of great interest in order to emphasize the strength of the proposed approach. Indeed, the algorithm presented here seems to outperform first-arrival methods (with lower estimation errors) and to perform better when the number of stations is low. Something I am curious about is the relative computation time of both approaches. Even in the absence of fine-tuning, the proposed methodology is likely to be much faster: do you have an overall idea of the speed difference? Is the fine-tuning a time-consuming exercise? Overall, I find these tests very convincing in highlighting the advantages of the algorithm as a real-time estimator of earthquake focal mechanisms.

We gratefully thank the comments and suggestions from reviewer #1. Following the comments and suggestions, we have made revisions to address all the comments.

Also, we have prepared point-by-point responses. (-From authors)
Reviewer #1 (Remarks to the Author): Thanks to the authors for addressing my comments and adding more tests to improve the paper.
Overall, the authors answered all my comments with more tests and discussions in the paper; I only have a few follow-up comments based on this and hope the authors can answer and test them.

Comment 1:
Regarding the answers to my comment 1, since the authors have already done the tests with dropping stations, are these dropped stations randomly selected? If yes, please specify in the paper.
Authors: Yes, these dropped stations are randomly selected. We have added this information in Supplementary Materials Section 8: "In such a case, we randomly select two recording stations and replace the waveforms with zeroes."
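[Editor's note] A minimal sketch of the random station drop-out described here, under the assumption that an input sample is an array of shape (n_stations, n_components, n_samples); the shape and names are illustrative, not the authors' actual data layout:

```python
import numpy as np

rng = np.random.default_rng(0)

def drop_stations(sample, n_drop=2):
    """Randomly zero out the waveforms of n_drop stations."""
    out = sample.copy()
    dropped = rng.choice(sample.shape[0], size=n_drop, replace=False)
    out[dropped] = 0.0   # zero all components of the chosen stations
    return out
```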

Comment 2:
Also, the authors showed that when perturbing the velocity model, or in the out-of-network events test, the performance of the model does degrade; it is better to clearly specify these limitations in the discussion instead of just listing the results, unless this can be addressed.
Authors: Thank you very much for your comment. We have revised the Discussion to specify these limitations: "From these test results, we find that an inaccurate velocity model, poor azimuthal coverage, or events outside the network might degrade the prediction performance and yield low predicted probabilities. Therefore, using the predicted probability to quantify the reliability of the predicted result is essential."
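[Editor's note] The velocity-model perturbation used in the robustness test (each layer perturbed by at most 10 percent) can be sketched as below; the layer velocities here are made-up placeholders, not the model used in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

vp = np.array([4.8, 5.5, 6.1, 6.8, 7.8])    # km/s, one value per layer (made up)
factors = 1.0 + rng.uniform(-0.10, 0.10, size=vp.size)
vp_perturbed = vp * factors                  # each layer off by at most 10%
```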

Comment 3:
Based on the answer to my comment 4, the authors used a fixed magnitude for generating the training samples, which may introduce problems. Though the authors normalized the waveforms, which reduces the effect of amplitude, more factors change when the magnitude varies, such as the duration of the waveform and the SNR. For a waveform-based method, rather than one using only the first-motion polarity, I think this will have an effect, and it is better to study it well. Especially since the real test results are only shown for a few big events, it is hard to evaluate these aspects. My concern is that the trained model is only tuned to estimate the results well on a very limited range of events, but in reality there are various magnitudes that make the situation more complicated.
Authors: Thank you very much for your comment. We agree that earthquakes with different magnitudes will have different source durations and might present different SNRs. When preparing the training data, we considered these two factors to mitigate their effect. We also specify the limitation in the Discussion: "The current FMNet is designed for monitoring local or regional events within the coverage of a seismic network. Similar to the state-of-the-art methodology for resolving source focal mechanisms by applying moment tensor inversion, the FMNet is limited to moderate and large earthquakes that can be numerically modeled. Developing the capability to simulate waveforms of small earthquakes at high frequency warrants further study."
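[Editor's note] To give a sense of scale for the duration effect raised in this exchange, a Brune-type scaling sketch shows how source duration grows roughly with the cube root of seismic moment. The stress drop and shear-wave speed are assumed order-of-magnitude values, not quantities from the paper:

```python
import numpy as np

def duration_s(mw, stress_drop_pa=3e6, beta_ms=3500.0):
    """Rough source duration (s) from a Brune-type corner frequency."""
    m0 = 10 ** (1.5 * mw + 9.1)                       # seismic moment, N*m
    fc = 0.49 * beta_ms * (stress_drop_pa / m0) ** (1.0 / 3.0)
    return 1.0 / fc

print(duration_s(5.4), duration_s(7.1))   # roughly 2 s vs 15 s
```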
We gratefully thank the very helpful comments and suggestions from reviewer #2. Following the comments and suggestions, we have made substantial efforts to address all the comments. These comments greatly help improve our manuscript. Also, we have prepared point-by-point responses. (-From authors)

In what follows, comments from the authors are in brackets, and responses are not.

Comment: [...] 'it is challenging to model the high-frequency theoretical waveforms with a simple 1D velocity model'). Adding a small caveat on the use of synthetic data for training the model would be useful.

Authors:
We agree with you on the limitations for small earthquakes, and we have specified these limitations in the manuscript. This is also a limitation of all existing methods that adopt waveform matching with a simplified 1D velocity model for source focal mechanism inversion in current seismology. We will need to further develop the modeling capability for small earthquakes: modeling the high-frequency theoretical waveforms will require an accurate 3D velocity model and an efficient modeling tool with tremendous computational effort (such as Wang and Zhan, 2020). To ease this concern, we specify the current limitations of the model in the revised manuscript.

Second, we agree that the use of the recall score for this regression problem may not be appropriate. Therefore, following your suggestion, we have omitted the use of the recall score in the revised manuscript.
No evaluation of the model was provided in the original manuscript. Since the evaluation of the model presented in the new Supplementary paragraphs was strange, I took a look at the code and re-ran it on the test set provided. Computing the Kagan angles on the test set led to quite different results than those provided in Figure S5. In particular, while Figure S5 is cut at 25 degrees, there appear to be heavy tails in the distribution (Figure attached). More than 10% of the estimates have an error larger than 20 degrees, which seems high for a model trained and tested on synthetic data. The fraction of errors above 50 or 60 degrees is also far from negligible. Therefore, while the model is probably faster than existing methods, I am not sure that one could argue that it outperforms them in terms of precision. Looking at a few individual examples, it is likely that a symmetry issue is impacting the estimations of the neural network.

Figure provided by Reviewer #2
Authors: Thank you very much for testing our codes. Following your comments, we have carefully investigated this test and would like to clarify the following:
1. Please kindly follow the detailed steps specified in the "README" file when you run the codes.

Figure R2. Test performance of the improved FMNet model
3. Since the improved FMNet model shows improved test performance, to ensure the consistency of our manuscript, we have taken this opportunity to re-examine the improved FMNet model on both the real data application and the other tests. Using the improved FMNet model, we redo both the real data application and the other tests (Supplementary Fig. S4) and update the corresponding text, for example:
i. "From the testing results (Supplementary Fig. S8), we can tell that the estimation errors for dip and rake are about 8˚ and 20˚, respectively."
ii. "From the test results, we find that both the strike and dip angles are well resolved, but the rake angle is off by nearly 20˚, and the prediction probability of rake is significantly lower (about 0.5) (Supplementary Fig. S12)."
To briefly summarize, following your comments, we have corrected the mistake in the Kagan angle calculation and improved the FMNet model. Using the improved FMNet model, we also re-examined the real data application and the other tests to ensure the consistency of our manuscript. Finally, we gratefully thank you for all your comments; they greatly improve the strength and completeness of our manuscript, and we hope our responses and revisions address your concerns.

Authors:
We are glad that you are happy with our previous revisions.