Introduction

Earthquake monitoring is one of the most fundamental operations in seismology. A standard earthquake monitoring workflow involves a series of steps to detect and characterize earthquakes, including phase picking, association, and event location1,2,3. Phase picking, a conceptually simple task that is akin to detection problems in computer vision, has recently been improved through deep learning3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18, where convolutional neural networks (CNNs)19 are typically used. After the phase picking, traditional20,21,22,23 and deep-learning-based24,25,26 phase association algorithms have been used to link seismic phases at multiple stations from the same events. Finally, location algorithms27 utilize the associated phases to obtain the earthquake hypocenters, although some deep-learning-based methods directly process raw data to locate earthquakes28,29,30,31,32,33,34.

These three tasks (phase picking, association, and location) are closely interdependent. The accuracy of multi-station phase picking affects the accuracy of association and location. Conversely, association and location impose constraints on multi-station phase picking. Additionally, phase picking with multi-station data can further utilize the geographic relationships and waveform similarities among multiple stations. To achieve more efficient and accurate earthquake monitoring, a suitable earthquake monitoring workflow should impose inter-task and inter-station constraints and preferably perform all three tasks simultaneously at all stations. However, most existing earthquake monitoring methods perform phase picking, association, and earthquake location separately. In addition, most of the current phase-picking methods process seismic data on a station-by-station basis. While some recent graph-based approaches35,36,37,38,39 have demonstrated the ability to handle irregularly spaced stations for phase association and event location, it remains a challenging task to develop a method that effectively leverages inter-task and inter-station constraints, and ideally performs all three tasks simultaneously.

Here, we propose an all-in-one earthquake monitoring system called seismic Phase picking, Location, and Association Network (PLAN) that achieves for the first time the simultaneous implementation of the three tasks with multi-station data and inter-task constraints. PLAN consists of four interdependent neural network modules. Specifically, the first module of waveform feature extraction utilizes an encoder-decoder architecture to extract relevant features from multi-station seismic data. The second module of earthquake location encodes station locations (i.e., longitude, latitude, and elevation) and merges them with waveform features from the first module to predict the earthquake depth and epicentral distance for each station. The third module of phase association utilizes the predicted earthquake location information to estimate the time shifts required to align multi-station waveform features. Finally, the fourth module of phase picking aggregates the aligned features for simultaneous multi-station phase picking. We applied PLAN in the Ridgecrest and Japan regions and compared its efficiency and accuracy with that of state-of-the-art phase-picking and event location methods, demonstrating the merits of inter-station and inter-task constraints for accurate earthquake monitoring.

Results

Multi-station multi-task PLAN

The proposed multi-station multi-task PLAN (Fig. 1) employs a graph neural network (GNN)40 as the backbone to integrate the four functional modules of waveform feature extraction, earthquake location, multi-station association, and physics-informed multi-station phase picking (further details are provided in the Methods section). Compared with CNNs, GNNs are naturally suited for handling seismic data acquired from irregularly spaced stations38.

Fig. 1: The flowchart of the proposed multi-task and multi-station PLAN for earthquake monitoring.

The input data for the model comprise seismic waveforms recorded by multiple stations and the locations of these stations. PLAN consists of four sub-modules: waveform feature extraction encoder and decoder, an earthquake location module, a multi-station association module and a physics-informed multi-station phase-picking module. All of these sub-modules are optimized simultaneously and constrained by each other during training to improve performance in earthquake detection, association, and location.

For the GNN in PLAN, the graph nodes are the seismic stations and the node feature vectors are the corresponding information (i.e., locations and seismograms). All nodes are linked together, and the linking weights are learnt during training to infer the relationships among the stations. We construct the GNN layers with TransformGConvs41, which are based on an attention mechanism42 and learn dynamic linking weights among different stations (details about TransformGConvs are provided in the “Methods” section). In addition, the graph nodes are not fixed, so the GNN can adapt to variations in the number and locations of stations.
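As a rough illustration of how an attention mechanism can produce data-dependent linking weights on a fully connected station graph, the following minimal sketch computes scaled dot-product attention weights between station feature vectors. It is not the TransformGConv layer itself: the learned query and key projections are omitted, and the function name and toy features are hypothetical.

```python
import math

def attention_weights(features):
    """Return a matrix of linking weights between stations (each row sums to 1).

    features: list of equal-length feature vectors, one per station (node).
    Assumption: queries and keys are the raw features; a trained layer would
    apply learned linear projections first.
    """
    dim = len(features[0])
    weights = []
    for q in features:
        # Scaled dot-product score between this station and every station.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in features]
        # Softmax, shifted by the maximum for numerical stability.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Toy example with three stations; the list may hold any number of stations.
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
w = attention_weights(toy_features)
```

Because the weights are computed from the features rather than stored per edge, the same function applies unchanged when stations are added or removed, which is the property the text relies on.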

Raw three-component seismic signals are the feature vectors of the graph nodes. The front-end waveform feature extraction module, constructed as an encoder-decoder CNN and shared among the nodes, extracts their corresponding key features. The station feature extraction block, constructed as two MLPs and shared among the nodes, extracts geographic features from the normalized input longitudes, latitudes, and elevations of the stations. The earthquake location module then concatenates the extracted waveform and geographic features and employs multiple TransformGConvs to aggregate these features from multiple nodes to predict the event depth and station-event offsets. The predicted offsets and depth are further used to determine the event location by triangulation43. Instead of predicting the hypocentral location directly, we predict the station-event offsets and the depth and feed them into the subsequent multi-station association module to estimate the time shifts needed to align the P-wave and S-wave arrivals.
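To make the step from station-event offsets to an epicenter concrete, the sketch below solves a linearized least-squares trilateration problem in a local Cartesian frame. This is one common way to perform such a location and is an assumption here; the source only states that triangulation is used. The function name and the station layout are hypothetical.

```python
import math

def locate_epicenter(stations, offsets):
    """Least-squares epicenter (x, y) in km from station-event offsets.

    stations: list of (x, y) station coordinates in km (local Cartesian frame).
    offsets:  list of epicentral distances in km, one per station.
    Requires at least three stations that are not collinear.
    """
    (x0, y0), d0 = stations[0], offsets[0]
    sxx = sxy = syy = bx = by = 0.0
    for (xi, yi), di in zip(stations[1:], offsets[1:]):
        # Subtracting the circle equation of station 0 removes the
        # quadratic terms and leaves a linear equation in (x, y).
        ax = 2.0 * (xi - x0)
        ay = 2.0 * (yi - y0)
        rhs = (xi**2 + yi**2) - (x0**2 + y0**2) - (di**2 - d0**2)
        # Accumulate the 2x2 normal equations (A^T A) p = A^T b.
        sxx += ax * ax
        sxy += ax * ay
        syy += ay * ay
        bx += ax * rhs
        by += ay * rhs
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("stations are collinear or too few")
    return ((syy * bx - sxy * by) / det, (sxx * by - sxy * bx) / det)

# Synthetic check: exact offsets computed from a known epicenter.
stations = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
true_xy = (3.0, 4.0)
offsets = [math.hypot(true_xy[0] - sx, true_xy[1] - sy) for sx, sy in stations]
est_xy = locate_epicenter(stations, offsets)
```

With noisy predicted offsets the same normal equations return the least-squares compromise among the stations, so additional stations tighten the estimate rather than over-determine it.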

The multi-station association module plays a key role in bridging the tasks of earthquake location and multi-station phase picking and introduces physical constraints between the two tasks. Before the waveform features from different stations are aggregated for multi-station phase picking, the features corresponding to the same earthquake must first be aligned, or associated; otherwise, unaligned features could interfere with one another during aggregation and ultimately degrade the picking performance. The multi-station phase-picking module includes a non-trainable physical layer, implemented with the PyTorch44 roll function, to shift and align the waveform features (from the decoder of the waveform feature extraction module) using the time shifts. Subsequently, multiple TransformGConvs in the phase-picking module aggregate the aligned waveform features to enhance the phase-picking features in the aligned space. Finally, another physical layer unshifts the aggregated features back to the original space, followed by two convolutional layers to obtain the P/S-wave picks at all the stations.
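The shift, aggregate, and unshift sequence can be sketched in plain Python as follows. The circular shift mirrors the wrap-around behaviour of a roll operation; the integer sample shifts, the simple averaging used in place of the graph layers, and all function names are assumptions made for illustration.

```python
def roll(trace, shift):
    """Circularly shift a 1-D list by `shift` samples (positive = later)."""
    k = shift % len(trace)
    return trace[-k:] + trace[:-k] if k else list(trace)

def align(features, shifts):
    """Advance each station's trace by its shift so the arrivals line up."""
    return [roll(f, -s) for f, s in zip(features, shifts)]

def unalign(features, shifts):
    """Undo align(): move every trace back to its original time axis."""
    return [roll(f, s) for f, s in zip(features, shifts)]

# Two stations whose feature peaks sit at samples 2 and 4.
features = [[0, 0, 1, 0, 0, 0],
            [0, 0, 0, 0, 1, 0]]
shifts = [0, 2]  # hypothetical time shifts (in samples) relative to station 0

aligned = align(features, shifts)
# Stand-in for the aggregation step: average across the aligned stations.
stacked = [sum(col) / len(col) for col in zip(*aligned)]
restored = unalign(aligned, shifts)
```

After alignment both peaks fall on the same sample, so the average keeps a single sharp peak; without alignment the same average would show two half-height peaks, which is the interference the text describes.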

Three regression loss functions are defined for the three modules corresponding to the three tasks of phase picking, association, and earthquake location, and they are then combined to jointly train the entire network. Because all the modules are interconnected within the entire network, the training process finds an optimal network that can perform all the tasks both accurately and consistently. Moreover, after training, the multi-station association module can be detached from the network and used to calculate the S-P differential travel time from two inputs: the offsets and the event depth. Further details on this module are provided in the section titled “Multi-station association module”.

Data preparation

We tested the proposed PLAN in two regions: Ridgecrest and Japan. For the Ridgecrest region (Fig. 2a), seismic recordings from 16 California Integrated Seismic Network stations within an epicentral distance of <80 km were collected from 1 January 2014 to 31 December 2021, for a total of more than 71,000 M > −0.5 earthquakes. The data for Japan (Fig. 3a) included M > 2 earthquakes that occurred between 1 January 2011 and 31 December 2011, including the Mw 9.1 Tohoku sequence. We collected the 3-component High Sensitivity Seismograph Network (NIED Hi-net)45,46 data from over 35,000 events. Subsequently, the data were randomly divided into training, validation, and test sets (85%, 5%, and 10%, respectively) in both regions.

Fig. 2: Distributions of phase picking and location residuals in Ridgecrest region.

a distribution of 16 stations (black triangles) and event locations of the test dataset (blue circles) used in our study. The red points represent the earthquake locations predicted by PLAN. b and c are the results of P-wave and S-wave arrival time residual, respectively. The blue, green, and orange lines in b and c represent the arrival time residuals for PLAN, PhaseNet, and EQTransformer, respectively. The proposed method yields the most accurate results in P/S-wave picking. d and e represent the offset and depth residuals between model predictions and Southern California Seismic Network (SCSN) catalog of the located events. Regardless of the offset or depth, the residual distribution of PLAN (blue line) is more concentrated at zero than that of Aggregated-GNN (orange line).

Fig. 3: Distributions of phase picking and location residuals in Japan.

a distribution of stations (black triangles) and events of the test data (blue points) used in our study, where the events occurred between 1 January 2011 and 31 December 2011. The red points represent the earthquake locations predicted by PLAN. As in Fig. 2, b and c show the P-wave and S-wave arrival time residuals, respectively. d and e depict the offset and depth residuals between model predictions and the Japan Meteorological Agency (JMA) catalog of located events, respectively.

The number of stations corresponding to each event in the training samples varied, and the trained network can flexibly handle situations where the number of stations changes in actual data. Further, the distributions of the number of stations per event in the training and test sets were balanced. The results for the test sets in the two regions are presented in Figs. 4 and 5, respectively. To accommodate different range scales in the two study regions, we used different window lengths in two regions (30.72 s for Ridgecrest and 61.44 s for Japan) with the same sampling frequency (100 Hz).

Fig. 4: Comparison of prediction results using different numbers of stations in the Ridgecrest region.

The colored curves in a–d represent the distributions of prediction errors for P-wave, S-wave, offset, and depth, respectively. The x-axis represents the number of stations, the primary y-axis denotes the prediction errors for phase picking and event localization, and the secondary y-axis represents the number of events recorded by a specific number of stations. Note that for phase picking, the prediction residuals of PLAN (blue curves) decrease clearly as the number of stations increases. Moreover, the location errors of PLAN are significantly smaller than those of the Aggregated-GNN method.

To ensure a fair comparison of PLAN with the existing phase-picking methods, we followed the same data preprocessing procedures used in previous studies6,12: (1) normalizing the data by removing the mean and dividing by the standard deviation; (2) using a Gaussian-shaped target function as training labels for the P/S-phase arrival times. Thus, the probability vector of P/S wave is the sum of a zero vector and a Gaussian window (0.4 s), with the center of the window fixed at the P/S wave arrival time.
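The two preprocessing steps can be sketched as below. The 0.4 s value is treated here as the full width of the Gaussian window, and the standard deviation is set to a quarter of that width; both choices, together with the function names, are assumptions, since the text does not specify how the window width maps to the Gaussian parameters.

```python
import math

def normalize(trace):
    """Remove the mean and divide by the standard deviation."""
    n = len(trace)
    mean = sum(trace) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in trace) / n)
    if std == 0.0:
        return [0.0] * n  # flat trace: nothing to scale
    return [(v - mean) / std for v in trace]

def gaussian_label(n_samples, arrival_s, fs=100.0, width_s=0.4):
    """Zero vector plus a Gaussian window centred on the arrival time."""
    center = int(round(arrival_s * fs))
    half = int(round(width_s * fs / 2))   # half-width in samples
    sigma = half / 2.0                    # hypothetical choice of sigma
    label = [0.0] * n_samples
    lo = max(0, center - half)
    hi = min(n_samples, center + half + 1)
    for i in range(lo, hi):
        label[i] = math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2))
    return label

# A 30.72 s window at 100 Hz with a P arrival 10 s after the window start.
label = gaussian_label(3072, arrival_s=10.0)
scaled = normalize([1.0, 2.0, 3.0, 4.0])
```

The label peaks at 1 on the arrival sample and is exactly zero outside the window, so the network is trained to regress a smooth probability curve rather than a single-sample spike.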

Application to seismicity in Ridgecrest region

We compared the performance of PLAN with that of other established deep learning methods for phase picking (PhaseNet6 and EQTransformer12) and location (Aggregated-GNN36). All of the methods were retrained on the same training set and evaluated on a common test set. As shown in the Ridgecrest application (Fig. 2b, c), the performance of PLAN in phase picking was superior to that of the other two deep learning-based methods. Specifically, the residual distribution of the P-wave picks for PLAN was more concentrated than that of the other methods, indicating a higher overall accuracy. For S-wave picks, PLAN performed significantly better than EQTransformer because the distribution of PLAN was narrower, whereas the difference in performance between PLAN and PhaseNet was relatively minor.

In terms of localization, our method (PLAN) outperformed Aggregated-GNN (Fig. 2d–e and Table 1). The distribution of PLAN was notably more concentrated than that of Aggregated-GNN, particularly in terms of offset prediction. To further demonstrate the effectiveness of TransformGConv, we replaced all the TransformGConv layers in Supplementary Fig. 1 with GCN40, SAGE47, and GATv248, respectively. Among the various methods compared, PLAN yielded the lowest offset residual, with an average error of 1.09 km and a standard deviation of 1.41 km. Furthermore, PLAN also outperformed Aggregated-GNN in terms of depth localization, regardless of whether it was based on GCN, GATv2, or TransformGConv. These results demonstrated the superiority of the proposed PLAN in location estimations.

Table 1 Location performance in Ridgecrest and Japan regions.

Furthermore, we used three metrics of mPrecision, mRecall, and mF1 (described in the Methods section), to quantitatively evaluate the performance of the five methods (Table 2). In five of the six metric scores for the P-wave and S-wave picking results, our attention mechanism-based GNN method outperformed the other methods. The only exception was the mPrecision metric of P-wave picking, where the EQTransformer showed slightly higher scores than PLAN. Notably, even the simplified version of the multi-station phase-picking method, such as the SAGE-based PLAN, outperformed both the single station-based picking methods of EQTransformer and PhaseNet in mF1 scores for S-wave picking. This indicated that the phase-picking accuracy is significantly improved by multi-station picking, which effectively utilizes inter-station contextual information.

Table 2 Detection Performance in Ridgecrest region.

In addition to varying the time threshold at a constant picking probability threshold, we also evaluated the models with a fixed time threshold, under which true positive picks were defined as those within 0.5 s of the predicted pick. By varying the picking probability threshold, we calculated and plotted the precision-recall curves for four models (Supplementary Fig. 2). The curves of the Trans-based PLAN consistently lie closer to the upper-right corner, in agreement with the results discussed above, which indicates that PLAN achieves superior F1 scores for both P-wave and S-wave picking.
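The fixed-tolerance evaluation can be illustrated with the following sketch, which sweeps the probability threshold and counts a pick as a true positive when it lies within 0.5 s of a reference pick. The one-to-one nearest-neighbour matching rule and the toy numbers are assumptions; the mPrecision, mRecall, and mF1 definitions from the Methods section are not reproduced here.

```python
def pick_scores(predicted, labels, prob_threshold, tol_s=0.5):
    """Precision, recall and F1 for one probability threshold.

    predicted: list of (time_s, probability) candidate picks.
    labels:    list of reference pick times in seconds.
    A kept pick is a true positive if it lies within tol_s of a reference
    pick that has not been matched yet (one-to-one matching, an assumption).
    """
    kept = sorted(t for t, p in predicted if p >= prob_threshold)
    unmatched = sorted(labels)
    tp = 0
    for t in kept:
        # Closest remaining reference pick, if any are left.
        best = min(unmatched, key=lambda ref: abs(ref - t), default=None)
        if best is not None and abs(best - t) <= tol_s:
            unmatched.remove(best)
            tp += 1
    precision = tp / len(kept) if kept else 0.0
    recall = tp / len(labels) if labels else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

# Hypothetical picks and reference times for a single trace.
predicted = [(10.1, 0.9), (20.4, 0.6), (33.0, 0.3)]
labels = [10.0, 20.0, 40.0]
curve = {th: pick_scores(predicted, labels, th) for th in (0.2, 0.5, 0.8)}
```

Lowering the threshold admits the low-probability pick at 33.0 s, which matches no reference and lowers precision, whereas raising it discards a correct pick and lowers recall; plotting the resulting pairs gives a precision-recall curve.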

Application to seismicity in Japan

We retrained all the methods on the Japan training set for the evaluation. Compared to its performance in the Ridgecrest region, PLAN exhibited an even better performance in Japan (Fig. 3b, c). Further, PLAN demonstrated a remarkably better performance than PhaseNet and EQTransformer for both P- and S-wave picks. The offset predicted by PLAN was notably more accurate than that predicted by Aggregated-GNN, with a narrower residual distribution (Fig. 3d, e). In terms of depth estimation, although PLAN maintained a narrower residual distribution, the center of the distribution was shifted systematically, compared with the Aggregated-GNN method. Table 1 presents the comprehensive quantitative comparison of the results. Although the TransformGConv-based PLAN method did not demonstrate particular superiority in depth estimation, it excelled in offset estimation (Table 1). Further, the GATv2-based PLAN showed the lowest depth error, indicating potential improvement of localization capabilities of the proposed PLAN.

Similar to the Ridgecrest example, we assessed the phase-picking performance of various models applied to the test data from Japan using mPrecision, mRecall, and mF1 metrics (Table 3) and precision-recall curves (Supplementary Fig. 3). The TransformGConv-based PLAN model achieved superior results in terms of mRecall (95.14 for P-waves and 85.09 for S-waves) and mF1 (95.46 for P-waves and 86.72 for S-waves), whereas EQTransformer performed best in terms of mPrecision of P-waves and S-waves. TransformGConv-based PLAN demonstrated high mRecall scores, indicating that a large proportion of the samples containing P/S-waves were correctly detected. However, this was achieved at the expense of a slightly lower mPrecision compared to that of the EQTransformer, with some non-P/S-waves incorrectly classified as P/S-waves. The mF1 score provided a more comprehensive evaluation of the model performance, considering both the reduction in missed detections and the increase in correct detections. In this context, TransformGConv-based PLAN had the highest F1 score, indicating that it effectively reduced the missed detections of P/S-waves and increased the proportion of correct detections.

Table 3 Detection performance in Japan.

Application to the continuous waveforms of 2019 Ridgecrest sequence

One key factor in assessing the effectiveness of an earthquake monitoring approach is its ability to process continuous waveform data. PLAN assumes there always exist earthquake signals on every station, and it ignores the case that some stations only have pure noise. As a result, PLAN will issue an earthquake alert even if no earthquake occurs. To address this limitation, we have designed a PLAN-based workflow (Supplementary Fig. 4) that incorporates specific threshold selection procedures, enabling it to effectively handle continuous waveform data and generate an earthquake catalog. (More details of this workflow can be found in the Methods section.)

To assess the performance of this workflow in generating earthquake catalogs, we chose the Ridgecrest earthquake sequence for benchmarking. This choice was motivated by the availability of several well-established earthquake catalogs for this region, making it suitable for comparative analysis. To avoid data leakage, we re-segmented our training dataset and applied data augmentation (more details of the data augmentation can be found in the Methods section). Subsequently, we applied the PLAN-based workflow with the retrained model to process data recorded from 4 July (17:30:00) to 9 July (00:00:00), thereby generating our own earthquake catalog for benchmark comparisons.

We acknowledge that PLAN currently has limitations when dealing with multiple events within a specific time window. In such cases, PLAN focuses exclusively on the first event and outputs the earthquake time and event location corresponding to that event. Consequently, when working with a larger sliding stride, there is a risk of missing some events. To address this problem, we implemented a more compact sliding window approach for the continuous waveform data, with each window spanning 30.72 s and a 25.72 s overlap between consecutive windows. This overlap is substantially longer than what is typically employed in other deep learning phase-picking methods designed for single stations. The choice of such a large overlap maximizes the detectability of dense earthquakes within a short time period. In addition, an example of event processing spanning adjacent time windows can be found in Supplementary Fig. 5.
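The window geometry follows directly from the stated numbers: a 30.72 s window with a 25.72 s overlap advances by 5 s per step. The sketch below computes the window start indices for a continuous record, assuming the 100 Hz sampling frequency given in the data preparation section; the function name and the one-hour record length are hypothetical.

```python
def window_starts(n_samples, fs=100.0, window_s=30.72, overlap_s=25.72):
    """Window length, stride, and start indices for a continuous record."""
    win = int(round(window_s * fs))                   # samples per window
    step = int(round((window_s - overlap_s) * fs))    # stride between windows
    starts = list(range(0, n_samples - win + 1, step))
    return win, step, starts

# One hour of continuous data at 100 Hz.
win, step, starts = window_starts(360_000)
```

Each sample away from the record edges is covered by about six consecutive windows, so an event that is not the first arrival in one window can still appear first in a later one.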

Since multiple well-established catalogs exist for the 2019 Ridgecrest earthquake sequence, we compare our catalog with them, including SCSN, several deep learning-based methods (such as Liu et al.11’s catalog, GaMMA23 and EQNet3), and traditional template matching methods (such as Shelly49’s catalog and Ross et al.50’s catalog). As there are no ground-truth catalogs available for continuous waveform data, we employed the same consistent cross-validation test (proposed by Zhu et al.3) across different catalogs and utilized the same threshold (3 seconds) for true positive counting. During the cross-validation test, the evaluation metrics are significantly influenced by the number of events. To ensure a fair comparison, we adjusted the threshold, allowing PLAN to generate a catalog with a similar number of events to some of the existing catalogs (a total of 24,270 events). The temporal and spatial distribution of these events can be found in Supplementary Fig. 6.

Supplementary Table 1 presents the precision, recall, and F1-score results of the cross-validation test. These metrics are closely tied to the number of events in the catalog. For example, when the SCSN catalog is taken as the benchmark, Shelly’s and Liu et al.’s catalogs exhibit higher precision and F1 scores than the others, but this comes at the expense of lower recall. These catalogs contain fewer events, leaving many true events undetected. In such cases, the higher F1 score may not be a meaningful reference. The situation with GaMMA’s catalog is similar. While its recall is comparable to that of the catalogs of Ross et al., EQNet, and PLAN, GaMMA’s notably high precision is also influenced by the number of events in its catalog. The significant difference between the number of events in GaMMA’s catalog and in the existing catalogs makes it difficult to evaluate. Therefore, in the cross-validation comparison, we primarily focus on catalogs with a higher number of events.

When the SCSN or Shelly’s catalog is taken as the benchmark, the PLAN and EQNet catalogs exhibit almost identical precision, recall, and F1 scores, all of which slightly surpass those of Ross et al.’s catalog. Furthermore, when Liu et al.’s catalog is used as the benchmark, the PLAN catalog achieves the highest scores across all three evaluation metrics among catalogs with similar event quantities. Moreover, when Ross et al.’s catalog and the EQNet catalog are each used as the benchmark, the PLAN catalog consistently outperforms the other catalog in terms of precision, recall, and F1 score. This indicates that the events in the PLAN catalog are consistently present in both Ross et al.’s and the EQNet catalogs, implying a high level of credibility and a high likelihood that they represent real earthquakes. In conclusion, when processing continuous waveform data, PLAN performs on par with or slightly better than other state-of-the-art methods, consistently yielding a robust and high-quality catalog.

Discussion

PLAN is scalable for accommodating various numbers of stations per event. As PLAN is a network level picking and location model, we further investigated the effect of different numbers of stations on the network performance for phase picking and earthquake location with a test set of the Ridgecrest region (Fig. 4). We calculated the P- and S-wave picking residuals of the three different methods relative to the manual picking results, respectively (Fig. 4a, b). The residuals of the single-station-based picking methods, PhaseNet and EQTransformer, exhibited oscillations for samples with station numbers 3-13 as the number of stations increased. Contrastingly, the residuals of our simultaneous multi-station picking method, PLAN, exhibited a significant residual decrease as the number of stations increased. Although the prediction residuals of the single-station-based methods should not be significantly associated with the number of stations, their prediction residuals still decreased when the number of stations was 13-16. This was probably because the events recorded by more stations tended to be larger and easier to pick.

A comparison of the distribution of prediction errors for earthquake offsets and depths with respect to the number of stations indicated that the errors in PLAN were significantly smaller than those in the Aggregated-GNN method (Fig. 4c, d). However, the errors in offset prediction did not exhibit a significant decrease with an increase in the number of stations. This was likely because a large number of stations would include more distant ones that tended to have large offset prediction errors. As the offset error metric is defined as the average value acquired from multiple stations, an increase in the number of stations can lead to a slightly higher average error for a single event.

Furthermore, the statistical results for Japan (Fig. 5) were similar to those for the Ridgecrest region, with the PLAN method exhibiting smaller phase-picking errors than EQTransformer and PhaseNet, especially for S-wave picking. In addition, as the number of stations increased, the offset prediction error of PLAN became significantly smaller than that of the Aggregated-GNN.

Fig. 5: Comparison of prediction results using different number of stations in Japan.

The four distributions (a–d) are similar to those described in Fig. 4; the only difference is that we have used logarithmic coordinates for the primary y-axis in a and b.

A similar pattern emerges when we compare the distribution of prediction errors in relation to earthquake magnitude and signal-to-noise ratio (SNR) (Supplementary Fig. 7 and Fig. 10). Larger magnitude earthquakes, which are typically detected by more seismic stations, exhibit reduced P- and S-wave picking errors, as well as prediction errors for earthquake offsets and depths, in both the Ridgecrest region and Japan. Additionally, larger SNR, which is usually associated with higher-magnitude events or seismic stations closer to the earthquake’s epicenter, also demonstrates smaller picking errors. PLAN, EQTransformer, and PhaseNet show nearly identical performance in terms of P-wave picking errors as earthquake magnitude or SNR increases. Nevertheless, our multi-station picking method exhibits a slight advantage in S-wave picking compared to the other two single-station picking methods, as evident from the narrower distribution of errors (especially in Supplementary Fig. 7b and Figs. 9b and 10b). Moreover, PLAN excels in terms of its location performance, particularly in offset prediction, surpassing Aggregated-GNN (Supplementary Figs. 7c–10c). As earthquake magnitude increases, the offset error for both PLAN and Aggregated-GNN gradually decreases. Nonetheless, at any given magnitude level in both the Ridgecrest region and Japan, the offset error of PLAN remains lower than that of Aggregated-GNN (as depicted in the trends shown in Figs. 4c and 5c).

The ability of our network to handle varying numbers of stations can be attributed to the multi-station association module, which can be separated from the entire network and utilized in a manner similar to the TauP algorithm for estimating the arrival time of earthquakes at stations. Unlike the TauP algorithm, our association module does not depend on an input velocity model. Instead, it enables the network to learn the concept of velocity and thus to convert offsets into relative time shifts. Additionally, unlike the sequential processing of one station at a time in the TauP algorithm, our module simultaneously calculates the time shifts for multiple stations associated with a single event. In essence, our association module can be considered a computationally efficient 3D TauP algorithm that operates without requiring a velocity model. To evaluate the estimation accuracy of the arrival time using this module, we applied the estimated time shifts to align different stations (Supplementary Fig. 11). Because the multi-station association module can accurately estimate the arrival time, the original waveforms from all stations were aligned accordingly.
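For orientation, the mapping that the association module has to learn can be written down explicitly in the simplest possible setting: straight rays in a uniform medium. The sketch below is only this textbook baseline, not the learned module, and the constant velocities (6.0 km/s for P and 3.5 km/s for S) and function names are hypothetical.

```python
import math

def sp_time(offset_km, depth_km, vp=6.0, vs=3.5):
    """S-minus-P travel-time difference (s) for straight rays in a uniform
    medium. vp and vs (km/s) are hypothetical constants, not PLAN values."""
    hypo_dist = math.hypot(offset_km, depth_km)
    return hypo_dist / vs - hypo_dist / vp

def relative_p_shifts(offsets_km, depth_km, fs=100.0, vp=6.0):
    """P-arrival shift of each station, in samples, relative to the
    earliest station, computed for all stations of one event at once."""
    tp = [math.hypot(o, depth_km) / vp for o in offsets_km]
    first = min(tp)
    return [int(round((t - first) * fs)) for t in tp]

# One event at 40 km depth recorded at offsets of 30 km and 0 km.
dt_sp = sp_time(30.0, 40.0)
shifts = relative_p_shifts([30.0, 0.0], 40.0)
```

A learned module replaces the two fixed velocities with whatever offset-to-time relationship the training data support, which is why no velocity model has to be supplied.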

To further evaluate the estimation accuracy of the arrival time using this module, we employed the TauP algorithm based on the PREM model51 for comparison (Fig. 6). We also calculated the correlation coefficient (R) between the output of each method and the manually picked P/S-wave time differences. During the training process, this module used the offsets and depths obtained from the earthquake location module as inputs. Therefore, inputting the predicted offsets and depths into this module (Fig. 6d) could yield better P/S-wave time differences than inputting the labeled offsets and depths (Fig. 6c). The multi-station association module with manually labeled offsets and depths yielded less consistent results than the TauP algorithm. This discrepancy may not be solely attributable to errors in the deep learning estimation. Label inaccuracies may have also contributed to this outcome. This assertion was supported by the observation that using the neural network output as the input for the TauP algorithm resulted in greater correlation coefficients than when the labels were employed (Fig. 6a, b). Among all the evaluated methods, the estimation results in Fig. 6d show the highest correlation coefficients. Generally, the multi-station association module and the TauP algorithm based on the PREM model have the same level of accuracy in calculating P/S-wave time differences.

Fig. 6: Comparison of the P/S-wave arrival time estimation using different methods.

a and b show the crossplots of the S-P differential arrival times computed by the TauP model with the input of offsets and depths from manual labels and network predictions, respectively. The x-axis represents the manually picked S-P differential arrival times. The y-axis represents the S-P differential arrival times obtained by different methods. The crossplots in c and d show the S-P differential arrival times predicted by our multi-station association module, and they are consistent with those predicted by the TauP model. This indicates that the multi-station association module, detached from the entire trained PLAN model, behaves in a physically reasonable manner compared to the commonly used TauP model.

Having demonstrated that our association module can accurately generate P/S-wave time differences, we note that when processing continuous waveform data, the TauP algorithm cannot reliably estimate wave travel times owing to the lack of precise event location information. In contrast, our association module can provide more accurate estimates of P/S-wave time differences, enabling the determination of which stations are associated with the event (Supplementary Fig. 5).

Certain limitations persist within our method, particularly in processing continuous waveform data. While PLAN demonstrates strong performance on test data, it still struggles to provide highly accurate results in offset and depth estimation for continuous waveform data, and it may not match the precision of traditional localization methods. To enhance the accuracy of event localization, incorporating joint relocation algorithms, such as hypoDD52, into the workflow’s final steps would be beneficial. Furthermore, while we employ cross-validation to compare different catalogs in the Ridgecrest region, comparing catalogs that vary in the number of events remains a challenge. Therefore, developing an approach to compare the performance of various catalogs in the absence of ground truth data may be an urgent task for future research.

Moreover, the issue of generalization remains a concern. As the multi-station association module learns velocity concepts during training, pre-trained models from other regions may yield suboptimal results, particularly in regions with significant velocity variations. Limited generalization is nevertheless feasible within smaller geographic areas, for example when a model trained on Southern California data is used for earthquake monitoring in Northern California. To address this concern, training region-specific models is a viable approach, as exemplified in our paper, where two distinct models were trained for the Ridgecrest region and Japan.

However, despite PLAN’s challenges with generalization, its multi-station association module, which learns velocity concepts during training, offers deeper insights into specific regions. It is important to note that our current association module primarily captures relative velocities. Nevertheless, with adjustments to the labels of shift vectors, it can potentially learn absolute velocities. By inputting hypothetical offsets and depths for each grid in a region, the module could generate corresponding P- and S-wave arrival times. Utilizing the time differences and distances between these grids, we can construct a 3D velocity model for a specific region. Thus, while our method may not yet generalize to a broader range of areas, it introduces a possibility for future work: obtaining a region’s velocity model concurrently with training an earthquake monitoring model using region-specific datasets.

In summary, we present a novel all-in-one multi-task multi-station system called PLAN for earthquake monitoring, which is capable of simultaneous phase picking, phase association, and earthquake location. Unlike current CNN-based methods that perform phase picking station by station and treat phase association and location as separate steps, our proposed GNN-based multi-station multi-task system utilizes the inherent inter-task and inter-station constraints. The multi-station association module estimates the phase shift and improves the robustness and accuracy of the phase association process. Finally, the resulting offsets and depth enable accurate event localization. Our method demonstrates the need to factor mutual constraints among tasks and stations into next-generation earthquake monitoring systems.

Methods

Graph based neural network

Several studies have shown that GNNs have the potential to deal with irregularly spaced stations for phase association and event localization14,16,35,36,37,38,39. Here, we build a graph-based network (Fig. 1) for multi-station earthquake monitoring. To utilize the GNN, we first need to change the data from the matrix format to the graph format and employ a graph-based representation of the stations, where each station is represented as a node in the graph and the three-channel data and the station location are used as the features of each node. In contrast to the current single-station processing methods4,5,6,7,8,9,10,11,12,15, which treat each three-channel record as an individual input sample, our approach inputs all the three-channel data received from multiple stations per event as a single sample. This allows for efficient aggregation of information from multiple stations during network training. As a result, the features of different stations can be effectively integrated using GNNs during the aggregation process.

In this study, we have evaluated various graph aggregation methods, including GCN40, GraphSAGE47, GAT53, GATv248, and TransformGCONV41. Through this evaluation, we have determined that TransformGCONV, which is based on the attention mechanism42, is the most suitable module for the proposed PLAN. The message aggregation of TransformGCONV can be represented as:

$${\bf{x}}_{i}^{\prime}={\bf{W}}_{1}{\bf{x}}_{i}+\sum\limits_{j\in \mathcal{N}(i)}{\alpha }_{i,j}{\bf{W}}_{2}{\bf{x}}_{j},$$
(1)

where \({\bf{x}}_{i}^{\prime}\) represents the aggregated features at the source node, and \({\bf{x}}_{i}\) and \({\bf{x}}_{j}\) represent the features of the source and distant nodes before aggregation, respectively. \({\bf{W}}_{1}\) and \({\bf{W}}_{2}\) are the trainable matrices. In addition, the attention coefficients \({\alpha }_{i,j}\) are computed via dot-product attention as follows:

$${\alpha }_{i,j}=\mathrm{softmax}\left(\frac{{\left({\bf{W}}_{3}{\bf{x}}_{i}\right)}^{\top }\left({\bf{W}}_{4}{\bf{x}}_{j}\right)}{\sqrt{d}}\right),$$
(2)

where \({\bf{W}}_{3}\) and \({\bf{W}}_{4}\) are the trainable matrices and d is the dimension of the key vectors. Similar to the attention mechanism42, the source feature \({\bf{x}}_{i}\) and distant feature \({\bf{x}}_{j}\) are transformed into a query vector and a key vector, respectively, using \({\bf{W}}_{3}\) and \({\bf{W}}_{4}\). Compared to other graph aggregation methods, the use of the attention-based mechanism (equations 1 and 2) in TransformGCONV allows for a more fine-grained representation of the relationship between different stations, thereby improving the accuracy and efficiency of the proposed method.
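As a rough illustration of equations (1) and (2), the following sketch computes one attention-weighted aggregation step over a small station graph. It is a single-head, pure-Python toy and not the implementation used in PLAN; the function names (`matvec`, `softmax`, `aggregate`) and the dictionary-based neighbor list are assumptions made here for illustration.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(x, neighbors, W1, W2, W3, W4):
    """One aggregation step: x_i' = W1 x_i + sum_j alpha_ij W2 x_j.

    x         : list of node (station) feature vectors
    neighbors : dict mapping node index i to its neighbor indices N(i)
    """
    out = []
    for i, xi in enumerate(x):
        query = matvec(W3, xi)
        d = len(query)  # dimension of the query/key vectors
        # Scaled dot-product scores between the source query and neighbor keys.
        scores = []
        for j in neighbors[i]:
            key = matvec(W4, x[j])
            scores.append(sum(q * k for q, k in zip(query, key)) / math.sqrt(d))
        alpha = softmax(scores) if scores else []
        # Skip term plus attention-weighted neighbor messages.
        h = matvec(W1, xi)
        for a, j in zip(alpha, neighbors[i]):
            msg = matvec(W2, x[j])
            h = [hv + a * mv for hv, mv in zip(h, msg)]
        out.append(h)
    return out
```

With identical node features and identity weight matrices, every neighbor receives the same attention weight, so each node's output is simply its own feature plus the neighbor average.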

Network Architecture

Here, we design a multi-station multi-task network for simultaneous phase picking, association, and location. The network (Supplementary Fig. 1) comprises four components: a waveform feature extraction module, an earthquake location module, a multi-station association module, and a physics-informed multi-station phase-picking module. Similar to previous deep-learning-based phase-picking approaches3,6,12,15, we design an encoder to extract waveform features and a decoder to produce phase-picking results. However, to address the multi-station phase-picking problem, we introduce the GNN-based TransformGCONV for aggregating features from multiple stations.

Because GNN aggregation for multi-station phase picking works best on time-aligned waveform features, we do not employ GNN layers in the waveform feature extraction module (Supplementary Fig. 1a), where the features of different stations are still shifted in time relative to one another. Although we use a U-shaped neural network for feature extraction to solve the phase-picking problem, it could be replaced with other single-station phase-picking networks, such as EQTransformer. Regardless of the network architecture used, the features extracted from the middle of the network are input into the earthquake location module, and the structure of the final few layers of the network is modified for the purpose of multi-station phase picking. Additionally, the kernel size of all convolutional layers in the waveform feature extraction network is set to 7.

For the earthquake location module (Supplementary Fig. 1b), we first extract features from the normalized coordinate information of the stations within the range of [0,1] through two fully connected layers (3-48-96). Simultaneously, the waveform features extracted from each station are further processed through several convolutional layers and then flattened. Subsequently, the position and waveform features are concatenated and passed through two fully connected layers (192-192-96). This fuses the position and waveform features at each station. The fused features are further aggregated among multiple stations by several GNN layers to predict the offsets of each station with respect to the event and its depth. Because there is only one depth parameter for each sample, we add a global average pooling before the output. In summary, this module allows the integration of both location and waveform information into the feature extraction process, which is crucial for accurate event localization.

Finally, in the physics-informed multi-station phase-picking module (Supplementary Fig. 1d), we incorporate physics-motivated constraints of time alignment among waveforms corresponding to the same earthquake event. We first utilize a multi-station association module (Supplementary Fig. 1c) to calculate the relative alignment shifts between stations using the estimated offsets and the depth of the event. We then use the shifts to align the waveforms to a common time reference and aggregate the features across multiple stations in the phase-picking module. Subsequently, the aggregated features at each station are unaligned and fed to two convolutional layers to yield the final P/S-wave picking results. This process leverages the physical information of the event location to improve the robustness and accuracy of the multi-station phase picking.
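The align, aggregate, and unalign sequence can be sketched as follows. This is a minimal toy under stated assumptions: features are one-dimensional lists, shifts are integer sample counts (positive means later in time), and a plain mean across stations stands in for the graph convolution. The function names are hypothetical.

```python
def shift(trace, k):
    """Shift a 1-D list by k samples (positive = later), zero-padding the gap."""
    n = len(trace)
    if k >= 0:
        return [0.0] * min(k, n) + trace[: max(n - k, 0)]
    return trace[-k:] + [0.0] * min(-k, n)


def align_aggregate_unalign(features, shifts):
    """Align per-station features, average them, then restore station timing.

    features : list of per-station 1-D feature lists (equal length)
    shifts   : per-station sample shifts that move each phase to the
               common reference time
    """
    # 1) Align every station to the common reference time.
    aligned = [shift(f, s) for f, s in zip(features, shifts)]
    # 2) Aggregate across stations (a mean stands in for the GNN layer).
    n_sta = len(aligned)
    fused = [sum(col) / n_sta for col in zip(*aligned)]
    # 3) Undo the alignment so each station gets features in its own timing.
    return [shift(fused, -s) for s in shifts]
```

For two stations whose pulses sit at samples 2 and 5, shifts of +2 and -1 move both to sample 4, so the fused pulse keeps its full amplitude and returns to samples 2 and 5 after unaligning. Averaging the unaligned traces directly would instead yield two half-amplitude pulses.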

Multi-station association module

To simultaneously pick P/S-waves from multiple stations, PLAN utilizes GNNs, which typically aggregate raw signals received at different stations. Feature aggregation across multiple stations introduces inter-station constraints and enhances the features at each station. However, because the travel times from the same source differ across stations, directly aggregating the signals from multiple stations would degrade multi-station picking. To address this issue, the proposed method employs a multi-station association module to estimate the time shifts, as illustrated in Fig. 1. The input to this module is the offset of each station with respect to the event and the event depth. The module output is the corrected time shifts of the P/S-wave for each station. With this input and output, the multi-station association module is trained to estimate the P/S-wave corrections for each station. These corrections are then used to align the waveform features, enabling the graph convolution to aggregate the features in a temporally aligned space. Consequently, the method can enhance or compensate for the features at each station by fusing the aligned features from other stations, allowing simultaneous and accurate multi-station picking.

To assess the impact of the multi-station association module on our picking results, we conducted a comparative analysis by training a neural network without this module and contrasting its performance with PLAN (Supplementary Fig. 12). Across all sub-figures, the method without the association module consistently exhibits more pronounced overfitting, resulting in higher loss and lower accuracy compared to PLAN. These findings affirm the efficacy of the introduced multi-station association module in enhancing the network’s picking performance and bolstering the model’s overall robustness.

The multi-station association module can be utilized independently after training. It converts the distance and depth information into arrival information and calculates the S-P differential travel time54. Supplementary Fig. 13 illustrates the arrival time differences of the P/S-waves at various stations in Japan. Although the training process utilizes a maximum of 37 stations for a single event, the module can be adapted to cases with any number of stations (e.g., hundreds of stations shown in Supplementary Fig. 13a–d) to estimate the P/S-wave arrival time differences for all stations. These results indicate that the module effectively enforces physical constraints based on time shifts within the overall network.
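For intuition about the quantity the module produces, the sketch below computes an S-P differential travel time from station offsets and event depth using a straight ray in a uniform medium. This is only a simple physical baseline, not the learned module and not TauP. The default velocities are the uniform 6 km/s and 3.4 km/s values quoted for the Ridgecrest region in the workflow section; treating the offset as east and north components in kilometres is an assumption made here.

```python
import math

VP_KM_S = 6.0  # uniform P-wave velocity quoted for Ridgecrest
VS_KM_S = 3.4  # uniform S-wave velocity quoted for Ridgecrest


def sp_time(dx_km, dy_km, depth_km, vp=VP_KM_S, vs=VS_KM_S):
    """S-P differential travel time (s) for a straight ray in a uniform medium.

    dx_km, dy_km : assumed east/north offset of the station from the epicenter
    depth_km     : event depth
    """
    dist = math.sqrt(dx_km ** 2 + dy_km ** 2 + depth_km ** 2)
    return dist / vs - dist / vp
```

Under these assumptions the S-P time grows linearly with hypocentral distance, at roughly 0.13 s per kilometre.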

Workflow of processing continuous waveform

To enable the application of PLAN to continuous waveform data, we have designed a PLAN-based workflow that incorporates specific threshold selection procedures, allowing for the subsequent generation of a catalog (Supplementary Fig. 4). To ensure the stability of PLAN and its applicability to processing the continuous waveform of the 2019 Ridgecrest sequence, we conducted a retraining process by re-segmenting our training dataset. This process involved excluding data recorded during earthquakes that occurred between July 4th, 2019, and July 9th, 2019. Additionally, during the training process, we introduced data augmentation techniques, including random shifts applied collectively to the waveform windows of all stations. The maximum allowable shift was set to 2 seconds. This approach aimed to improve the model’s adaptability to variations in picking positions within the waveform windows extracted from continuous data. Once the retraining was completed, we were able to utilize the retrained model for processing continuous waveform data using the following steps:

  1. Initial prediction: We apply the retrained PLAN model to the continuous waveforms to obtain initial phase picks in the overlapped windows.

  2. Shift and stack: We use the P- and S-wave responses at multiple stations (i.e., the output of the Physics-Informed Multi-Station Phase Picking Module in Fig. 1) to detect potential events. Similar to the “shift and stack” strategy widely used in array seismology, we shift the responses to the origin time using the P- and S-wave shifts (i.e., predicted by the Multi-Station Association Module) and stack them. If the stacked P/S-wave response exceeds a threshold (2.4 for P waves and 1.2 for S waves), the window is expected to contain an event occurring when the stacked response peaks.

  3. Station selection: For the events detected by “Shift and stack”, we need to use the picks at different stations to further locate the events. The stations need to be filtered because not every station has picks in the window, and not every pick in the window is associated with the origin time estimated by “Shift and stack”. Only those picks near the theoretical moveout can be used to locate the event. Specifically, the shifted P and S picks should simultaneously exceed a probability threshold (0.24 for P waves and 0.12 for S waves) and be near the origin time (< 2 s). If more than 4 stations meet the criteria, the detected event can be further located.

  4. Catalog generation: In the final step of our process, we input all the candidate stations into PLAN to predict the earthquake time and hypocenter. The earthquake time is calculated using the commonly assumed uniform velocity model of 6 km/s for P waves and 3.4 km/s for S waves in the Ridgecrest region. Simultaneously, the hypocenter is estimated through triangulation using the predicted offsets and depth (i.e., predicted by the Earthquake Location Module). After processing all the overlapped windows, we obtain a preliminary earthquake catalog from the continuous waveforms. Due to the large overlap, events can be detected by PLAN more than once. We remove the duplicate detections by limiting the minimal separation between consecutive occurrences to 2 s.
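The detection and de-duplication logic of the steps above can be sketched as follows. This is a simplified illustration, not the released PLAN workflow: responses are plain lists of per-sample probabilities, shifts are assumed to be non-negative integer sample counts that move each trace back to the origin time, and when two detections are closer than 2 s the earlier one is kept (the text does not say which duplicate is retained). All function names are hypothetical.

```python
def shift_earlier(trace, k):
    """Move a trace k samples earlier (k >= 0), zero-padding the tail."""
    k = max(0, min(k, len(trace)))
    return trace[k:] + [0.0] * k


def shift_and_stack(responses, shifts):
    """Sum per-station pick responses after shifting each to origin time."""
    shifted = [shift_earlier(r, s) for r, s in zip(responses, shifts)]
    return [sum(col) for col in zip(*shifted)]


def detect(stacked, threshold):
    """Return the peak sample index if the stack exceeds threshold, else None."""
    peak = max(range(len(stacked)), key=stacked.__getitem__)
    return peak if stacked[peak] > threshold else None


def deduplicate(origin_times, min_sep=2.0):
    """Drop detections closer than min_sep seconds to the last kept one."""
    kept = []
    for t in sorted(origin_times):
        if not kept or t - kept[-1] >= min_sep:
            kept.append(t)
    return kept


P_THRESHOLD = 2.4  # stacked P-response threshold from the text
S_THRESHOLD = 1.2  # stacked S-response threshold from the text
```

For example, three stations with P responses of 0.9 that align after shifting produce a stacked peak of about 2.7, which exceeds the P threshold of 2.4 and is flagged as a candidate event.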

Loss function and training details

Our multi-task learning network model has three output results corresponding to phase picking, phase association, and earthquake localization. To train the model, we define three different loss functions for these three different tasks. For phase picking, instead of using the commonly used cross-entropy, we choose the mean square error (MSE) as the loss function, which is suitable for training in multi-task problems. To estimate offsets and depth, which is similar to event localization, we also use MSE as the loss function, as suggested by previous studies36,39. Finally, to calculate the P/S-wave shift, we define the loss function as follows:

$$\mathcal{L}_{\Delta p}=\sum\limits_{i=1}^{n}\left| {\mbox{CTime}}_{p}-\left({\mbox{label}}_{{p}_{i}}+\Delta {t}_{{p}_{i}}\right)\right|,$$
(3)
$$\mathcal{L}_{\Delta s}=\sum\limits_{i=1}^{n}\left| {\mbox{CTime}}_{s}-\left({\mbox{label}}_{{s}_{i}}+\Delta {t}_{{s}_{i}}\right)\right|,$$
(4)

where \({\mbox{CTime}}_{p}\) and \({\mbox{CTime}}_{s}\) represent the reference times at which the P- and S-wave picks are aligned, respectively. The reference times for the P/S-wave features were set at 10 and 15 s in the Ridgecrest region and at 20 and 32 s in Japan. Additionally, to process continuous waveform data, the reference time for both the P- and S-wave features in the retrained model for the 2019 Ridgecrest sequence processing was set to 0 s. Moreover, \({\mbox{label}}_{{p}_{i}}\) or \({\mbox{label}}_{{s}_{i}}\) represents the manually picked P/S-wave arrival time for station i, and Δt represents the predicted P/S-wave shift. Finally, we combine the three types of loss functions to form the overall objective function:

$${L}_{\mbox{total}}={\lambda }_{1}\mathcal{L}_{\mathrm{picking}-\mathrm{p}}+{\lambda }_{1}\mathcal{L}_{\mathrm{picking}-\mathrm{s}}+{\lambda }_{2}\mathcal{L}_{\Delta p}+{\lambda }_{2}\mathcal{L}_{\Delta s}+{\lambda }_{3}\mathcal{L}_{\mathrm{offset}}+{\lambda }_{3}\mathcal{L}_{\mathrm{depth}}.$$
(5)

Here, we set the coefficients λ1, λ2, and λ3 to 1.
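Equations (3) to (5) can be written out as a short sketch. It operates on plain Python lists rather than tensors, and the function names are hypothetical; only the formulas themselves and the unit coefficients come from the text.

```python
def shift_loss(c_time, labels, pred_shifts):
    """Equations (3)/(4): sum over stations of |CTime - (label_i + dt_i)|."""
    return sum(abs(c_time - (lab + dt)) for lab, dt in zip(labels, pred_shifts))


def mse(pred, target):
    """Mean square error, used here for picking, offset, and depth terms."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_loss(pick_p, pick_s, shift_p, shift_s, offset, depth,
               lam1=1.0, lam2=1.0, lam3=1.0):
    """Equation (5): weighted sum of the six task losses."""
    return (lam1 * (pick_p + pick_s)
            + lam2 * (shift_p + shift_s)
            + lam3 * (offset + depth))
```

With a P reference time of 10 s, a station labeled at 8.0 s with a predicted shift of 2.0 s contributes zero loss, whereas a station labeled at 12.5 s with a predicted shift of -2.0 s contributes 0.5 s.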

During the training process, the model was optimized using the ADAM55 method with an initial learning rate of 0.001, which is gradually decreased with a decay rate of 0.9 every 100 epochs. To enhance the training efficiency, we randomly selected 2048 events from the training set for each epoch, rather than using the entire data. The model was trained for a total of 2000 epochs with a batch size of 16, and the training process required approximately 24 h using 1 NVIDIA Tesla A100 GPU.
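The stated schedule can be checked with a few lines of arithmetic. The sketch assumes a step-wise decay, that is, the learning rate is multiplied by 0.9 once every 100 epochs; the text does not specify whether the decay is stepped or continuous, so this is an assumption. The function name is hypothetical.

```python
def learning_rate(epoch, base_lr=1e-3, decay=0.9, step=100):
    """Step decay: multiply base_lr by `decay` once every `step` epochs."""
    return base_lr * decay ** (epoch // step)


EVENTS_PER_EPOCH = 2048  # events randomly drawn per epoch
BATCH_SIZE = 16
EPOCHS = 2000

iterations_per_epoch = EVENTS_PER_EPOCH // BATCH_SIZE
total_iterations = iterations_per_epoch * EPOCHS
```

Under this assumption, each epoch consists of 128 parameter updates, the full run comprises 256,000 updates, and the learning rate during the last 100 epochs is 0.001 × 0.9^19.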

Evaluation metrics

In previous studies3,12, a predicted phase pick was counted as a true positive if it fell within 0.5 s of the labeled pick; the rest were counted as false positives. Nevertheless, owing to potential errors in the labels of the dataset, such statistical results based on a single threshold may not be reliable. Thus, to better evaluate the performance of algorithms, we introduce new metrics, mPrecision, mRecall, and mF1, which are calculated using multiple thresholds, following previous research56. The metrics are defined as:

$$\mathrm{mPrecision}=(\mathrm{Precision}@11+\mathrm{Precision}@12+\cdots +\mathrm{Precision}@50)/40,$$
(6)
$$\mathrm{mRecall}=(\mathrm{Recall}@11+\mathrm{Recall}@12+\cdots +\mathrm{Recall}@50)/40,$$
(7)
$$\mathrm{mF}1=(\mathrm{F}1@11+\mathrm{F}1@12+\cdots +\mathrm{F}1@50)/40,$$
(8)

where x@11, x@12, …, x@50 are Precision, Recall, or F1 metrics when the thresholds are 11, 12, …, 50 samples (corresponding to 0.11 s, 0.12 s, …, 0.5 s of time), respectively. These metrics, mPrecision, mRecall, and mF1, reward detectors with better picking results and, therefore, can more fairly assess the performance of the different methods.
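A minimal sketch of equations (6) to (8) is given below. The counting convention is an assumption made for illustration: each predicted pick is represented by its absolute residual to the label in samples, a pick within the tolerance is a true positive, a pick outside it is a false positive, and every label without a true-positive pick is a false negative. The function names are hypothetical.

```python
def prf(residuals, n_labels, tol):
    """Precision, recall, and F1 at one tolerance (in samples).

    residuals : |predicted - labeled| sample offsets, one per predicted pick
    n_labels  : number of labeled arrivals (at most one pick per label assumed)
    """
    tp = sum(1 for r in residuals if r <= tol)
    fp = len(residuals) - tp
    fn = n_labels - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def mean_metrics(residuals, n_labels, tolerances=range(11, 51)):
    """mPrecision, mRecall, mF1: averages over tolerances of 11..50 samples."""
    rows = [prf(residuals, n_labels, t) for t in tolerances]
    n = len(rows)
    return tuple(sum(col) / n for col in zip(*rows))
```

A single pick that is 30 samples off counts as correct only for the 21 tolerances from 30 to 50 samples, so all three averaged metrics equal 21/40 = 0.525, whereas an exact pick scores 1.0. This shows how the averaged metrics reward smaller residuals rather than treating every pick within 0.5 s equally.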