High-resolution seismic tomography of Long Beach, CA using machine learning

We use a machine learning-based tomography method to obtain high-resolution subsurface geophysical structure in Long Beach, CA, from seismic noise recorded on a “large-N” array with 5204 geophones (~13.5 million travel times). This method, called locally sparse travel time tomography (LST) uses unsupervised machine learning to exploit the dense sampling obtained by ambient noise processing on large arrays. Dense sampling permits the LST method to learn directly from the data a dictionary of local, or small-scale, geophysical features. The features are the small scale patterns of Earth structure most relevant to the given tomographic imaging scenario. Using LST, we obtain a high-resolution 1 Hz Rayleigh wave phase speed map of Long Beach. Among the geophysical features shown in the map, the important Silverado aquifer is well isolated relative to previous surface wave tomography studies. Our results show promise for LST in obtaining detailed geophysical structure in travel time tomography studies.

Travel time data. We use the 1 Hz Rayleigh surface wave band from the Long Beach data, which corresponds to near-surface geophysical features (~100-500 m depth). For each station pair, cross-correlations of all 1-h segments from the 3-week recording were normalized and stacked to obtain the causal and anti-causal travel times. The final travel times were obtained by averaging these causal and anti-causal components 8 . After quality control the number of useful cross-correlations for 1 Hz was ~8.5 million. Quality control included SNR thresholding and removal of travel times with ranges less than one wavelength. The cross-correlations further suffered from phase ambiguities, which were reduced in preprocessing. This was done by clustering the rays 27 and filtering travel times that exceeded the median travel times of the clusters by one half-period (0.5 s for 1 Hz). Further, cross-correlations were rejected if travel times from virtual source pairs disagreed by more than one-half period. This reduced the number of useful cross-correlations to ~3 million. For further details and preprocessing steps, see Methods and 8 .
LSt implementation. The 1 Hz Rayleigh surface wave travel times are used by LST to estimate a 300 × 206 pixel phase speed image of the Long Beach region. We assume straight-ray surface wave propagation, which yields a simple linear measurement model (Eq. 1). Using the measuments, LST alternates between solving larger-scale, or global, phase speed features (details in Methods) and smaller-scale or local phase speed features. The global problem (Eq. 4) is solved by least squares. Since the tomography matrix is sparse, we use the sparse least square program LSMR 29 . The local sparse problem (Eq. 5) is solved using orthogonal matching pursuit 25 , and the dictionary is learned using the iterative thresholding and signed K-means algorithm 30 . The reference phase speed is constant, and estimated as the average phase speed of all ray paths. The LST tuning parameters are {n, T, Q, λ 1 , λ 2 }. n is the number of pixels per patch. We use square, 10 by 10 pixel patches, giving n = 100 pixels per patch (yielding 300 × 206 = 61,800 patches). We assume sparsity T = 2 (see Eq. 5), meaning that each patch uses two atoms from the dictionary D. Each atom in D is a vector with dimension n = 100 pixels (the patch size). We assume D has Q = 200 atoms, or twice the patch dimension. D is initialized with Gaussian random vectors of unit norm. λ 1 , which is the ratio of travel time error variance to global slowness variance is set as λ 1 = 13 km 2 (see Eq. 4). λ 2 , which is the ratio of patch slowness variance to global slowness variance, is set as λ 2 = 0 assuming a sparse slowness representation (see Eq. 5). When λ 2 = 0, the sparse slowness is simply the average of the patch slownesses (see Methods).
interpretation of phase speeds. In the middle of the LST phase speed image (Fig. 1B), particularly to the West, a large fast anomaly is observed between 33.78° and 33.82° latitude. This fast anomaly corresponds well with the Upper Wilmington (UW) Quaternary formation, which includes the Silverado water bearing unit (Lower Pleistocene age, ~300-580 ka) that supplies nearly 90% of the total ground water extracted in Long Beach 31,32 . Based on a 3D model 32 ( Fig. 2A) of the water-bearing structures in Long Beach obtained using borehole, seismic, and gravity surveys, the Silverado is significantly denser (2,290 kg/m 3 ) than the surrounding formations (2,050-2,100 kg/m 3 ) due to its coarse-grained facies. From empirical relations of density and seismic wave speeds, we expect the UW formation to increase V s by ~150% relative to the surrounding formations (see Methods). Since Rayleigh wave phase speed is dependent primarily on V s , we conclude that this high velocity region of the map likely corresponds to the Silverado aquifer.
The proposed attribution of the fast anomaly to the Silverado aquifer is further supported by simulating the 1 Hz Rayleigh surface wave phase speed (Fig. 2D) using the Silverado depth range inferred in 32 and 31 (see Methods). In the region of the survey used from 32 , where both Silverado depth and thickness were available, the simulation shows a gradual increase in phase speed from south to north. However, per 31 , the Silverado is likely www.nature.com/scientificreports www.nature.com/scientificreports/ absent about 1 km south of the NI fault in the region of the Long Beach array (Fig. 3). With this assumption for the simulation, the predicted trend and magnitude and phase speeds south of the NI fault compare well with the LST result (Fig. 1B). Relative to the eikonal tomography phase speed estimates of the Long Beach (Fig. 4B), the phase speed from LST better shows the broad fast region predicted by the simulation. It is clear that the Silverado is resolved particularly well by the LST in the west-central region. Moreover, well logs support an extension of the high-velocity anomaly toward the NE across the Newport-Inglewood (NI) fault zone. This is observed in the LST phase speeds (see Fig. 2D), beyond the region of the gravity survey. The LST result appears to corroborate the older study 31 , and contradicts some of the results of 32 , which was based exclusively on a gravity survey in the region of the Long Beach array.

Discussion
These results show that the LST method, by assuming that patches of seismic phase speed fields are repetitions of a set of few patterns, contained by the dictionary, can be used to further leverage existing seismic data to obtain high-resolution phase speed images from regions of interest. Since the dictionary (Fig. 5A) is learned from the travel time data data via machine learning, it is well adapted to the true phase speeds. LST with dictionary  31 overlain with LST phase speed (Fig. 1B), modified from 31 . The results of 31 contradict 32 , suggesting Silverado is absent south of the high phase speed anomaly in the west-centra part of the LST map. The findings of 31 are corroborated by the LST result. The lower extent of the high-speed anomaly is used to generate a hypothesized phase speed map (Fig. 2D). We also note that the phase speed anomalies near and to the north of Signal Hill (indicated above) correlate well with the contours. (2019) 9:14987 | https://doi.org/10.1038/s41598-019-50381-z www.nature.com/scientificreports www.nature.com/scientificreports/ learning provides a flexible model that is capable of modeling smooth and discontinuous slowness features. In the context of ambient noise tomography, LST leverages the dense sampling by learning the phase speed patterns. Thus, we obtain high-resolution slowness maps that well complement other geophysical sensing modalities and existing studies, for better estimating at least near-surface Earth structure. In this novel application of machine learning theory to near-surface seismic tomography, it is likely that we have further-characterized key water-bearing aquifers in Long Beach. We believe LST can help provide important geophysical insights in many tomography scenarios.

Methods
LSt theory and implementation. Our proposed locally-sparse travel time tomography (LST) approach obtains high resolution by assuming that small patches of discrete phase speed maps are repetitions of few elemental patterns from a dictionary of patterns. Such patterns, referred to as atoms (chemistry analogy) are learned in parallel with the inversion using dictionary learning, a form of unsupervised machine learning. Relative to conventional tomography methods, the sparsity of the dictionary representation permits smooth and discontinuous, high-resolution features where warranted by the data. In the following, we present an overview of the LST theory. For more details, please see 15 .
In LST, surface wave propagation is approximated as straight ray paths through an N = W 1 × W 2 pixel phase speed map, and the travel time perturbations t ∈ R M from a known reference for M rays are modeled as g where A ∈ R M×N is the tomography matrix, s g ∈ R N is the perturbation global slowness (inverse of speed), and  ∈ R M is Gaussian noise σ ε I (0, ) 2  , with σ 2  the noise variance. We call Eq. 1 the global model, as it captures the large-scale features that span the discrete map and generates t.
We consider a second slowness model perturbation s s ∈ R N , called the sparse slowness, in which × n n groups of pixels are represented as sparse linear combinations of atoms from a dictionary. The patches are selected from s s ∈ R N by the binary matrix R ∈ {0, 1} n×N , and modeled as where R i s s ∈ R n is the i-th patch, D ∈ R n×Q is the dictionary of Q atoms, x i ∈ R Q coefficient estimates for the i-th patch, and T is the number of non-zero coefficients in x i . We consider all overlapping patches, and wrap the patches at the edges. Thus the number of patches is N. The  0 pseudo-norm penalizes the number of non-zero coefficients 25 . We call Eq. 2 the local model, as it captures the smaller scale, localized features contained by patches. The dictionary D is assumed unknown and is learned from the data during the inversion. The global (Eq. 1) and local (Eq. 2) models are combined into a Bayesian maximum a posteriori (MAP) objective where ŝ g is an estimate of the global slowness perturbation, σ g 2 is the global slowness variance, ŝ s is the estimate of the sparse slowness perturbation, σ p,i 2 is the variance of the patch slowness, and  X ∈ R Q×I is the coefficient estimates.
We find the MAP estimates {ŝ g ,ŝ s ,  X} using a block-coordinate minimization algorithm by decoupling the local and global models via substitution 17,25 . The global objective is, from Eq. 3, The dictionary learning problem (Eq. 6) is here solved using the iterative thresholding and signed k-means (ITKM) algorithm 30 . After  D is obtained, the coefficients  X = [x 1 , ..., x I ] are solved from Eq. 5 using orthogonal matching pursuit (OMP) with the same sparsity level T as ITKM. Then with  X,  D, and global slowness ŝ g from Eq. 4 we solve for s s . Equation 3 gives, assuming constant patch variance σ p,i We note that the LST approach performs well for both prescribed (e.g. wavelet and DCT) and learned dictionaries (see Fig. 5). But in general we expect the tomographic image fidelity of the learned dictionary inversion to be greater than that of prescribed dictionaries. This follows the results of simulations in 15 (e.g. Figures 5, 6, 8  and 9), which compares the performance of prescribed and wavelet dictionaries on synthetic slowness maps and shows learned dictionaries can reduce the RMSE relative to ground truth by greater than 50% over prescribed dictionaries. This also follows the "synthesis" (vs. "analysis") paradigm in image processing, which prefers to learn (2019) 9:14987 | https://doi.org/10.1038/s41598-019-50381-z www.nature.com/scientificreports www.nature.com/scientificreports/ dictionaries if sufficient training data is available rather than use e.g. wavelet functions which must be justified by detailed theoretical analysis for each application. Dictionaries can be learned from a corpus of data, or as was done here, from a large number of examples from one imaging scenario. This has proven a successful approach in e.g. image restoration 16 and medical imaging 17 . For more details please see 25 (pp. 227-246).
Conventional tomography. We implement conventional tomography using a Bayesian approach 33 , which regularizes the inversion with a global smoothing (non-diagonal) covariance. Considering the measurements (1), the MAP estimate of the slowness is where η = (σ ε /σ c ) 2 is a regularization parameter, with σ c the conventional slowness variance, and Here l i,j is the distance between cells i and j, and L is the smoothness length scale 33,34 . For our simulation we choose L = 10 km and η = 100 km 2 . This gives the best tradeoff between detail and fidelity.
LST, eikonal, and conventional performance comparisons. We quantify the relative quality of the LST and eikonal phase speed maps (Fig. 4) by variance reduction, and using visual quality scores derived from natural images. The LST provides a 57% variance reduction and conventional gives 59%, whereas eikonal gives a 28% reduction. The increased variance reduction in the LST and conventional inversions over the eikonal result is likely because the eikonal tomography method does not explicitly minimize the travel time residual. Both LST and conventional tomography minimize the travel time residual in their map estimation (Eq. 4 and 9). Since there is no true phase speed map available, we use reference-less image quality metrics to help quantify the quality of the LST and eikonal phase speeds. While such metrics may not reflect the truth of estimated geophysical features, they can help quantify corruption of the geophysical features. We use the Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) 35 , the Natural Image Quality Evaluator (NIQE) 36 , and the Perception based Image Quality Evaluator (PIQE) 37 . The results are summarized in Table 1. Overall, LST obtains a better score on 2 of 3 of the metrics. The BRISQUE metric has incorporated human opinions of image quality, whereas NIQE and PIQE do not. Hence BRISQUE may be less suited to our application. Since surface wave phase travel times potentially suffer from phase ambiguities, we also performed an LST inversion of the 1 Hz Rayleigh wave group travel times using the same parameters as the LST phase speed map (Fig. 4A). The LST group speed map is shown in Fig. 6. The group travel times are not subject to phase ambiguities, but may present other problems such as erratic arrival picks. The trends in LST group speed map (Fig. 6) compare well with those of the LST phase speed map (Fig. 4A), and both show a large fast anomaly south of the NI fault. We note that the group speed is slower overall, than the phase speed map, which is expected.
We compare the results of the LST to eikonal tomography 8,38 and conventional tomography (Fig. 4). Eikonal tomography avoids the inversion of large tomography matrices in favor of solving a number of simpler subproblems, while also partly accounting for wave refraction. For each virtual source, a phase speed map is estimated from the local gradients of the smoothed travel time surface. The individual phase speed estimates from all the virtual sources are then averaged to obtain the final phase speed map. Since LST uses the straight ray assumption, it is more similar to conventional than eikonal tomography.
However, LST provides improved results over conventional tomography with less computational burden. Conventional tomography 33 complexity is dominated by square matrix inversion N ( ) 3  , though approximate solution methods are slightly less complex. For large tomography matrices A, LST is dominated by matrix multiplication in the LSMR algorithm 29  MN (2 ). Since for LST the sparse matrix A is used directly, the memory required for a tomography problem with M travel times scales linearly with the map size N. For conventional tomography, the memory required scales by N 2 . Hence LST could be used for much larger maps than conventional tomography.
Geological interpretation. The LST sensitivity depth range (~100-500 m) with 1 Hz Rayleigh surface waves is occupied by Pleistocene and Holocene deposits 32 , which contain almost all the ground water resources in the area, Fig. 3A-C. The Silverado formation (Lower Pleistocene age, ~300-580 ka, named in 31 ), accounts for nearly 90% of the total ground water extraction in the region considered here 39 . The producing aquifers consist of sand and gravel, and in particular the Silverado unit is characterized by coarse-grained sediments.
Ponti et al. 32 generated a 3D sequence stratigraphy model of Quaternary layers at the Dominguez Gap in a 16.5 × 16.1 km area that overlaps with our LST model region (see Fig. 2). The study was aided by 5 reference boreholes drilled to more than 450 m depth. In addition, more than 300 oil and water wells were compiled and used in the study. An important finding from this study was a fault, striking W-NW in agreement with the general trend www.nature.com/scientificreports www.nature.com/scientificreports/ of the area, named the Pacific Coast Highway (PCH) fault, with a progressive vertical throw (up to 200 m) causing displacement of all pre-Holocene formations down to the north (see Fig. 2).
An area-wide gravity survey 32 was carried out to invert for stratigraphy along a N-S profile, ~1 km west of the LST result region (see Fig. 2). The Silverado aquifer has significantly higher density (2290 kg/m 3 ), due to its coarse-grained facies, than the surrounding formations (2050-2100 kg/m 3 ). The Silverado gravity anomaly is in general agreement with the sequence-stratigraphic model, including the termination of the Silverado unit just south of LBCH. However, the Poland et al. 31 survey contradicts the Ponti et al. 32 inferred stratigraphy. In 31 , it is concluded that the Silverado is missing about 1 km south of the NI fault -near the termination of the LST phase speed anomaly (see Fig. 3).
Several established empirical relations positively correlate seismic wave speeds with density. Gardner et al. 40 relates density ρ to P-wave speed V p as ρ = . where V s is in km/s. Equation 12 gives a 150% increase in V s using derived values of V p 40 for the densities of the Silverado and surrounding layers. Since Rayleigh surface wave phase speed is dependent primarily on V s 42 , these results suggest that the Silverado aquifer should give rise to a significant phase-speed anomaly.
To help verify the fast anomaly in the LST result (Fig. 1B), we calculate the 1 Hz Rayleigh surface wave phase speed for the survey region based on predicted Silverado V s from empirical relations Eq. 11 and 12. The method and results are summarized in Fig. 7. The Silverado elevation and thickness inferred in 32 are interpolated (Fig. 7A,B) to obtain the depth ranges of the aquifer within the survey region. It is assumed per 31 that Silverado is missing south of the fast anomaly. Further, we assume an average V s profile for the region based on Rayleigh wave dispersion measurements in 8 . For each discrete Silverado depth range, the Silverado V s is simulated by doubling the average V s in the Silverado depth range (e.g. Fig. 7D). The 1 Hz Rayleigh surface wave phase speed for the survey region (Fig. 7C) is estimated using numerical forward modeling 43 .
We also consider the effect of fluid saturation of the local strata in qualifying the high-velocity anomaly in our 1 Hz phase map. The depth to the water table in Long Beach (average from 150 wells 44 ) is about 7 m. Below the water table, the layers may be considered 'wet' to the largest depths in the deep water wells from 32 (Pliocene age, 450 m+), though the layers are not necessarily 'water-bearing' per ground water extraction terminology. While laboratory experiments have shown for large V s that the presence of fluids tends to decrease V s of rock, this trend is insignificant for V s ≈ 1,000 m/s 45 . The phase speeds observed in the high-velocity anomaly detected in our study are within this range. Thus, we conclude it unlikely that the high-velocity anomaly in our 1 Hz phase map is caused by a contrast in wet versus dry rock.