A joint matrix minimization approach for seismic wavefield recovery

Reconstruction of the seismic wavefield from sub-sampled data is important and necessary in seismic image processing; this is partly due to limitations of the observations which usually yield incomplete data. To make the best of the observed seismic signals, we propose a joint matrix minimization model to recover the seismic wavefield. Employing matrix instead of vector as weight variable can express all the sub-sampled traces simultaneously. This scheme utilizes the collective representation rather than an individual one to recover a given set of sub-samples. The matrix model takes the interrelation of the multiple observations into account to facilitate recovery, for example, the similarity of the same seismic trace and distinctions of different ones. Hence an l2, p(0 < p ≤ 1)-regularized joint matrix minimization is formulated which has some computational challenges especially when p is in (0, 1). For solving the involved matrix optimization problem, a unified algorithm is developed and the convergence analysis is accordingly demonstrated for a range of parameters. Numerical experiments on synthetic and field data examples exhibit the efficient performance of the joint technique. Both reconstruction accuracy and computational cost indicate that the new strategy achieves good performance in seismic wavefield recovery and has potential for practical applications.

different algorithms with respect to p = 1 and 0 < p < 1. For grouped feature selection, Suvrit 28 addressed a fast projection technique onto l 1, p -norm balls particularly for p = 2, ∞. But the derived method in 28 does not match the proposed matrix optimization problem (11). Similar joint sparse representation has been used for robust multi-modal biometrics recognition in 29 . Sumit et al. 29 employed the traditional alternating direction method of multipliers to solve the involved optimization problem. Wang et al. 30 applied l 2, 0 + -norm to semi-supervised robust dictionary learning, while the optimization algorithm has not displayed definite convergence analysis 30 .
Recently, matrix-minimization methods with nuclear norm have been developed for seismic wavefield recovery [31][32][33][34] which mainly considers the rank reduction as the sparse pattern in 2D cases. To avoid the expensive computations in solving the involved matrix completion optimization problems, a matrix factorization strategy was developed in 31,32 . This paper proposes a different matrix minimization approach based on l 2, q −l 2, p norm which naturally generalizes the representative vector to matrix in joint distribution sense. A unified method is developed to solve the matrix optimization problem with mixed norm for any q = 2 and 0 < p ≤ 1. The innovations of this paper can be listed as follows: 1) A jointly sparse matrix minimization model is developed for seismic wavefield recovery. This approach employs matrix to expresses multiple signals simultaneously. The measurement of matrix row coefficients are expected to exhibit the compact priori of multiple observations which is different from the existed methods based on matrix nuclear-norm minimization [31][32][33][34] . 2) A unified algorithm is developed to solve the mixed matrix optimization problem (7) for any p ∈ (0, 1]. This algorithm needs only matrix-vector operations but not matrix factorization which can be easily adapted to large-scale cases. The convergence analysis is also demonstrated. 3) Numerical experiments on synthetic and field data are carried out. The results on seismic wavefield recovery exhibit the efficient recovery performance of the joint sparse expression strategy.

Modeling
Given a set of seismic signals (traces) x 1 , x 2 , …, x l in n-dimensional space, each signal x j (j = 1, 2, …, l) is sensed by m sensors to yield seismic wavefield records as Denote Ψ the orthogonal matrix constituted by the orthogonal bases, then we have a more compact transformation L = AΨ ∈ R m × K . Consequently the systems (1) and (2) can be incorporated to 1 is the coefficient vector (weighting factor) corresponding to the seismic signal x j . Usually, problem (3) is ill-posed due to the limitation of acquisition and violation of sampling requirements. Sparse regularization is preferred to restore the operation coefficients from the under-determined linear combination system (3). A general l q −l p (q > 0, p > 0) model was presented in [16] is the stabilizer bearing prior information with respect to d j and α j > 0 is a regularization parameter. When 0 < p ≤ 1, the minimization model (4) tries to find a sparse recovery coefficient m j with the least nonzero entries. However, the framework (4) recovers the weight factor m j only using the j-th seismic trace record d j independently which totally ignores the correlation with other sampled data d j ( ≠ j j). Generally, multiple seismic wavefield traces are related to each other. The similarity and difference hidden in the given group of seismic traces are expected to improve the recovery performance. To detailedly demonstrate the correlationship among multiple seismic traces, we randomly choose three trace observations from a seismogram generated from a seven layers geologic velocity model (see Experimental Section for details). Two neighboring traces are denoted by d 1 and d 2 while the third one d 3 is relatively far from them. We separately recover the representation coefficients = ⁎ m j , 1, 2, 3 j by solving where m j k is k-th entry of m j . The weight values of recovered coefficients are plotted in Fig. 1(a-c). The horizontal axis denotes the coordinates of the representation vector while the vertical axis shows the weight quantities of . The curves clearly display the similar clustering and sparse pattern of three recovered coefficients. The correlations inspire us to assume that the multiple traces coefficients share the same distribution. For comparison, we jointly recover three coefficients simultaneously from D 1, 2, 3 = [d 1 , d 2 , d 3 ] ∈ R m × 3 by a matrix minimization problem 3 is the k-th row of M 1, 2, 3 . Since three vector minimizations as (5) are integrated to a matrix one (6), each entry m j k of representative vector is spanned to a row vector ∈ m R k 1,2, 3 3 . Hence the absolute values of weight entries in (5) are naturally generalized to l 2 norm of row vector for its smoothness, that 3 2 . To illustrate the jointly recovered coefficient matrix ⁎ M 1,2,3 of (6) also follows the similar variation as in Fig. 1(a-c), we measure the l 2 norm of each row vector in the joint sense corresponding to Clearly, the joint representation coefficients also exhibit similar sparse pattern and weight concentration to the individual models (see Fig. 1(d)).
Under the assumption that multiple seismic wavefield traces jointly share the similar weight parameter pattern, we propose to express all the sub-sampled observations over the same bases simultaneously as where D = [d 1 , d 2 , …, d l ] is composed of l seismic observations and M = [m 1 , m 2 , …, m l ] denotes the corresponding coefficient matrix. As far as the columns are concerned, the equation (8) is an easy consequence of the equation (3). Figure 1 has demonstrated that the multiple seismic traces are related to each other, especially when the samples are obtained in the similar fields. We reasonably measure the joint compactness and correlation of the multiple observations in row sense. By reviewing l q −l p (q > 0, p > 0) model (4), we notice that the expression errors e j = Lm j −d j , j = 1, 2, …, l and the priori of representation coefficients are assumed to submit to the independent identically distribution,   (3) is spanned to a row vector in the joint expression system (8), the absolute value of the scalar component is naturally replaced by a vector norm. Euclidean norm is preferred for its smoothness and easiness. Based on the analysis (9) and (10), the joint sparse priori of coefficient matrix M and fidelity error matrix E = LM−D can be considered where m k , e k are the k-th row vectors of M ∈ R k × l and E ∈ R m × l respectively.α k > 0 is a constant and . 2 stands for the Euclidean norm. In the similar relationship between (4) and (9), the joint matrix minimization approach for the ill-posed linear system (8) can be generally formulated as where the l 2, p norm of the priori matrix M is defined as is equivalent to m j p . When Λ takes scalar identity, the joint system (11) is exactly reduced to (4).
There are different choices of the parameter pair q > 0 and p > 0. Here we are interested in q = 2 and p ∈ (0, 1] for the practical purpose. Extensive studies have illustrated that the fractional norm l p (p ∈ (0, 1)) has better sparsity than l 1 norm [35][36][37][38][39] . But the l p norm is neither Lipschitz nor convex which brings computational challenge. This paper presents a unified algorithm to solve the mixed l 2, p regularized matrix minimization problem (11) for any p ∈ (0, 1]. The computational results in seismic wavefield recovery validate the efficient performance of the joint matrix minimization approach. The convergence properties of our new algorithm are also analyzed.

Algorithms
In this section, a unified method will be developed to solve the l 2, q −l 2, p matrix minimization problem for any q = 2 and 0 < p ≤ 1. Especially when p is fractional, (11) is neither convex nor Lipschitz continuous which brings many computational difficulties. Actually the unconstrained l q -l p minimization is strongly NP-hard for any 0 < q or p < 1 40 . Reweighed minimization algorithm 35,41,42 is an efficient algorithm for solving the l 2 -l p (0 < p < 1) vector minimization problem which has been extended by Wang et al. 43 to solve matrix minimization problem. Even the problem considered in 43 is the special case of (11) with q = p ∈ (0, 1], the idea motivates us to develop an iteratively quadratic algorithm for the generalized l 2, p matrix minimization for p ∈ (0, 1]. Moreover, the convergence analysis will be uniformly demonstrated.
After simple transformation, ΛM can be rewritten as where ⋅ Tr( ) stands for the trace operation and It is well known that the KKT point of the unconstrained optimization problem (11) is also the stationary point of J(M) 44 . Compute the derivative of J(M) with respect to matrix M and set it to zero, we get the KKT equation of the problem (11) as follows Thus solving (11) is reduced to finding the solution of the nonlinear equation (16). If H is fixed and the matrix = is invertible, equation (16) can be solved by We notice that if some row of M is zero, the diagonal entries of H cannot be generated, nor can N. Then the iteration breaks down. In view of the seismic wavefield recovery, the zero row means the corresponding basis function has no contribution to reconstruct all the observed seismic traces. For example, if m k = 0, then L k (the k-th column of transformation matrix L) is nothing with the observations D in the representation system (8). To avoid the possible breakdown of the matrix N in (17) and reasonably explain this numerical behavior, we apply the Sherman-Morrison-Woodbury formula 45 (17) can be rewritten as where I m is m-dimensional identity operator. If matrices G and M are computed alternatively corresponding to equations (18) and (19) respectively, then an iterative procedure can be naturally developed The iterative algorithm is outlined in Algorithm 1.
Step 3. For t = 1, 2, … until ρ ≤ t  do: The m t k (k = 1, 2, …, K) means the k-th row vector of M t . Algorithm 1 aims to solve the fixed-point system (16)  , the sparse parameter p ∈ (0, 1] aims to find a solution with many zero row vectors of the l 2, p -regularized matrix minimization problem (11). This means that many basis functions have no contribution to reconstruct the seismic wavefields which accords with the prior knowledge. Therefore (m t ) k = 0 might frequently occur during the iterations of Algorithm 1. We may formulate the following statement.
, the k-th column of the matrix L is unnecessary in the linear system (8) and the variational function J(M) in (15). So we can discard the k-th column of the matrix L to reduce the system without any loss. The improvement of Algorithm 1 can be concluded as Algorithm 2.
Algorithm 2. Solving problem (16) for any p ∈ (0, 1] Step 1. Input L ∈ R m × K , D ∈ R m × l . Set the sparse parameter p ∈ (0, 1] and the diagonal matrix α α α Λ = diag{ , , , } 0 (  Once the nonzero set of the t-th iteration has been fixed, the subproblem (21) can be solved in a variety of ways such as preconditioned conjugate gradient methods 46 , nonmonotone gradient descent methods 47,48 , and so on. The framework can be concluded as Algorithm 3.

No noise
Noise level 0.
Set the sparse parameter p ∈ (0, 1] and the diagonal matrix . Given stopping criterion  > 0. Step 2. Set t = 1 and initialize Figure 2. One-dimensional experimental results via model (11) with q = 2 and = p 1/2 for noisy data: (a) comparison of the original and the recovered signal; (b) difference between the recovered signal and the original signal.

Experimental results
To validate the efficiency of the joint matrix minimization approach and the unified algorithm for the problem (11), we perform three tests: (1) restoration of the input one-dimensional random signal with the randomly  One-dimensional signal reconstruction. We randomly take samples to generate the matrix L. For implementation, we try to restore the signal by the model (11) with q = 2 and p ∈ (0, 1].  For the one-dimensional case, the matrix M is reduced to a vector, hence the unified Algorithm 3 can be used for solving (11). For comparison, we also apply spectral projected gradient (SPG) method 49 to solve the l 1 -regularization problem. The code of SPG is downloaded from http://www.cs.ubc.ca/~mpf/spgl1/index.html. Two algorithms are carried out in the same environment and choose their best regularization parameters. The comparison items include err rel value, SNR and CPU running time (second). Each experiment is repeated five times and the average values are reported in Table 1. It indicates that both methods perform well for one-dimensional signal reconstruction problem.
Apart from the regular data, we also consider the noisy cases to show the robustness of two methods. Different noise levels are added to the simulated data. Noise level 0.001 means the noise is randomly generated with zero mean and 0.001 variance. The results of Algorithm 3 with sparse parameters p = 1 and p = 0.5 are displayed in Table 1. Compared with the l 1 -regularized minimization model, the half-norm regularized minimization behaves better in reconstruction. Figure 2 plots the recovery performance of the Algorithm 3 with p = 0.5 on noisy data. Figure 2(a) is the comparison of the real signal and the recovered signal, Fig. 2(b) illustrates the difference between the recovered signal and the input (true) signal. The recovery images of other cases are similar. The figures reveal that our model and algorithm perform well for one-dimensional seismic wavefield reconstruction problem even in noisy cases.
Reconstruction of seismograms from a layered model. Now we consider a seismogram generated from a seven layers geologic velocity model where the spatial sampling interval is 15 meters and the time sampling interval is 0.002 second. The velocity varies from 2500 m/s to 5500 m/s. The seismogram is generated using a source function given by a Ricker wavelet with central-frequency of 25 Hz. The dataset contains 256 traces with 256 time samples in each trace. Different percentages of missing traces in original data, 10%, 25% and 50%, are used to test the limitation of recovery methods. The joint matrix model (11) with Algorithm 3 is applied to reconstruct the seismic wavefield. Since the spectral projected gradient method only solves an l 1 -regularized vector minimization problem, we decompose the matrix representation system (11) into the l 1 -regularized vector minimization problem. Each column is considered as a subproblem to reconstruct its weight vector separately. Then all the solutions of the subproblems are sequentially aligned into a weighted matrix to evaluate the reconstruction performance. The experimental results on missing percentages 10% and 25% can be seen in Tables 2 and 3.
As for the data without noise but missing 50% traces, the reconstruction performance of joint matrix model with Algorithm 3 is much worse than missing percentages of 10% and 25%. The err rel value is 0.5414 and SNR is around 5.1904dB, almost the same for any p ∈ (0, 1]. These results mean that our method may not completely recover the seismic wavefield well if the missing trace signals are more than 50%. Actually, the sub-sampled data missing 50% itself is a failed collection of seismic recodes. The original shot gathers are shown in Fig. 3(a). The data with 25% traces missing are shown in Fig. 3(b). In forming the under-determined matrix L, a Haar wavelet orthogonal base is used to form the transform matrix Ψ. The unified Algorithm 3 is applied to solve the joint matrix minimization problems (11) with q = 2 and typical parameters p ∈ (0, 1]. Good recovery performance is observed and the result is demonstrated in Fig. 3(c). The error of the original and the recovered data shown in Fig. 3(d) illustrates the efficient recovery performance of joint matrix minimization approach. In displaying the results, the amplitude scale of the error map is the same as  Tables 2 and 3.
Comparatively, the recovery image of the SPG algorithm for the case of 25% traces missing is presented in Fig. 4. Figure 4(a) is the reconstruction and Fig. 4(b) displays the difference between the original and reconstructed seismic signals. It is noticed that SPG algorithm for the l 1 -regularization vector minimization restores the seismic wavefield as accurate as the joint matrix approach with Algorithm 3. These results are obtained using the same code from http://www.cs.ubc.ca/~mpf/spgl1/index.html.
To show the anti-noise property of our algorithm, we add random noise with noise level 0.001 to the simulated data. The unified Algorithm 3 is applied to solve the joint matrix minimization problems. The err rel value, SNR and CPU running time (second) are listed in Table 2 for 3 sparse parameters. The recovery image and the error of the original and the recovered data are shown in Fig. 5(a and b) respectively. The low relative error and high SNR indicate that our algorithm is stable for seismic data restoration.
To save memory requirement of large-scale data, we have observed the restoration behavior of our method on patch of the input synthetic data. We evenly partition the collection of trace signals D into several blocks, such as . Each D g is input separately to recover the seismic signals by system (11). Then all the sub-solutions M g , g = 1, 2, …, f are combined into M = [M 1 , M 2 , …, M f ]. When the number of segments is two or three, the recovered err rel values and SNR are almost the same as the integral case. When each column is considered as a segment, the joint matrix model is reduced to a sequence of vector recoveries, the recovery err rel values and SNR are similar to the integral case but the computational time is around 50 times more.
Reconstruction of seismograms from a heterogeneous model. Next we consider a seismogram generated from a velocity model varying both vertically and transversely (Wang et al. 5 ). The original seismic wavefield, sub-sampled data (37% traces are randomly removed) and recovered data are shown in Fig. 6(a-c), respectively. The difference of the original data and the recovered data is illustrated in Fig. 6(d). In displaying the results, the amplitude scale of the error map is the same as the amplitude scale of the data. It illustrates that all the initial seismic energy is recovered with minor errors. Though the reconstruction is not perfect, most of the details of the wavefield are preserved. Again, to test the quality of our algorithm in seismic data restoration for complex structure, we calculate the signal-to-noise ratio and the relative error. From our calculation, for p = 0.5, the values of SNR and err rel are 26.9792 and 0.0448, respectively; for p = 1, the values of SNR and err rel are 25.6940 and 0.0519, respectively. The high value of SNR and low value of err rel indicate our algorithm works for seismic data restoration even with complex structure.
To show the robustness of our algorithm to interference, we add random noise with level 0.001 and 0.01 to the simulated data respectively. The unified Algorithm 3 with p = 0.5 is applied to solve the joint matrix minimization problems. The values of SNR and err rel for noise level equaling 0.001 are 26.9074 and 0.0451, and for noise level equaling 0.001 are 18.0355 and 0.1254, respectively.
In the noisy case, e.g., noise level equaling 0.01, the frequency information of the original data, sub-sampled data and the recovered data are shown in Fig. 7(a-c), respectively. Again, the aliasing of the sub-sampled data is reduced greatly in the recovered data.
Field data. Finally, we examine the efficiency of the new method with field data. The seismic data is a marine shot gather shown in Fig. 8(a) which consists of 256 traces with spacing 25 m and time sampling interval 2 ms. There are damaged traces in the original gather. The subsampled gather is shown in Fig. 8(b) with 42% of the original traces randomly removed. This sub-sampled gather was used to restore the original gather with suitable solution methods. Again, the unified Algorithm 3 is applied to solve the joint matrix minimization problems (11) with q = 2 and p = 0.5. The recovery result is demonstrated in Fig. 8(c). The error of the original and the recovered data shown in Fig. 8(d) illustrates the efficient recovery performance of joint matrix minimization approach. In displaying the results, the amplitude scale of the error map is the same as the amplitude scale of the data. Comparing the subsampled image with the original image, the restored image can reconstruct most of the details. In addition the damaged trace in the original gather was restored as a good trace. Using the same definition of SNR as above, for p = 0.5, the value of SNR equals 19.7301; for p = 1 the value of SNR equals 19.7919. We only show figures for p = 1, since in visualization the results are similar for p = 0.5.
The frequency information of the original data, sub-sampled data and the recovered data are shown in Fig. 9(a-c), respectively. It indicates that the aliasing of the sub-sampled data is reduced greatly in the recovered data.

Conclusion
Sparse optimization has broad applications in seismic data processing. In this paper we focus on data restoration problem. Noticing that the seismic wavefield can be represented using matrix instead of vector as weight variable to express all the signals simultaneously, in this paper we propose a matrix optimization model to the seismic wavefield recovery. We first reformulate the data restoration problem using an l 2, p -norm constrained matrix minimization model for any p ∈ (0, 1], which is a nonconvex and non-Lipschitz continuous minimization problem. Then we develop a unified algorithm to solve the mixed matrix optimization problem for any p ∈ (0, 1]. Convergence analysis of the new algorithm is also addressed. Numerical results on synthetic problems and the field data example indicate potential usage of our method for practical applications.