Abstract
Deep learning (DL) is a powerful tool for mining features from data, which can theoretically avoid assumptions (e.g., linear events) that constrain conventional interpolation methods. Motivated by this and inspired by image-to-image translation, we applied DL to irregularly and regularly missing data reconstruction with the aim of transforming incomplete data into corresponding complete data. To accomplish this, we established a model architecture based on an encoder-decoder-style U-Net convolutional neural network, with randomly sampled data as input and the corresponding complete data as output. We carefully prepared the training data using synthetic and field seismic data. We used a mean-squared-error loss function and an Adam optimizer to train the network. We displayed the feature maps for a randomly sampled data set passing through the trained model to explain how the missing data are reconstructed. We benchmarked the method on several typical data sets for irregularly missing data reconstruction; it achieved better performance than a peer-reviewed Fourier transform interpolation method, verifying the effectiveness, superiority, and generalization capability of our approach. Because regularly missing data are a special case of irregularly missing data, we successfully applied the model to regularly missing data reconstruction, although it was trained with irregularly sampled data only.
Introduction
Deep learning (DL)^{1} is a branch of machine learning (ML) that addresses the question of how to build computers that improve automatically through experience^{2}. In recent years, DL, and ML in general, has grown explosively and shown great promise in various areas, e.g., biology^{3,4}, image reconstruction^{5,6}, and solid earth geoscience^{7}. DL is powerful for mining features or relationships from data, which is invaluable in the context of big data, as it extracts high-level information from huge volumes of data. Goodfellow et al.^{8} provide a comprehensive textbook on DL. One of the most popular DL technologies is the convolutional neural network (CNN), which is at the core of most state-of-the-art DL solutions for numerous tasks^{9}. Deep CNNs have had stunning successes in recent years, surpassing human accuracy on hard problems such as visual recognition^{6}.
In exploration seismology, DL or ML has been widely used in fault detection^{10}, structural interpretation^{11}, inversion^{12}, and data interpolation^{13,14,15}, to name a few. A more significant recent trend is the use of DL not for image analysis but for image transformation, in which CNNs are trained to transform one type of image into another. Many geophysical problems can be posed as transforming an input profile into a corresponding output profile (e.g., denoising transforms noisy data into noise-free data). Inspired by Isola et al.^{16}, who investigated DL as a general-purpose solution to image-to-image translation problems, we apply DL to missing data reconstruction, an important ongoing research topic in exploration seismology, with the aim of transforming an incomplete input data set into the corresponding complete data set.
Physical constraints (e.g., obstacles, no-permit areas, and hardware problems with geophones, hydrophones, or airguns) and economic constraints lead to incomplete or sparsely sampled seismic data during acquisition^{17}. However, many important techniques cannot adequately handle irregular sampling and rely on uniformly and densely sampled, unaliased input data (e.g., 2D/3D surface-related multiple elimination, amplitude-variation-with-offset analysis, and reservoir characterization). The performance of multichannel data processing depends heavily on the spatial sampling interval: too large an interval causes spatial aliasing and, consequently, poor resolution. Therefore, the missing data should be reconstructed^{18}.
The missing data problem can be classified into two categories: regularly missing and irregularly missing. Regularly missing means that data are missing equidistantly, i.e., periodically at a constant rate on a uniform grid. Irregularly missing means that data are missing at random positions on a uniform grid^{19}. Seismic data are often irregularly and sparsely sampled along the spatial coordinates, leading to suboptimal processing and imaging results. This work primarily concentrates on the irregularly missing data problem; the ability to also reconstruct regularly missing data was an unexpected by-product.
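To make the two categories concrete, the sketch below builds trace masks (1 = recorded trace, 0 = missing trace) on a uniform grid. It is a minimal illustration with hypothetical helper names (`regular_mask`, `irregular_mask`), not the authors' code. A regular mask is simply one particular outcome of random trace removal with the same missing ratio, which is why regularly missing data can be treated as a special case of irregularly missing data.

```python
import random

def regular_mask(n_traces, keep_every):
    """Regularly missing: keep every `keep_every`-th trace on the uniform grid (1 = live, 0 = missing)."""
    return [1 if i % keep_every == 0 else 0 for i in range(n_traces)]

def irregular_mask(n_traces, missing_ratio, rng):
    """Irregularly missing: remove round(missing_ratio * n_traces) traces at random grid positions."""
    n_missing = round(missing_ratio * n_traces)
    missing = set(rng.sample(range(n_traces), n_missing))
    return [0 if i in missing else 1 for i in range(n_traces)]

rng = random.Random(0)
print(regular_mask(12, 3))           # [1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
print(irregular_mask(12, 0.5, rng))  # six zeros at random positions
```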
Based on a variety of principles and assumptions, important advances have been made in seismic data reconstruction. Some methods address the interpolation of regularly sampled data^{13,14,15,20}, others tackle non-uniformly sampled data^{21}, and some techniques handle both irregularly and regularly missing data^{22}. A complete and detailed discussion of previous publications is beyond the scope of this work; we review only the key viewpoints and literature most closely related to it. Methods based on classical signal-processing principles use specific properties of seismic data as a priori information for interpolation. Signal-processing reconstruction techniques based on transforms to other domains or on prediction-error filtering generally assume that the data are a superposition of a few plane waves. Sparseness, band-limitation, and low-rank assumptions also underlie some of these methods.
Naghizadeh and Innanen^{23} addressed seismic data interpolation using a fast generalized Fourier transform (FGFT). They used the FGFT to identify the space-wavenumber evolution of spatial signals at each temporal frequency, and a least-squares fitting scheme to retrieve the optimal FGFT coefficients representing the desired interpolated data. For randomly sampled data, they sought a sparse representation of FGFT coefficients to retrieve the missing pixels. To interpolate regularly sampled data at a given frequency, they used a mask function derived from the FGFT coefficients of the low frequencies. Because it handles both randomly and regularly sampled data, the FGFT interpolation method is a strong competitor, and we use it for comparison in this work.
Of the multitudinous methods for seismic data interpolation, few take advantage of recent developments in DL or ML. Jia and Ma^{13} proposed a method for reconstructing seismic data from regularly undersampled traces based on support vector regression, a classic ML method. Jia et al.^{14} proposed an intelligent interpolation method for regularly sampled data based on Monte Carlo ML. Wang et al.^{15} proposed a DL-based approach for anti-aliasing interpolation of regularly sampled seismic data: based on CNNs, they designed eight-layer residual networks (ResNets), which have better backpropagation properties and extract feature maps of the training data in a nonlinear way. In the training processes of Jia and Ma^{13}, Jia et al.^{14}, and Wang et al.^{15}, generating the network input requires an initial pre-interpolation of the data using a bicubic method, which affects the final performance of these methods.
Building on the DL theory described in Goodfellow et al.^{8}, we applied DL to both irregularly and regularly missing data reconstruction, defining intelligent data-to-data translation as the task of translating incomplete data into complete data without pre-interpolation. DL allows trainable models composed of multiple layers to learn representations of data with multiple levels of abstraction^{1}. DL is a representation-learning method with multiple levels of representation obtained by composing nonlinear modules, each of which transforms the representation at one level into a representation at a higher, slightly more abstract level^{8}. With enough such layers and sufficient training data, very complicated functions and relations can be learned. Moreover, and as a motivation for this work, DL can theoretically avoid some of the assumptions that restrict conventional interpolation methods (e.g., linear events, sparsity, band limitation, and low rank).
The rest of the paper is organized as follows. In the “Methods” section, we first briefly summarize some basic DL theory; next, we describe in detail the established model architecture, which takes the incomplete data as input and the corresponding complete data as output and is based on a U-Net convolutional network^{4}; we then provide a detailed training analysis, including the loss function, the optimizer, the evaluation metrics, training data preparation, and the parameter setup. In the “Results” section, to make the results more convincing and to validate the generalization capability of the trained model, we test its performance on several typical data sets (i.e., a synthetic training data set, a synthetic test data set, a physical modelling data set, the Mobil Viking graben line 12 data set, the F3 data set, a fault data set from the GeoFrame software, and a data set from the North Sea). The trained model is used for both irregularly and regularly missing data reconstruction. After discussing some practical aspects and extensions of this work, we summarize our conclusions.
Methods
Basic theory
For brevity, we refer the reader to Goodfellow et al.^{8} for a detailed description of DL terminology. DL is a computational tool for learning complex motifs from data. It uses multiple processing layers to discover patterns and structures in the data, and each layer learns features that subsequent layers build on. Nonlinear parameterized processing layers are combined to progressively transform an input X into the desired output Y_{ref}, typically attaining only an approximation \({{\bf{Y}}}_{\mathrm{pre}}\). Deep artificial neural networks (ANNs) are black-box models whose operation is opaque and difficult to interpret. The adjective “deep” refers to the large number of stacked layers required to build a universal function approximator f as follows:
\[{{\bf{Y}}}_{\mathrm{pre}}=f({\bf{X}};\,\theta ),\]
where θ denotes the parameters of the CNN (including, but not limited to, the weights W and biases b of the convolution kernels).
We regard ANNs as bridges connecting the input X and the desired output Y_{ref}. ANNs with a sufficient number of parameters can, in theory, approximate any continuous function to arbitrary accuracy. To fit the desired mapping, the model goes through a training process that begins with a random choice of θ. Training can be regarded as an optimization problem: finding the internal parameters θ that minimize the discrepancy between \({{\bf{Y}}}_{\mathrm{pre}}\) and Y_{ref} (quantified by the loss function ϕ, explained later) over all samples fed into the model^{12}.
The parameters θ are iteratively updated by gradient descent to minimize the loss and thereby improve the accuracy of the model prediction. Each layer of the model is differentiable, so it is known how changes in θ cause changes in the output values. The backpropagation (BP) algorithm^{24} uses the chain rule to efficiently compute all partial derivatives, or gradients, with just one forward pass through the model followed by one backward pass. Training is complete when the loss reaches an acceptable level. The choice of optimizer for the loss function is therefore an important part of training a model.
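The chain-rule bookkeeping behind backpropagation can be seen on a toy model. The sketch below uses a hypothetical scalar network, y = w2·ReLU(w1·x + b1) + b2, with a squared-error loss: it computes all gradients in one forward and one backward pass, checks them against central finite differences, and then runs plain gradient descent. The network, learning rate, and step count are illustrative assumptions, not the model used in this work.

```python
def forward(x, p):
    z = p["w1"] * x + p["b1"]   # first layer, pre-activation
    h = max(z, 0.0)             # ReLU
    y = p["w2"] * h + p["b2"]   # second layer (output)
    return z, h, y

def loss_and_grads(x, t, p):
    """Squared error (y - t)^2 and its gradients: one forward pass, then one backward pass."""
    z, h, y = forward(x, p)
    loss = (y - t) ** 2
    dy = 2.0 * (y - t)                            # dL/dy
    dz = dy * p["w2"] * (1.0 if z > 0 else 0.0)   # dL/dz = dL/dy * dy/dh * dh/dz
    grads = {"w2": dy * h, "b2": dy, "w1": dz * x, "b1": dz}
    return loss, grads

def numerical_grad(x, t, p, name, eps=1e-6):
    """Central finite difference, used only to check the analytic gradient."""
    def loss(q):
        return (forward(x, q)[2] - t) ** 2
    up, down = dict(p), dict(p)
    up[name] += eps
    down[name] -= eps
    return (loss(up) - loss(down)) / (2 * eps)

params = {"w1": 0.8, "b1": 0.1, "w2": -1.5, "b2": 0.3}
x, t = 2.0, 1.0
initial_loss, grads = loss_and_grads(x, t, params)
for name, g in grads.items():
    print(f"{name}: backprop {g:.6f}, finite difference {numerical_grad(x, t, params, name):.6f}")

lr = 0.01                      # hypothetical learning rate
for step in range(200):        # gradient descent: theta <- theta - lr * grad
    _, grads = loss_and_grads(x, t, params)
    for name in params:
        params[name] -= lr * grads[name]
final_loss, _ = loss_and_grads(x, t, params)
print(initial_loss, final_loss)
```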
Model architecture
There is no hard and fast rule for how many layers an ANN needs, but most researchers agree that no fewer than three are required. Figure 1 shows a schematic of the established model architecture, which belongs to a specific family of neural network (NN) architectures known as U-Net, a generic DL solution for various tasks^{4}. For our task, the model input data have one horizontal (spatial) dimension and one vertical (temporal) dimension. In seismic terms, N_{h} and N_{w} denote the number of samples of the input data along the time and space axes, respectively. The number of channels N_{c} equals 1 for seismic data. A data sample with N_{c} channels is fed into the model at the top. In the model input data X, a certain percentage of traces (e.g., 40% to 95%) is randomly missing along the space direction. Note that no pre-interpolation is involved. The desired model output Y_{ref} (at the bottom) is the corresponding complete data.
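The sketch below shows how one input-output training pair of this kind could be formed from a complete gather: a missing ratio is drawn from the 40% to 95% range, that fraction of traces is zeroed in the input, and the complete gather is kept as the target. The gather is assumed to be stored as a list of traces (`gather[ix][it]`), the ratio is assumed to be drawn uniformly (the text does not say how it is sampled), and the helper name `make_training_pair` is hypothetical. Missing traces are simply left as zeros, reflecting that no pre-interpolation is applied.

```python
import random

def make_training_pair(gather, rng, min_ratio=0.40, max_ratio=0.95):
    """Return (X, Y_ref, mask) for one gather stored as a list of traces."""
    n_traces = len(gather)
    ratio = rng.uniform(min_ratio, max_ratio)   # assumed: uniformly drawn missing ratio
    missing = set(rng.sample(range(n_traces), round(ratio * n_traces)))
    x = [[0.0] * len(trace) if i in missing else list(trace)   # zeros, no pre-interpolation
         for i, trace in enumerate(gather)]
    y_ref = [list(trace) for trace in gather]   # the complete data is the target
    mask = [0 if i in missing else 1 for i in range(n_traces)]
    return x, y_ref, mask

gather = [[float(ix + it) for it in range(5)] for ix in range(20)]  # toy 20-trace gather
x, y_ref, mask = make_training_pair(gather, random.Random(42))
print(mask)
```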
The model architecture (Fig. 1) is an encoder-decoder-style NN that solves the missing data reconstruction task end-to-end. It is logically composed of a contracting path (upper side, interpreted as an encoder) and a more or less symmetric expanding path (lower side, interpreted as a decoder). The encoder takes an incomplete data sample as input and gradually computes feature maps at multiple scales and abstraction levels, resulting in a multilevel, multiresolution feature representation. Layers in the decoder successively synthesize the complete data, starting from low-resolution feature maps (representing large-scale structures) up to high-resolution feature maps (representing fine-scale structures)^{4}. Figure 2 illustrates this process.
The contracting path/encoder follows a typical CNN, consisting of the repeated application of two padded convolutions (padding avoids the loss of border pixels in every convolution), each followed by an activation operation (black arrow), and a pooling operation (red arrow) with stride two that halves the resolution of the resulting feature map. Convolutions directly following downsampling steps double the number of feature maps^{4}. Each step in the expanding path/decoder consists of a bed-of-nails upsampling of the feature maps by a factor of two (blue arrow), a concatenation (right curly brace) with the copied encoder feature maps at the corresponding resolution (purple arrow), and two convolutions, each followed by an activation. Combining the feature maps from the contracting path with the upsampled output allows the successive convolution layers to assemble a more precise output. Skip connections have been shown to help train deeper networks by preventing vanishing gradients^{25}. At the final layer, a 1 × 1 convolution (green arrow) maps the multiple feature maps to the desired output Y_{ref}.
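The two resolution-changing operations can be illustrated on a small feature map. The sketch below implements 2 × 2 max-pooling with stride two and bed-of-nails upsampling by a factor of two (each value placed at the top-left of a 2 × 2 block, zeros elsewhere) in plain Python. The top-left placement is one common convention and an assumption here, and the function names are hypothetical.

```python
def max_pool_2x2(a):
    """2 x 2 max-pooling with stride 2; assumes even height and width."""
    return [[max(a[i][j], a[i][j + 1], a[i + 1][j], a[i + 1][j + 1])
             for j in range(0, len(a[0]), 2)]
            for i in range(0, len(a), 2)]

def bed_of_nails_upsample_2x(a):
    """Place each value at the top-left of a 2 x 2 block and fill the rest with zeros."""
    out = [[0.0] * (2 * len(a[0])) for _ in range(2 * len(a))]
    for i, row in enumerate(a):
        for j, v in enumerate(row):
            out[2 * i][2 * j] = v
    return out

fmap = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
pooled = max_pool_2x2(fmap)
print(pooled)                            # [[6, 8], [14, 16]]
print(bed_of_nails_upsample_2x(pooled))  # [[6, 0.0, 8, 0.0], [0.0, 0.0, 0.0, 0.0], [14, 0.0, 16, 0.0], [0.0, 0.0, 0.0, 0.0]]
```

In the decoder, the upsampled map restores the original resolution but not the discarded detail; the concatenated encoder maps supply that detail.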
A widely used nonlinear activation function is the rectified linear unit (ReLU), which returns elementwise \(\max ({\bf{x}},0)\), with x being an input tensor. We adopted the ReLU activation in the constructed model and max-pooling for the pooling operation. Note that the number of output filters of the convolutions (i.e., F_{i,i∈[1, 5]}) increases (e.g., from 64 to 128, 256, 512, and 1024) as we go deeper into the model, doubling at each downsampling step.
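To show how resolution and feature-map count evolve through the contracting and expanding paths, the sketch below traces (height, width, feature maps) level by level for a 112 × 112 patch. It assumes the filter sequence 64, 128, 256, 512, 1024, four 2 × 2 poolings, and a mirror-image decoder as described above. It only does shape bookkeeping, and `unet_shapes` is a hypothetical helper.

```python
def unet_shapes(height, width, filters=(64, 128, 256, 512, 1024)):
    """Return encoder and decoder (height, width, feature maps) per level."""
    encoder = []
    h, w = height, width
    for level, f in enumerate(filters):
        encoder.append((h, w, f))
        if level < len(filters) - 1:
            h, w = h // 2, w // 2                  # 2 x 2 max-pooling, stride 2
    decoder = []
    for (skip_h, skip_w, _), f in zip(reversed(encoder[:-1]), reversed(filters[:-1])):
        h, w = 2 * h, 2 * w                        # upsampling by a factor of two
        if (h, w) != (skip_h, skip_w):
            raise ValueError(f"cannot concatenate {h}x{w} with skip {skip_h}x{skip_w}")
        decoder.append((h, w, f))                  # concatenation, then two convolutions with f maps
    return encoder, decoder

enc, dec = unet_shapes(112, 112)
print(enc)  # [(112, 112, 64), (56, 56, 128), (28, 28, 256), (14, 14, 512), (7, 7, 1024)]
print(dec)  # [(14, 14, 512), (28, 28, 256), (56, 56, 128), (112, 112, 64)]
```

A final 1 × 1 convolution then maps the 64 maps at 112 × 112 to a single output channel. Calling `unet_shapes(120, 120)` raises a `ValueError`, because 120 → 60 → 30 → 15 → 7 and the upsampled 14 × 14 map cannot be concatenated with the 15 × 15 skip.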
Table 1 provides a detailed model summary. There are fifty layers (including 19 convolution layers). The trainable parameters are concentrated in the convolution layers; the input, ReLU activation, max-pooling, upsampling, and concatenation layers have no trainable parameters. The number of trainable parameters in a convolution layer is computed as follows:
\[P=({K}_{h}\times {K}_{w}\times {F}_{i-1}+1)\times {F}_{i},\]
where K_{h} × K_{w} is the convolution kernel size, and F_{i−1} and F_{i} denote the numbers of feature maps in the previous and current layers, respectively; the +1 accounts for the bias of each filter. Figure 2 shows the feature maps for a randomly missing data sample passing through a trained model. Once trained, the model can fill in the gaps in corrupted data by passing them through the encoding and decoding steps.
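Applying the per-layer formula to every convolution gives the total parameter count. The kernel size is not stated in this section, so the sketch below makes it a parameter and assumes a standard U-Net layout: two convolutions per encoder level (including the bottleneck), no learned up-convolution, concatenation of upsampled and copied maps, two convolutions per decoder level, and a final 1 × 1 convolution to one channel. This layout yields 19 convolution layers, matching Table 1. Under these assumptions, 5 × 5 kernels give about 87.1 million parameters, consistent with the approximate figure quoted in the Discussion, whereas 3 × 3 kernels would give about 31.4 million. The function names are hypothetical.

```python
def conv_params(kh, kw, f_in, f_out):
    """Trainable parameters of one convolution layer: (Kh * Kw * F_{i-1} + 1) * F_i."""
    return (kh * kw * f_in + 1) * f_out

def unet_params(k, filters=(64, 128, 256, 512, 1024), in_channels=1, out_channels=1):
    """Total trainable parameters of a U-Net with k x k kernels (assumed layout, see text)."""
    total, f_prev = 0, in_channels
    for f in filters:                                   # encoder and bottleneck: two convs per level
        total += conv_params(k, k, f_prev, f) + conv_params(k, k, f, f)
        f_prev = f
    for f in reversed(filters[:-1]):                    # decoder: upsample, concatenate, two convs
        f_cat = f_prev + f                              # upsampled maps + copied encoder maps
        total += conv_params(k, k, f_cat, f) + conv_params(k, k, f, f)
        f_prev = f
    return total + conv_params(1, 1, f_prev, out_channels)  # final 1 x 1 convolution

print(conv_params(3, 3, 1, 64))  # 640
print(unet_params(3))            # 31377793
print(unet_params(5))            # 87149953
```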
Loss function
For our problem, we used a mean-squared-error (MSE) training loss function, which measures the average of the squared errors. Given the reference solution Y_{ref} and the model prediction \({{\bf{Y}}}_{\mathrm{pre}}\), the MSE is computed as follows:
\[\mathrm{MSE}=\frac{1}{L}{\Vert {{\bf{Y}}}_{\mathrm{ref}}-{{\bf{Y}}}_{\mathrm{pre}}\Vert }_{\mathrm{Fro}}^{2},\]
where L is the number of elements in Y_{ref} and \({\Vert \cdot \Vert }_{\mathrm{Fro}}\) is the Frobenius norm. The MSE loss function is widely used in statistics. Because training proceeds batch-wise, the loss function is a sum of subfunctions evaluated on different mini-batches of data and is therefore stochastic.
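As a small numerical check of the MSE definition and of mini-batch averaging, the sketch below computes the MSE of 2-D lists and the loss of a mini-batch as the mean of per-sample MSEs; for equally sized patches, this equals the MSE over all elements in the batch. Plain Python lists stand in for tensors, and the function names are hypothetical; this is not the Keras implementation.

```python
def mse(y_ref, y_pre):
    """(1 / L) * ||Y_ref - Y_pre||_F^2 for two equally sized 2-D lists."""
    sq = [(r - p) ** 2 for row_r, row_p in zip(y_ref, y_pre) for r, p in zip(row_r, row_p)]
    return sum(sq) / len(sq)

def minibatch_loss(batch_ref, batch_pre):
    """Stochastic loss on one mini-batch: mean of the per-sample MSEs."""
    return sum(mse(r, p) for r, p in zip(batch_ref, batch_pre)) / len(batch_ref)

y_ref = [[1.0, 2.0], [3.0, 4.0]]
y_pre = [[1.0, 2.0], [3.0, 2.0]]
print(mse(y_ref, y_pre))  # 1.0
```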
Optimization algorithm
We employed the Adam (derived from adaptive moment estimation) algorithm to optimize our stochastic loss function. Adam is a simple and computationally efficient algorithm for first-order gradient-based optimization^{26}. Algorithm 1 gives the pseudocode of the employed Adam algorithm. ϕ(θ) denotes the stochastic scalar loss function, which is differentiable with respect to the parameters θ. g_{t} = ∇_{θ}ϕ_{t}(θ) is the gradient, i.e., the vector of partial derivatives of ϕ_{t} with respect to θ evaluated at time step t. Adam updates exponential moving averages of the gradient (m_{t}) and of the squared gradient (v_{t}), where the hyperparameters β_{1}, β_{2} ∈ [0, 1) control the exponential decay rates of these moving averages. The moving averages themselves are estimates of the first moment (the mean) and the second moment (the uncentred variance) of the gradient^{26}.
As Algorithm 1 shows, Adam requires only first-order gradients, so it is straightforward to implement and has a small memory requirement. Adam is designed to combine the advantages of two previously popular optimization algorithms: AdaGrad^{27}, which deals well with sparse gradients, and RMSProp, which works well with online and non-stationary objectives^{26}. Adam is based on adaptive estimates of lower-order moments. Kingma and Ba^{26} analysed the theoretical convergence properties of the Adam algorithm and, using large ANNs and data sets, found it to be efficient, robust, and well-suited to a wide range of practical non-convex optimization problems in DL. Adam is thus a versatile algorithm for DL problems with big data and/or high-dimensional parameter spaces.
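Because Algorithm 1 itself is not reproduced here, the sketch below follows the Adam update as published by Kingma and Ba^{26} (bias-corrected moving averages of g_t and g_t²) for a single scalar parameter. The defaults α = 10^{−4}, β_1 = 0.9, β_2 = 0.999, and ε = 10^{−8} match the values given in the training setup. The quadratic test function and the larger step size used in the example are illustrative assumptions.

```python
import math

def adam_minimize(grad, theta, steps, alpha=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """Scalar Adam: grad(theta) returns g_t; returns theta after `steps` updates."""
    m, v = 0.0, 0.0                               # first and second moment estimates
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g           # moving average of the gradient
        v = beta2 * v + (1 - beta2) * g * g       # moving average of the squared gradient
        m_hat = m / (1 - beta1 ** t)              # bias-corrected estimates
        v_hat = v / (1 - beta2 ** t)
        theta -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta

def grad(th):
    return 2.0 * (th - 3.0)                       # gradient of (theta - 3)^2

print(adam_minimize(grad, 0.0, steps=2000, alpha=0.05))  # close to 3.0
```

Thanks to the bias correction, the very first step has a magnitude close to α regardless of the gradient scale; later steps shrink as the ratio of the averaged gradient to its root-mean-square falls near the minimum.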
Evaluation metrics
As mentioned above, MSE is a measure of the quality of a predictor and is always non-negative. Based on our experience in seismic data reconstruction, we also used the signal-to-noise ratio (SNR),
\[\mathrm{SNR}=10\,{\log }_{10}\frac{{\Vert {{\bf{Y}}}_{\mathrm{ref}}\Vert }_{\mathrm{Fro}}^{2}}{{\Vert {{\bf{Y}}}_{\mathrm{ref}}-{{\bf{Y}}}_{\mathrm{pre}}\Vert }_{\mathrm{Fro}}^{2}},\]
to assess the quality of the results. Moreover, we used two other metrics that are widely used in computer vision super-resolution: the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) index^{28}. PSNR is computed as follows:
\[\mathrm{PSNR}=10\,{\log }_{10}\frac{{M}^{2}}{\mathrm{MSE}},\]
where M is the maximum value of the elements in Y_{ref}. SSIM is defined as follows:
\[\mathrm{SSIM}=\frac{(2{\mu }_{{{\bf{Y}}}_{\mathrm{ref}}}{\mu }_{{{\bf{Y}}}_{\mathrm{pre}}}+{C}_{1})(2{\sigma }_{{{\bf{Y}}}_{\mathrm{ref}}{{\bf{Y}}}_{\mathrm{pre}}}+{C}_{2})}{({\mu }_{{{\bf{Y}}}_{\mathrm{ref}}}^{2}+{\mu }_{{{\bf{Y}}}_{\mathrm{pre}}}^{2}+{C}_{1})({\sigma }_{{{\bf{Y}}}_{\mathrm{ref}}}^{2}+{\sigma }_{{{\bf{Y}}}_{\mathrm{pre}}}^{2}+{C}_{2})},\]
where \({\mu }_{{{\bf{Y}}}_{\mathrm{ref}}}\) and \({\mu }_{{{\bf{Y}}}_{\mathrm{pre}}}\) denote the means of Y_{ref} and \({{\bf{Y}}}_{\mathrm{pre}}\), respectively; \({\sigma }_{{{\bf{Y}}}_{\mathrm{ref}}}\) and \({\sigma }_{{{\bf{Y}}}_{\mathrm{pre}}}\) are the standard deviations of Y_{ref} and \({{\bf{Y}}}_{\mathrm{pre}}\), respectively; and \({\sigma }_{{{\bf{Y}}}_{\mathrm{ref}}{{\bf{Y}}}_{\mathrm{pre}}}\) is the covariance between Y_{ref} and \({{\bf{Y}}}_{\mathrm{pre}}\). The constant C_{1} is included to avoid instability when \({\mu }_{{{\bf{Y}}}_{\mathrm{ref}}}^{2}+{\mu }_{{{\bf{Y}}}_{\mathrm{pre}}}^{2}\) is very close to zero. Similarly, the constant C_{2} avoids instability when \({\sigma }_{{{\bf{Y}}}_{\mathrm{ref}}}^{2}+{\sigma }_{{{\bf{Y}}}_{\mathrm{pre}}}^{2}\) is very close to zero.
The metrics MSE, SNR, PSNR, and SSIM can be used during training and testing to quantify the quality of the results. MSE values closer to zero are better; generally, the higher the SNR, PSNR, and SSIM values, the better the result.
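The three additional metrics can be computed as in the sketch below, written in plain Python for 2-D lists. Assumptions: SNR is taken as 10 log10(‖Y_ref‖²_Fro / ‖Y_ref − Y_pre‖²_Fro), a form common in seismic reconstruction. SSIM is evaluated once over the whole array from the means, spreads, and covariance described in this section, whereas image-processing libraries usually average SSIM over local windows. C_1 = 10^{−4} and C_2 = 9 × 10^{−4} (the usual (0.01)² and (0.03)² for a unit dynamic range) are illustrative choices, not values from this paper.

```python
import math

def _flat(a):
    return [v for row in a for v in row]

def snr_db(y_ref, y_pre):
    r, p = _flat(y_ref), _flat(y_pre)
    err = sum((a - b) ** 2 for a, b in zip(r, p))
    return math.inf if err == 0 else 10 * math.log10(sum(a * a for a in r) / err)

def psnr_db(y_ref, y_pre):
    r, p = _flat(y_ref), _flat(y_pre)
    mse = sum((a - b) ** 2 for a, b in zip(r, p)) / len(r)
    return math.inf if mse == 0 else 10 * math.log10(max(r) ** 2 / mse)  # M = max element of Y_ref

def ssim_global(y_ref, y_pre, c1=1e-4, c2=9e-4):
    r, p = _flat(y_ref), _flat(y_pre)
    n = len(r)
    mu_r, mu_p = sum(r) / n, sum(p) / n
    var_r = sum((a - mu_r) ** 2 for a in r) / n        # sigma_ref^2
    var_p = sum((b - mu_p) ** 2 for b in p) / n        # sigma_pre^2
    cov = sum((a - mu_r) * (b - mu_p) for a, b in zip(r, p)) / n
    return ((2 * mu_r * mu_p + c1) * (2 * cov + c2)) / ((mu_r ** 2 + mu_p ** 2 + c1) * (var_r + var_p + c2))

y_ref = [[3.0, 4.0]]
y_pre = [[3.0, 3.0]]
print(snr_db(y_ref, y_pre), psnr_db(y_ref, y_pre), ssim_global(y_ref, y_pre))
```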
Training preparation and setup
Training data
Training data are vital for DL. We prepared the training data using not only synthetic data but also field seismic data, so that the model can learn the features of seismic data from as many examples as possible.
The synthetic training data are modelled with a forward-modelling code^{29} based on several well-designed earth models (e.g., Fig. 3a). The model in Fig. 3b is used to generate test data that are completely unseen during training. The sources and receivers are evenly spaced from 0 to 2550 m at 10 m intervals, and the source is moved from the first shot location to the last. The source and receiver depths vary between earth models and experiments. The simulated data include 2048 shots (8 simulations with 256 shots each) and 256 traces per shot, with 2048 samples along the time axis at a time interval dt of 0.5 ms. Because dt = 0.5 ms is rarely used in industry, we resampled the data of size 2048 × 256 × 2048 (time axis × receiver axis × shot axis) into three data sets, 1024 × 256 × 2048 (dt = 1 ms), 512 × 256 × 2048 (dt = 2 ms), and 256 × 256 × 2048 (dt = 4 ms), which compose the synthetic training data. Figure 4 shows a sample.
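The acquisition and resampling numbers above are mutually consistent, as the small calculation below shows: 0 to 2550 m at 10 m gives 256 positions, and each coarser sampling interval keeps the same 1.024 s record while lowering the Nyquist frequency. The helpers are hypothetical. The section does not say how the data were resampled; decimating by a factor of 2, 4, or 8 would in practice require low-pass (anti-alias) filtering below the new Nyquist frequency first.

```python
def n_positions(x_start_m, x_end_m, dx_m):
    """Number of evenly spaced source/receiver positions, end points included."""
    return int(round((x_end_m - x_start_m) / dx_m)) + 1

def resampled_sets(nt0=2048, dt0_ms=0.5, factors=(2, 4, 8)):
    """(dt in ms, samples, record length in ms, Nyquist in Hz) after decimating by each factor."""
    rows = []
    for k in factors:
        nt, dt = nt0 // k, dt0_ms * k
        rows.append((dt, nt, nt * dt, 1000.0 / (2.0 * dt)))
    return rows

print(n_positions(0.0, 2550.0, 10.0))  # 256
for row in resampled_sets():
    print(row)  # (1.0, 1024, 1024.0, 500.0), then dt = 2 ms and dt = 4 ms
```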
We used the Mobil Viking graben line 12 data set to generate the field training data. This data set consists of 1001 shot gathers, each of size 1024 × 120: the 1024 rows represent the time domain, sampled every 4 ms, and the 120 columns represent the spatial domain, sampled every 25 m. We randomly chose 200 shots as the field training data set (see Fig. 5 for a sample).
Before being fed into the model, each shot gather is normalized by dividing it by the maximum absolute value of that shot gather, so the amplitudes of the model input and output lie in the range [−1, 1]. To obtain a sufficiently large number of training samples, we work patch-wise: the shot gathers are divided into small patches of a specified size, so the number of training patches is much larger than the number of training shot gathers.
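A minimal sketch of the per-gather normalization, assuming a gather stored as a 2-D list: every sample is divided by the largest absolute amplitude in that gather, which maps the amplitudes into [−1, 1]. Returning the scale factor is an addition not described in the text; it allows a prediction to be mapped back to the original amplitudes. `normalize_gather` is a hypothetical name.

```python
def normalize_gather(gather):
    """Divide a 2-D gather by its maximum absolute amplitude; also return that factor."""
    peak = max(abs(v) for row in gather for v in row)
    if peak == 0.0:                      # all-zero gather: nothing to scale
        return [list(row) for row in gather], 0.0
    return [[v / peak for v in row] for row in gather], peak

scaled, peak = normalize_gather([[0.5, -2.0], [1.0, 0.0]])
print(scaled, peak)  # [[0.25, -1.0], [0.5, 0.0]] 2.0
```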
Training setup
There is a trade-off between the patch size (which determines the receptive field) and the model depth. A larger patch size demands more down- and upsampling layers, while small patches allow the model to see only local features. In addition, the patch size should be chosen so that every 2 × 2 downsampling operation is applied to a layer with an even height and width. The patch size and the batch size are primarily limited by the graphics processing unit (GPU) memory. To minimize overhead and make full use of the GPU memory, we prefer a large patch size over a large batch size.
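The even-size requirement means that, for n downsampling steps, each patch side must be divisible by 2^n. The sketch below, with hypothetical helper names, checks this and finds the largest admissible side length that fits a given number of traces. With four downsampling steps and 120 traces, it returns 112, the value adopted in the setup that follows.

```python
def is_valid_side(size, n_down=4):
    """True if `size` stays even before each of the n_down 2 x 2 downsampling steps."""
    return size % (2 ** n_down) == 0

def largest_side(max_size, n_down=4):
    """Largest side length <= max_size that admits n_down downsampling steps."""
    step = 2 ** n_down
    return (max_size // step) * step

print(is_valid_side(120), is_valid_side(112))  # False True
print(largest_side(120))                       # 112
```

For a U-Net of this form, the same condition applies at test time to whole profiles fed to the model unless they are padded; the 1024 × 112 gather mentioned in the Discussion satisfies it.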
Our computing resources were a workstation running Windows 7 with two Intel Xeon E5-2620 processors (2.10 GHz), 176 GB of RAM, and an NVIDIA GeForce RTX 2080 Ti GPU (11 GB). Our code is written in Python using Keras (a Python DL library). After trial and error, we set the patch size to 112 × 112, which allows four downsampling operations and fits within the 120 traces of the field training shot gathers. To overlap adjacent patches, the patch stride is 23 pixels for the synthetic training data sets and 10 pixels for the field training data set. Patches with a small mean absolute value (e.g., ≤0.001) contain few events or nearly zero amplitudes, so patches below this threshold are removed from the training data. We also paid special attention to the first-arrival areas, which contribute fewer samples than other areas, and increased the proportion of samples from the shallow first-arrival areas to some degree. The training process finally involves 1,132,800 (more than 1 million) patches. We set the batch size to 128. The steps-per-epoch, i.e., the number of steps (batches of samples) before one epoch is declared finished and the next begins, is set to 8850; it is typically equal to the number of training samples (1,132,800) divided by the batch size (128). Figure 6 shows fifty input-output training pairs.
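The patch extraction described above can be sketched as follows: overlapping windows are placed with the given stride along both axes, and windows whose mean absolute amplitude does not exceed the threshold are discarded. Aligning an extra window to the far edge when the stride does not divide evenly is our assumption, as is the gather layout (`gather[it][ix]`, time by space). The first-arrival oversampling step is not shown, and the helper names are hypothetical.

```python
def window_starts(length, patch, stride):
    """Start indices of overlapping windows; an extra window is aligned to the end if needed."""
    if length < patch:
        raise ValueError("axis shorter than the patch")
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)
    return starts

def extract_patches(gather, patch=112, stride=23, threshold=1e-3):
    """Cut a normalized gather[it][ix] into patch x patch windows, dropping near-empty ones."""
    kept = []
    for i in window_starts(len(gather), patch, stride):
        for j in window_starts(len(gather[0]), patch, stride):
            p = [row[j:j + patch] for row in gather[i:i + patch]]
            mean_abs = sum(abs(v) for r in p for v in r) / (patch * patch)
            if mean_abs > threshold:     # patches with mean |amplitude| <= threshold are removed
                kept.append(p)
    return kept

print(window_starts(120, 112, 10))  # [0, 8]  (field data, 120 traces)
print(window_starts(256, 112, 23))  # [0, 23, 46, 69, 92, 115, 138, 144]  (synthetic, 256 traces)
print(1_132_800 // 128)             # 8850 steps per epoch
```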
Computing the updates on small batches of data in parallel on specialized hardware (e.g., GPUs) makes it possible to fit networks with millions of parameters on data sets with millions of observations. The default Adam settings suggested by Kingma and Ba^{26} for the problems they tested are β_{1} = 0.9, β_{2} = 0.999, and ε = 10^{−8}. The learning rate α, a critical parameter, is initialized at 0.0001; how to determine its optimal value is still an open issue. The number of epochs must also be specified: too few epochs lead to underfitting, and too many waste running time and may cause overfitting. In our experience, 50 epochs give sufficiently good results. The indexes of the input-output pairs are shuffled before each epoch starts. Finally, to ensure that the originally available data remain unaltered, the live data from the original input are reinserted into their original positions in the DL result.
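The final reinsertion step amounts to keeping the recorded traces and taking the network output only where traces were missing. A minimal sketch, assuming gathers stored as lists of traces and a 0/1 trace mask (1 = live); `reinsert_live_traces` is a hypothetical name.

```python
def reinsert_live_traces(x, y_pre, mask):
    """Keep live traces of the input x; fill missing traces (mask 0) from the prediction y_pre."""
    return [list(xt) if live else list(yt) for xt, yt, live in zip(x, y_pre, mask)]

x = [[1.0, 2.0], [0.0, 0.0], [5.0, 6.0]]        # middle trace missing (zeros)
y_pre = [[0.9, 2.1], [3.1, 3.9], [5.2, 5.8]]    # network output
print(reinsert_live_traces(x, y_pre, [1, 0, 1]))  # [[1.0, 2.0], [3.1, 3.9], [5.0, 6.0]]
```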
Results
During training, the model input is restricted to a specified patch size (112 × 112). For testing, however, the input data need not be divided into small patches; a whole profile can be fed directly into the model. One of the most important issues is the convergence of training. The training log in Fig. 7 shows the loss decreasing steadily with increasing epoch number, indicating convergence. A well-trained model should produce reasonable output for new input never seen during training (i.e., it should have generalization capability). We use several typical data sets (e.g., those shown in Figs. 5 and 8–13) to test the generalization capability. Because training is patch-wise, the model never sees the complete features of a training shot gather (e.g., Fig. 4) as a whole. We first apply the trained model to irregularly missing data reconstruction and then to regularly missing data reconstruction.
Irregularly missing data reconstruction
The examples in Figs. 4, 5, 8 and 9 show that the trained model can reconstruct irregularly missing data with high accuracy; the model outputs in Figs. 4 and 8 show negligible differences from the true data. To check the advantage of the new method over conventional methods, we compare it with a peer-reviewed FGFT interpolation method^{23}, whose code is open-source. We made no changes to the FGFT code, which has few adjustable parameters. The corresponding FGFT results are shown in Figs. 5 and 9–12. Although the parameters may not be optimal, Figs. 5 and 9–12 illustrate the drawbacks of the FGFT method to some degree. Compared with the FGFT interpolation results, the DL results have smaller MSE values and higher SNR, PSNR, and SSIM values. These results verify the feasibility, effectiveness, superiority, and generalization capability of the evaluated method.
Comparing Figs. 4, 5, 8 and 9 (prestack data applications) with Figs. 10–12 (poststack data applications) reveals that precision decreases (as reflected in the relatively lower SNR values in Figs. 10–12) when the features of the test data differ significantly from those of the training data. Note that the model is trained with prestack data only. The bias grows as these differences grow. Even so, the model still produces acceptable results compared with the FGFT results (see Figs. 10–12). We think that different types of “reliable” data should be added to the training data to further improve the model’s generalization capability.
Regularly missing data reconstruction
Our original goal was to accomplish irregularly missing data reconstruction, and we did not initially expect the trained model to be suitable for the regularly missing case. Following successful tests (two of which are shown in Fig. 13), we found that the evaluated framework is also competent at regularly missing data reconstruction. The reason is that regularly missing data can be seen as a special case of irregularly missing data, so a model trained with irregularly sampled data can be applied to regularly missing data reconstruction. However, a model trained with regularly sampled data cannot be applied to irregularly missing data reconstruction.
Discussion
DL is a promising data-driven approach for solving inverse problems and, by extension, data reconstruction tasks. Models such as the one established in this work can have tens to hundreds of millions of trainable parameters (approximately 87 million here; see Table 1), which leads to a large GPU memory requirement. The main computational cost of DL lies in training, but this is a one-time, up-front cost, and model prediction is inexpensive. For example, predicting a 1024 × 112 shot gather takes less than 2 s on a computer without a GPU. Hence, the approach is computationally efficient overall.
Although we have concentrated on 2D, our method can be generalized to 3D/5D cases. A generalization to 3D requires substituting the 2D convolution, pooling, and upsampling layers with their 3D versions, which are supported by numerous DL frameworks (e.g., Keras, TensorFlow, and PyTorch). We are moving towards 3D/5D reconstruction in the hope of obtaining superior results by exploiting more spatial constraints. Although this work focuses on missing data reconstruction, the framework presented here also suggests similar potential for DL in other fields (e.g., super-resolution reconstruction of photos and maps, signal processing, and imaging). Once a general model architecture is ready, the same idea can be applied to many problems.
Conclusions
We assessed a deep-learning-based framework for both irregularly and regularly missing data reconstruction, which aims to transform incomplete data into the corresponding complete data. To achieve this goal, we first built a network architecture based on an encoder-decoder-style, end-to-end U-Net CNN, with the randomly sampled incomplete data as the model input and the corresponding complete data as the model output. We then used a mean-squared-error loss function and the Adam optimization algorithm to train the model, with training data prepared from both synthetic and field seismic data. We described the established model architecture, the loss function, the Adam optimization algorithm, the training data, and the training setup in detail. We displayed the feature maps for a randomly sampled data set passing through the trained model in an attempt to explain how the missing data are reconstructed. We tested the trained model on several typical data sets for irregularly missing data reconstruction; it achieved better performance than the FGFT interpolation method, verifying the feasibility, effectiveness, superiority, and generalization capability of the evaluated framework. Because regularly missing data can be considered a special case of irregularly missing data, the trained model was also successfully applied to regularly missing data reconstruction. This work supports the view that DL can avoid some assumptions limiting conventional interpolation methods (e.g., linear events, sparseness, and low rank) and has great potential for advanced intelligent applications beyond traditional techniques.
References
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444, https://doi.org/10.1038/nature14539 (2015).
Jordan, M. I. & Mitchell, T. M. Machine learning: Trends, perspectives, and prospects. Science 349, 255–260, https://doi.org/10.1126/science.aaa8415 (2015).
Webb, S. Deep learning for biology. Nature 554, 555–557, https://doi.org/10.1038/d41586-018-02174-z (2018).
Falk, T. et al. U-Net: deep learning for cell counting, detection, and morphometry. Nature Methods 16, 67–70, https://doi.org/10.1038/s41592-018-0261-2 (2019).
Zhu, B., Liu, J. Z., Cauley, S. F., Rosen, B. & Rosen, M. S. Image reconstruction by domain-transform manifold learning. Nature 555, 487–492, https://doi.org/10.1038/nature25988 (2018).
Belthangady, C. & Royer, L. A. Applications, promises, and pitfalls of deep learning for fluorescence image reconstruction. Nature Methods 16, 1215–1225, https://doi.org/10.1038/s41592-019-0458-z (2019).
Bergen, K. J., Johnson, P. A., de Hoop, M. V. & Beroza, G. C. Machine learning for data-driven discovery in solid Earth geoscience. Science 363, https://doi.org/10.1126/science.aau0323 (2019).
Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning http://www.deeplearningbook.org (MIT Press, 2016).
Schmidhuber, J. Deep learning in neural networks: An overview. Neural Networks 61, 85–117, https://doi.org/10.1016/j.neunet.2014.09.003 (2015).
Wu, X., Liang, L., Shi, Y. & Fomel, S. FaultSeg3D: Using synthetic data sets to train an end-to-end convolutional neural network for 3D seismic fault segmentation. Geophysics 84, IM35–IM45, https://doi.org/10.1190/geo2018-0646.1 (2019).
Wang, Z., Di, H., Shafiq, M. A., Alaudah, Y. & AlRegib, G. Successful leveraging of image processing and machine learning in seismic structural interpretation: A review. The Leading Edge 37, 451–461, https://doi.org/10.1190/tle37060451.1 (2018).
Röth, G. & Tarantola, A. Neural networks and inversion of seismic data. Journal of Geophysical Research: Solid Earth 99, 6753–6768, https://doi.org/10.1029/93JB01563 (1994).
Jia, Y. & Ma, J. What can machine learning do for seismic data processing? An interpolation application. Geophysics 82, V163–V177, https://doi.org/10.1190/geo2016-0300.1 (2017).
Jia, Y., Yu, S. & Ma, J. Intelligent interpolation by Monte Carlo-machine learning. Geophysics 83, V83–V97, https://doi.org/10.1190/geo2017-0294.1 (2018).
Wang, B., Zhang, N., Lu, W. & Wang, J. Deep-learning-based seismic data interpolation: A preliminary result. Geophysics 84, V11–V20, https://doi.org/10.1190/geo2017-0495.1 (2019).
Isola, P., Zhu, J.-Y., Zhou, T. & Efros, A. A. Image-to-image translation with conditional adversarial networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 5967–5976, https://doi.org/10.1109/CVPR.2017.632 (2017).
Abma, R. & Kabir, N. 3D interpolation of irregular data with a POCS algorithm. Geophysics 71, E91–E97, https://doi.org/10.1190/1.2356088 (2006).
Trad, D. Five-dimensional interpolation: Recovering from acquisition constraints. Geophysics 74, V123–V132 (2009).
Ma, J. Three-dimensional irregular seismic data reconstruction via low-rank matrix completion. Geophysics 78, V181–V192, https://doi.org/10.1190/geo2012-0465.1 (2013).
Naghizadeh, M. & Sacchi, M. Multidimensional de-aliased Cadzow reconstruction of seismic records. Geophysics 78, A1–A5, https://doi.org/10.1190/geo2012-0200.1 (2013).
Wang, L. & Wang, Y. A joint matrix minimization approach for seismic wavefield recovery. Scientific Reports 8, 2188 (2018).
Sacchi, M. D., Ulrych, T. J. & Walker, C. J. Interpolation and extrapolation using a high-resolution discrete Fourier transform. IEEE Transactions on Signal Processing 46, 31–38 (1998).
Naghizadeh, M. & Innanen, K. A. Seismic data interpolation using a fast generalized Fourier transform. Geophysics 76, V1–V10, https://doi.org/10.1190/1.3511525 (2011).
Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536, https://doi.org/10.1038/323533a0 (1986).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–778 (2016).
Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
Duchi, J. C., Hazan, E. & Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12, 2121–2159 (2011).
Wang, Z., Bovik, A. C., Sheikh, H. R. & Simoncelli, E. P. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13, 600–612, https://doi.org/10.1109/TIP.2003.819861 (2004).
Chen, H., Zhou, H., Zhang, Q., Xia, M. & Li, Q. A k-space operator-based least-squares staggered-grid finite-difference method for modeling scalar wave propagation. Geophysics 81, T45–T61, https://doi.org/10.1190/geo2015-0090.1 (2016).
Chai, X., Wang, S., Wei, J., Li, J. & Yin, H. Reflectivity inversion for attenuated seismic data: Physical modeling and field data experiments. Geophysics 81, T11–T24, https://doi.org/10.1190/geo2015-0250.1 (2016).
Acknowledgements
This work was supported by the National Natural Science Foundation of China Program (Grant nos. 41774143, 11805166, 41974154) and the Great and Special Project (Grant no. 2016ZX0502600101). We thank Prof. Jianxin Wei at the China University of Petroleum (Beijing) for providing the physical modelling data. We thank the data contributors of the Mobil Viking graben line 12 data set and the F3 data set, which are available at an open data website (https://wiki.seg.org/wiki/Open_data). We thank Prof. Jianwei Ma for providing the data set from the GeoFrame software and a section of the North Sea data set. We thank the authors of the FGFT interpolation method for making their code open access (https://www.crewes.org/ResearchLinks/FreeSoftware/). We thank SMAART JV for the Sigsbee2B and Pluto 1.5 earth models. We thank the authors of the Marmousi2 model. We acknowledge Dr. Hanming Chen at the China University of Petroleum (Beijing) for providing the seismic forward-modelling codes. We thank Junyong Yu, Jiankun Jing, and Jinhan Zhang at the China University of Geosciences (Wuhan) for help with proofreading.
Author information
Contributions
X.C., K.L., H.G. and F.L. designed the research. X.C., K.L., H.D. and X.H. implemented the algorithm. F.L., H.D. and X.H. collected and processed the data. X.C., K.L., H.G. and F.L. designed, conducted and analysed the experiments. All authors participated in preparing the manuscript and provided significant input to the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Chai, X., Gu, H., Li, F. et al. Deep learning for irregularly and regularly missing data reconstruction. Sci Rep 10, 3302 (2020). https://doi.org/10.1038/s41598-020-59801-x
DOI: https://doi.org/10.1038/s41598-020-59801-x