A novel principle to localize the sensitivity of waveform tomography using wave interferences at the observation boundary

When using waveform tomography to perform high-resolution imaging of a medium, it is vital to calculate the sensitivity in order to describe how well a model fits a given set of data and how the sensitivity changes with the spatial distribution of the heterogeneities. The traditional principle behind calculating the sensitivity—for detecting small changes—suffers from an inherent limitation in case other structures, not of interest, are present along the wave propagation path. We propose a novel principle that leads to enhanced localization of the sensitivity of the waveform tomography, without having to know the intermediate structures. This new principle emerges from a boundary integral representation which utilizes wave interferences observed at multiple points. When tested on geophysical acoustic wave data, this new principle leads to much better sensitivity localization and detection of small changes in seismic velocities, which were otherwise impossible. Overcoming the insensitivity to a target area, it offers new possibilities for imaging and monitoring small changes in properties, which is critical in a wide range of disciplines and scales.


A novel principle
Wavefield extrapolation using the boundary integral representation: field data validation. In classical wave theory, the boundary integral representation of wavefields 37 has been explored to relate the response at one place to that at other places. The boundary integral representation has been exploited in the past, for example, in modeling wavefield 38 and in retrieving Green's functions 31 . More recently, the boundary integral representation was utilized in waveform tomography in order to improve the computational efficiency 39 . A specific boundary integral representation can be formulated depending on the assumptions of medium properties in the Green's function and the boundary conditions 40,41 . We use the boundary integral representation in order to extrapolate the wavefield (see "Boundary integral representation"). Here we illustrate how the extrapolation works on field data.
As shown in Fig. 1a, suppose that the scattered waves are observed at multiple locations consisting of a single observation point P and an array of observation points marked as "reference receiver array". Assuming a two-dimensional, scalar Helmholtz equation, we can adapt a boundary integral representation to calculate the response at P by extrapolating waves recorded at the reference receiver array. The vertical array for the reference www.nature.com/scientificreports/ receivers, as shown in Fig. 1a, corresponds to that located in a vertical borehole as we have used in the field experiment. However, this array can be arbitrary in shape. Let us consider an acoustic wavefield where the fluid pressure in a fluid-filled borehole due to an explosive seismic source located at the surface is measured. The measured data are preprocessed such that they represent the 2D wave propagation (see Supplementary Note 1). In this case, assuming that N receivers are located at a spacing of ∆x along the reference array (denoted as x i ), the boundary integral representation can be written in the frequency domain as, where p(P, S) is the extrapolated wave at P, p obs (x i , S) is the observed wave at the reference array, and ∂ 1 G is the horizontal derivative of the Green's function at the location of the vertical array (see "Boundary integral representation" for more details). At first the impulse responses ("model-driven waves" in Fig. 1a), representing responses at the reference receiver array for an impulse at P (Green's function G(x i , P) in Eq. 1), are calculated assuming an acoustic velocity structure which is independent of the heterogeneities present around the source point at S (gray-shaded area in Fig. 1a). This is because the model-driven waves or G(x i , P) do not radiate outward from the considered Dirichlet boundary (also commonly known as sound-soft or free-surface boundary) at the location of the reference receiver array. Then the boundary integral representation performs a temporal convolution between the model-driven waves and the observed waveforms (p obs (x i , S) in Eq. 1) at the reference receiver array ("data-driven waves" in Fig. 1a). This corresponds to the summation of traveltimes for travelpaths from S to the reference receiver array  www.nature.com/scientificreports/ and that from P to the same array. The final response at P ("extrapolated waveform") or p(P, S) is then obtained by collecting and adding contributions coming from all receivers in the reference array (Fig. 1b). A part of the receivers (the area enclosed by the green dashed lines in Fig. 1a,b) in the reference array dominantly contributes to the final response due to the stationary phase, considering the Fresnel volume 42 . This indicates that all wave phenomena (transmission, scattering, and dissipation), which take place earlier than the time of arrival of the wave at the reference receiver array from the source S, are accounted for in a data-driven manner without a model. When we assume a realistic velocity model to calculate the model-driven waves, the extrapolated waveform matches excellently with the directly observed waveform (Fig. 1c). In the field data example shown in Fig. 1, the velocity model is derived from measurements of acoustic velocity at two vertical boreholes.
The result in Fig. 1c clearly shows that the boundary integral representation works very well on the real-world scattered wave data, and that it is possible to extrapolate responses at the observation point P without knowing the model around the source point S.
The localized sensitivity using the adjoint of the boundary integral representation. Assuming negligible extrapolation error, any difference between the extrapolated and the directly observed waveforms is an indication that the assumed model deviates from the true model. Therefore, the sensitivity is defined as a change in the difference between the extrapolated and the observed waveforms due to a change in the assumed model 3,7,8 . In this study, we consider the difference between calculated and observed waveform defined as, where || || 2 denotes the squared L2 norm and the summation takes place over the number of frequencies, observation points, and source points, p and p obs are respectively, calculated waveform using the boundary integral representation and observed waveform. The sensitivity is then defined as the multi-dimensional gradient of the misfit function (Eq. 2) with respect to the model (m), i.e., ∂E/∂m. This sensitivity can be utilized in waveform tomography (see "Waveform inversion"). In order to calculate the sensitivity, we adapt the adjoint-state method 7,43 to our problem, i.e., minimizing the misfit function (Eq. 2) with the wave equation and the boundary integral representation (Eq. 1) as constraints. We first spatially discretize the boundary integral representation (Eq. 1) as, where variables with subscript S indicate that they depend on the source location, those with subscript P indicate that they depend on the receiver location or the observation point P, and the frequency dependence of all variables is omitted for brevity. In Eq. (3), a column vector g P is a solution to the wave equation 44 : where A represents the discretized Helmholtz operator which is a square matrix including the spatial distribution of material properties (acoustic velocity and density), finite-difference operator, and the Dirichlet boundary conditions at the location of the vertical array. A column vector f P includes the discretized Dirac delta function at P. A row vector p S in Eq. (3) approximates the spatial integral in Eq. (1): it includes the multiplication of data at the receiver array p(x, S) and scaling of -1/(jωρ). Here, p S also contains the finite-difference operation (spatial differentiation) applied to the Green's function in Eq. (1). In the case of the vertical array ( Fig. 1), considering that the discretized Green's function g P is zero at the array due to the Dirichlet boundary conditions, p S is the collection vector that collects and adds the values of g P at one spatial grid on the left side of the array, followed by a scaling, as explained above.
The adjoint-state method provides an algorithm which calculates efficiently the gradient 7,43 . To this end, we define the Lagrangian of our problem (Eqs. 2, 3, and 4) as, where the tilde "~" is introduced in order to distinguish between the physical realization and any element in the state and adjoint-state variable spaces, and 1 and 2 are the Lagrange multipliers. A saddle point of the Lagrangian provides the gradient of the misfit function represented by the adjoint-state variables 43 . The saddle point of the Lagrangian can be written as, The real part is taken in the right-hand side term of the above equation because E(m) is real 43 . In the above equation, the adjoint-state variables (λ 1 , λ 2 ) satisfy the following adjoint-state equations: T 2 A(m) = 1 p S , www.nature.com/scientificreports/ where * denotes the complex conjugation. These equations provide an algorithm to calculate the localized sensitivity.
In order to interpret physically the adjoint-state equations, we rearrange the equations such that the sensitivity is the crosscorrelation of two wavefields b and f, where f is a column vector representing the scaled forward propagating wavefield g P (Eq. 4) from the observation point P, where the term ∂A/∂m compensates for the scattering radiation pattern due to different parameterization, and b is a row vector representing the backward propagating wavefield defined as, The backward propagating wavefield b is a solution to the conjugate (time-reversed) wave equation where the source term represents the scattered waves (i.e., difference between calculated and observed waveforms) crosscorrelated with the observed waves at the reference receiver array. It is worth mentioning that the modeling operator A in Eq. (11) is the same as in Eq. (4) where the Dirichlet boundary condition is considered at the vertical array.
Physical interpretation of the novel principle. Let us consider a material which contains multiple scatterers (black circles in Fig. 2a). The measurement geometry is similar to Fig. 1a. Suppose we are interested in finding any heterogeneity between the reference receiver array and the observation point P where there is a local scatterer Q with a smaller scattering strength than the other scatterers. We first extrapolate the waveform at P (p SP in Eq. 3) using the boundary integral representation along with the observed waves at the reference array and assuming homogeneity around P (the medium to the left of the vertical receiver array in Fig. 2a is assumed homogeneous, while the gray-shaded part on the right-located close to the source-does not need any such assumption). Note that arbitrary velocity structure can be assumed in the extrapolation, but here we take a homogeneous medium so that the calculated sensitivity shows a deviation from homogeneity. Since the boundary integral representation is independent of the knowledge of any heterogeneity around the source, any difference between the observed p obs SP and the extrapolated (p SP ) waveforms is caused by the disturbance in the wavefield due to the unknown heterogeneity close to the observation point P or due to the local scatterer Q (red arrows in Fig. 2a), and does not have any effect of the scattering (black arrows in Fig. 2a) that occurs around the source point. This feature, where the difference waveform contains the disturbance only due to the heterogeneity in the target zone between the reference array and P, improves the sensitivity to the local scatterer Q, without the need to know the position and/or the nature of the scatterers located between the source point and the reference receiver array (gray-shaded area in Fig. 2a).
Using the difference waveform, i.e., p obs SP − p SP , we now introduce a novel physical principle based on the boundary integral representation. The application of the adjoint-state method 7,43 to the discretized equations enables us to explain the principle using two wave-propagation thought experiments: forward (Eqs. 4 and 10) and backward (Eq. 11) propagations, as illustrated in Fig. 2b. In the forward propagation experiment, suppose that an impulsive wave propagates from the observation point P to a hypothetical scatterer Q' (i.e., solving Eq. 4 for g P ). The green arrow in Fig. 2b shows the wave propagating forward in time with a traveltime which is larger than zero. Next, in the backward propagation experiment, the difference waveform (red arrows in Fig. 2a) is crosscorrelated with the data-driven waves (observed waveforms at the reference receiver array), which corresponds to evaluating the right-hand side of Eq. (11). The waveform thus obtained is then back-propagated in time from the reference receiver array to the hypothetical scatterer Q' (i.e., by solving Eq. 11 for b). This gives the modeldriven and data-driven waves shown by the blues arrows in Fig. 2b. These arrows illustrate the above-described process, marking backward propagation in time for waves with traveltimes smaller than zero.
Crosscorrelation corresponds to subtraction of traveltimes; therefore the data-driven waves are represented by the blue arrows in Fig. 2b. Consequently, collecting all contributions from the reference receiver array and adding them result in the scattered waves (red arrows in Fig. 2b) which propagate backward from the hypothetical scatterer to the source point, leaving only the forward propagating wave travelling from Q to P (green arrow in Fig. 2b). In this way, by analyzing the amount of correlation between the forward and the backward propagating waves (Eq. 9), a large sensitivity can be achieved at the true scatterer Q in a data-driven manner, without having to know the heterogeneity around the source point.
This novel principle calculates the sensitivity which is highly localized at the location of the scatterer Q, without utilizing any knowledge of the complete heterogeneity distribution (Fig. 2c). In contrast, the sensitivity obtained from the conventional principle, using the same source-observation points, does not allow detecting the local scatterer Q (Fig. 2d). The same conclusion can be drawn even when we use in the calculation all available data (see Supplementary Fig. 1). When we assume that all heterogeneities but Q are perfectly known, the conventional sensitivity approaches the localized sensitivity that we estimate using the new principle (Fig. 2e). In other words, the conventional principle can address small heterogeneities only when the heterogeneities close to the source point are sufficiently known. On the other hand, the new principle, by exploiting the interference www.nature.com/scientificreports/ Figure 2. A novel principle to localize the sensitivity of waveform tomography: schematic illustration and results on synthetic data. (a) "Observed waveform": all waves coming from a source S are observed at an observation point (P) and a reference receiver array. Black arrows show direct and scattered waves which are not associated with the local scatter Q located close to P, and red arrows show scatterings from Q. "Extrapolated waveform": the waveform is extrapolated using the boundary integral representation assuming a material without Q. "Difference waveform": the difference between the observed and the extrapolated waveforms containing scattered waves from Q. Gray-shaded area shows the zone which has no effect on the boundary integral representation.

Results of field test and value of the localized sensitivity
Field test on sensitivity localization. We have tested this new sensitivity localization principle on field experimental data. The measurement geometry is same as in Fig. 1a, but here we have multiple observation points P located along another vertical line (Fig. 3a): in total we observe wavefield using two vertical arrays (left array, LA, and right array, RA). In the field test, this corresponds to measurements in two vertical boreholes. For the sake of clarity, here we need to distinguish between extrapolated and calculated waveforms. Extrapolated waveforms are obtained using the boundary integral representation that we have presented above and assuming homogeneity (constant acoustic velocity). The calculated waveform, on the other hand, are the ones obtained through numerical (finite-difference, FD) computation also assuming homogeneity, where the source wavelet is estimated by deconvolution of the recorded waveform 44 with a waveform that is calculated assuming the same homogeneity and an impulsive source at S. Therefore, the difference in waveforms between the observation (black lines in Fig. 3b) and the calculation or extrapolation (green and red lines in Fig. 3b) indicates the deviation from the homogeneity. The waveforms calculated using the conventional approach by FD method (green lines in Fig. 3b) assume a globally homogeneous material. On the other hand, those using the boundary integral representation (red lines in Fig. 3b) assume local homogeneity between LA and RA, and heterogeneities around the source point (gray-shaded area in Fig. 3a) are accounted for in a data-driven manner. As a result, their waveforms vary over the receiver location (red lines in Fig. 3b), which implies an improved sensitivity to the local heterogeneity.
The sensitivity localization is evident in Fig. 4a. The conventional sensitivity shows large values around the source point S. Further, the conventional sensitivity varies smoothly in the subsurface, which implies small correlation between the incident and the scattered waves (averaging out of the contribution of different local scatterers). This is due to the fact that in the conventional approach, the difference waveforms have a complex nature as they include more scatterers (see black and green lines in Fig. 3b). In contrast, the localized sensitivity, derived from the new principle found in this research, reveals a very detailed structure between LA and RA (Fig. 4a). The sensitivity indicates the amount of velocity perturbation/changes with respect to homogeneity, assuming Born scattering 45 . A comparison with the heterogeneity directly observed at LA (Fig. 4b) shows that the localized sensitivity detects a much finer variation in heterogeneity than the conventional sensitivity at depths greater than 80 m. The novel principle exploits information in the observed data in a completely different manner than the conventional principle. As a result, the conventional sensitivity cannot achieve comparably good results even using all available data, including data from the reference receiver array (Supplementary Fig. 2).
Localized sensitivity: quantitative estimation of heterogeneity. The localized sensitivity can be exploited to resolve quantitatively the material heterogeneity. An inversion scheme can be formulated to estimate the acoustic velocity distribution by minimizing the difference between the observed and the calculated www.nature.com/scientificreports/ waveforms at the observation point. The localized sensitivity can navigate iteratively toward a best-fit model using nonlinear inversion (see "Waveform inversion") without a knowledge of the heterogeneity around the source points.
Using the same geometry as in Fig. 3a, multiple sources are used sequentially in the field to generate pressure waves at right to RA and left to LA (Fig. 5a) in order to illuminate the medium from various directions. The reference receiver array, the observation point (P), and the zone which does not contribute to calculating the localized sensitivity are appropriately defined depending on the source location (Fig. 5a). In order to verify the resolved heterogeneity, we additionally perform independent waveform measurements (ground-truthing) using downhole sources (Fig. 5b) and apply the conventional waveform inversion (Supplementary Note 2).
Waveform inversion estimates a velocity model starting from an initial guess 3,7,8 . We perform standard traveltime tomography to obtain the starting model (Fig. 6a). Waveforms around the first-arriving events and a frequency component similar to that in the independent measurements using downhole sources are analyzed ( Supplementary Notes 3 and 4). Figure 6b shows the estimated velocity structure using the localized sensitivity derived from the novel principle involving the boundary integral representation. Figure 6c shows the result of waveform inversion where additional downhole sources have been placed in RA (ground-truthing). Figure 6d shows in details a comparison between the different velocity models. The estimated velocity using the localized sensitivity is strikingly close to the one obtained from independent waveform inversion using downhole sources and also to acoustic well log data at RA, especially at depths greater than 80 m where the raypath-coverage is good (Fig. 6d). We delve further into this concept through performing realistic synthetic monitoring tests. As we explained in the introduction, this novel principle will be useful when having a physical source at the location of the observation points is not feasible. In our synthetic test we consider the borehole monitoring during fluid injection where the observation points are located in boreholes and the source points only at the surface (Fig. 7a), similar to field experiments discussed earlier in this article.
To generate realistic synthetic data, we assume a random velocity distribution with a mean velocity of 2.0 km/s (Fig. 7a). The data contain effects due to source location errors and those due to temporal changes occurring outside the target area (the dashed rectangle in Fig. 7a). The target area is located at 100 m depth where the velocity decreases by 5% with respect to the baseline measurement due to an increase in the pore pressure 47 . The topmost 6 m is modeled as a vadose zone having a random velocity distribution with a mean value of 1.0 km/s ( Supplementary Fig. 7). Additionally, the structure of the vadose zone is completely different between the baseline and the monitor surveys ( Supplementary Fig. 7), representing a possible drastic change in seismic velocity in this zone due to seasonal change in water saturation 25,48 . Source-receiver geometry and frequency components are quite similar to those in the field experiments discussed earlier (Supplementary Note 6). We consider a realistic  www.nature.com/scientificreports/ scenario where the location of the receivers in the borehole is well calibrated and does not change during the time-lapse measurements (e.g., in the case of permanent monitoring), and the repeatability of the surface seismic sources (portable/mobile sources, e.g., vibrator) is limited due to, for instance, location errors. We assume that the source location in the monitor survey contains random errors up to 4 m ( Supplementary Fig. 7). We first look at the result of imaging the inter-borehole velocity heterogeneities in the baseline measurement. Here we explore two different scenarios for obtaining prior information regarding the non-target zone (outside the two boreholes) to build an initial velocity model for waveform inversion. Assuming the same initial velocity www.nature.com/scientificreports/ model for depths greater than 16 m (Fig. 7b), we consider the situation where the correct average velocity and thickness of the vadose zone are known (Fig. 7c), and in another case where we have a poor knowledge of them and only a smoothed velocity model is available (Fig. 7d). These two different initial velocity models are used to estimate the heterogeneities using the conventional sensitivity (Fig. 7e,f) and the localized sensitivity (Fig. 7g,h), respectively. As the recorded waveforms contain information of the structure present along the wave propagation path connecting the surface source and the downhole receiver, the waveform inversion using the conventional sensitivity estimates the heterogeneities not only in between the boreholes but also outside, i.e., those structures to the left of LA and to the right of RA (Fig. 7e,f)-which are not of interest. More critically, the estimation of the velocity structure in between the borehole has been influenced by the accuracy of prior information of the structures outside the two boreholes, contributing to large uncertainties in the estimated inter-borehole heterogeneities (Fig. 7i-k). On the contrary, the new principle presented in this research addresses the localized sensitivity and, therefore, provides directly the inter-borehole structure (Fig. 7g,h)-which is minimally influenced by the accuracy of the prior information of the non-target zone (Fig. 7k).
In order to achieve accurate results using the conventional sensitivity, it is crucial to account for the propagation effects outside the target zone (gray-shaded area in Fig. 7g,h) by obtaining independently good prior information. Alternatively, one can design carefully a multi-scale inversion scheme utilizing sequentially data from lower to higher frequencies in order to avoid gaps in the wavenumber information 3 . However, this is not a trivial task due to the difficulty in acquiring low-frequency data using controlled sources 3 and because each frequency component has generally a different signal-to-noise ratio. The localized sensitivity is free from uncertainties associated with these fundamental limitations, because the propagation effects outside the target zone are accounted for in a data-driven manner (recall Fig. 2b,c where the propagation effects at the non-target area are taken into account without explicitly knowing the heterogeneity).
Next we concentrate on the monitoring of time-lapse changes in the target zone which is located around 100 m depth (dashed rectangle in Fig. 7a). The results are shown in Fig. 8. The new principle estimates the temporal changes at the target depth much better than the conventional approach (Fig. 8b,c). The conventional approach is sensitive to the source location errors in case an accurate, prior information of the vadose zone is available ( Supplementary Fig. 8). Generally, the conventional approach requires an accurate prior knowledge of the vadose zone (Fig. 7). Any inaccuracy in this prior knowledge results in a significant loss of accuracy in the estimated time-lapse changes at the target zone in this case (Fig. 8c,e). On the contrary, the extremely high sensitivity of the new approach to the inter-well structures allows high-resolution estimation of the velocity changes, which is nearly independent of the presence of any source location error and/or inaccuracy in the prior information of the vadose zone (Fig. 8b,d, Supplementary Fig. 8).

Discussion
In this article, we present a novel principle to localize the sensitivity of waveform tomography to medium heterogeneities, which is otherwise difficult to obtain in case of using the conventional principle for sensitivity estimation. As we explained in the introduction, earlier studies on Green's function retrieval 9,27-33 have tackled a similar problem from a different point of view. Although there is a good similarity between Green's function retrieval and the concept presented here, there are also notable differences. The major difference is that our primary purpose is to directly obtain the sensitivity using the adjoint of the boundary integral representation and, therefore, the associated Green's functions are by-products. This enables us to tackle the problem from a completely different point of view. We have used the Dirichlet boundary condition in the Green's functions in a rather unconventional manner (see "Boundary integral representation"). This makes it possible to relax the critical assumption of one-way wavefield propagation in the Green's function retrieval using convolution 32,33 and that of multi-component measurements or single-component measurements with approximations 31,41 . The assumption of one-way wavefield and/or the approximations due to the conventional boundary condition are otherwise necessary when the primary purpose is to retrieve Green's functions, which will require further processing.
We have formulated the new principle as a 2D problem. We have shown in this article that the assumption of 2D wave propagation is effective for field data. Also, the geometry of the reference receiver array and the observation point can be arbitrary. In this regard, a similar concept, but using a conventional integral representation for Green's function retrieval, was proposed earlier for reflected waves where the reference receiver array and the observation points are co-located in a horizontal borehole 49 . Also, this newly found principle can be applied to seismological monitoring using surface-waves and 2D seismometer arrays because single-mode surface waves in 3D elastic media can be represented by 2D wave propagation at the surface 50 . The independence of the estimated localized sensitivity from source locations and heterogeneities around the source points is attractive for ambient noise tomography, where the limitation due to uneven distribution of noise sources and due to heterogeneities outside the target area is especially detrimental to imaging and monitoring 51 . The novel principle can also be extended to 2D P-SV and 3D wave propagation problems. In the former, one replaces the acoustic boundary integral representation by that of an elastic field 40 , which will require vector-field measurements (e.g., vertical and horizontal displacements) at a reference receiver array. In the latter case, one needs to measure waves at the reference array located over a 2D surface.
The novel principle provides a unique opportunity in case the wave source does not illuminate a medium from a location which is close to the target area, but multiple observation points are used to enhance the localization of the sensitivity without the need to know precisely the structures outside the target area. We have illustrated that this principle is especially useful in monitoring, where the subsurface is illuminated by distant sources and the response is observed by embedded sensors. In other disciplines, this may necessitate a new data-acquisition design. In this regard, the development of fiber optic sensing has lately demonstrated that existing telecommunication networks can turn into spatially dense, subsurface (i.e., pseudo-borehole) acoustic receivers without a need www.nature.com/scientificreports/ of additional sensor installation 10,52 . This can be exploited to resolve/monitor a structure between the subsurface fibre cables only using remote seismic sources (e.g., those at the surface). The novel principle can, therefore, be powerful in future seismic monitoring in areas with difficult access, e.g., in urban or underwater environments. We anticipate that this principle will open up possibilities for new experiments and measurement techniques where accurate and efficient monitoring is of high importance but the conventional approaches using scattered waves are hindered by the insensitivity to the target area due to limitations in data-acquisition geometry or a poor knowledge about changes occurring outside the target zone.

Methods
Boundary integral representation. The following boundary integral representation is used to calculate the waveform at the observation point at P due to the source point at S using interferences of the observed waveforms at the reference receiver array: where all properties are in the space-frequency domain, ω is the angular frequency, j the imaginary unit, ρ the density, ds the line element, and n i the outward pointing normal vector on the arbitrary integral path ∂D, which Estimated temporal changes using the localized sensitivity for waveform tomography (this research). In this case, the initial velocity model contains inaccurate prior information regarding the vadose zone (Fig. 7d). Results using data without source location errors (δs = 0) and with source location errors (|δs| < 4 m) are shown. (c) Same as (b) but using the conventional sensitivity. (d) Estimated velocity changes in the target area at x = 0 m using the localized sensitivity, and (e) the same using the conventional sensitivity. www.nature.com/scientificreports/ is the location of a reference receiver array. Note that in actual data processing, we obtain frequency-domain signals by Fourier-transforming time-domain signals with an additional damping function (Supplementary Notes 3 and 4). Equation (12) indicates that the multiplication of the observed waveform, p(x, S) where x ∈ ∂D, at the receiver in the reference array and the spatial derivative of the Green's function, ∂ i G(x, P), in the i direction due to a point source at P, and the collection of its contributions from all receivers calculate the observed waveform at P. We derive Eq. (12) from the general wavefield representation 40 where arbitrary velocity structure and boundary conditions can be assigned for the Green's function. Here, we define the Green's function such that the velocity structure is same as that of the observed data, but with the Dirichlet (sound-soft) boundary condition at ∂D. This additional boundary condition correctly handles the outward propagating waves at the reference receiver array by canceling non-physical wave arrivals while evaluating the integral. Furthermore, it enables us to require only single component wavefield measurements (e.g., pressure field instead of pressure and particle velocity fields), and the approximations due to single component measurements 31,41 are not necessary. This contrasts with other similar techniques of wavefield retrieval 32,33 . Furthermore, the boundary condition enables us to use model information only inside the reference receiver array because waves in the impulse responses (Green's function) do not radiate outward from the boundary. The array shape (∂D) in Eq. (12) is arbitrary. We consider the special case of a vertical line (Fig. 1a). Suppose that a source S is located to the right of the reference receiver array, and the observation point P is located to the left of the reference receiver array. In this configuration, Eq. (12) can be written as where p obs is the observed waveform at the reference receiver array, G is the Green's function with the Dirichlet boundary condition at the horizontal location of the reference receiver array (Fig. 1a), and we use the relation (n 1 , n 2 ) = (1, 0). Equation (13) can be derived from Eq. (12) using the same approach as that well known in the study of the Green's function retrieval 53 where the closed contour ∂D is divided into the two parts (a vertical line and a half-circle), and we let the radius of the half-circle go to infinity, which leaves only the integral along the vertical line. Equation (13) is accurate, i.e., no approximation is involved except for the constant density along ∂D, as long as ∂D is infinitely long and located between P and S. In a practical situation where the vertical line is finite, Eq. (13) is useful to represent waves that travel from S to P passing through the vertical line. More details of the use of the finite array can be found in the studies of the Green's function retrieval using boreholes 32,33 . Field experiment. The test site is made of sedimentary layers. Two instrumented vertical boreholes with 50 m horizontal separation are available. Hydrophone strings, installed in 28-170 m depth range with 2 m separation between two adjacent hydrophones, are used to measure the pressure wavefield due to a surface source, simultaneously in the two boreholes. We use a small amount (6 g) of explosives as surface sources. The measurement-depth interval is split into four sections. The receiver arrays (hydrophone strings) are positioned simultaneously at one of the sections in each borehole; they measure the pressure wavefield due to the surface source. In order to cover the measurement-depth interval, we repeat this procedure four times at the fixed source location changing the depth of the receiver arrays. The total record length is 0.4 s with a sampling interval of 0.25 ms.

Waveform inversion.
We use the quasi-Newton l-BFGS method 54,55 in estimating the velocity model through waveform inversion. The model parameter is iteratively updated using the following formula: where Q k is the approximate Hessian inverse computed using previous values of the gradient, and α k is the step length in the line-search in the descent direction.

Data availability
The datasets generated and/or analysed in the current study are available from the corresponding author on reasonable request.