Seafloor crustal deformation data along the subduction zones around Japan obtained by GNSS-A observations

Crustal deformation data obtained by geodetic observation networks are foundations in the fields of geodesy and seismology. These data are essential for understanding plate motion and earthquake sources and for simulating earthquake and tsunami scenarios. Although relatively scarce, seafloor geodetic data are particularly important for monitoring the behaviour of undersea interplate boundary regions. Since the mid-1990s, we have been developing the combined Global Navigation Satellite System-Acoustic ranging (GNSS-A) technique for realizing seafloor geodesy. This technique allows us to collect time series of seafloor crustal deformation. Our published data can be used to investigate several seismological phenomena along the subduction zones around Japan, namely the Nankai Trough, Sagami Trough and Japan Trench. These regions are globally important places in geodesy and seismology and are also suitable for comparison with other geophysical datasets. Our intention is for these data to promote further understanding of megathrust zones.


Background & Summary
The Japanese Islands are located in a tectonically active region where multiple tectonic plates interact with each other. Interplate megathrust earthquakes have occurred many times along the undersea plate subduction zone, causing serious damage to human society. To assess future hazards, it is important to elucidate the physical mechanisms of such earthquakes. Crustal deformation data derived from geodetic observation networks, e.g., Global Navigation Satellite System (GNSS) and Interferometric Synthetic-Aperture Radar (InSAR), are extremely useful for revealing slip-deficit rate distributions along the megathrust interface and the pre-, co-and post-seismic events. In Japan, the Geospatial Information Authority of Japan (GSI) has established a dense GNSS observation network called GEONET (GNSS Earth Observation Network System) 1 . GEONET has detected interplate earthquake cycles and various slow slip events occurring on interplate boundaries 2,3 . However, despite the fact that interplate megathrust earthquakes occur in undersea subduction zones, most crustal deformation data to date have been collected by onshore networks only. This lack of data from marine regions limits the estimation resolution and reliability for undersea interplate slip and slipdeficit. Therefore, there is increasing demand for seafloor crustal deformation data with which to elucidate interplate megathrust earthquakes [4][5][6] .
Satellite geodesy using electromagnetic waves (e.g. GNSS, Very-Long-Baseline Interferometry (VLBI) and Satellite Laser Ranging (SLR)) enables absolute onshore position to be determined precisely. However, electromagnetic waves cannot be used to determine absolute seafloor position (SP) because of scattering and absorption of signals by seawater. As proposed originally by ref. 7, the combined GNSS-Acoustic ranging (GNSS-A) technique determines SP through a combination of radio-wave GNSS data from above the sea and acoustic-wave ranging data from under the sea. Since the mid-1990s, the Hydrographic and Oceanographic Department of the Japan Coast Guard (JHOD) has been developing a seafloor geodetic observation system based on this technique 8 and has been constructing a seafloor geodetic observation network along the Nankai Trough, Sagami Trough and Japan Trench. The location of our seafloor observation site is shown in Fig. 1. The onsite measurements with this system are made aboard a survey vessel, on a campaign basis. Table 1 (available online only) shows the list of campaign epochs used in this work. Upper and lower parts of the table indicate the periods before and after the 2011 Tohoku-oki earthquake, respectively. The data described here are coordinates of SP for each campaign epoch. The time series of SP so obtained represent seafloor crustal deformation such as a steady linear trend, a step-like movement and a smoothly attenuating movement. These are mainly due to phenomena around the plate boundary. For example, a steady trend of the time series mainly reflects inter-seismic strain accumulation deformation due to subduction of the oceanic plate. A step-like change would be due to a co-seismic deformation. An attenuating movement following the step-like change would be due to a post-seismic deformation. These data are strongly meaningful for understanding subduction zones and the processes of megathrust earthquakes. Co-seismic deformations due to some megathrust earthquakes have been recorded in these data. The 2005 Off-Miyagi Prefecture Earthquake (Mw 7.2) and the 2011 Tohoku-oki Earthquake (Mw 9.0) were clearly recorded by our observation network 9,10 . Their post-seismic deformations were also recorded 11,12 . The data following the 2005 Off-Miyagi Prefecture Earthquake also include a slip-deficit process preceding the Tohoku-oki Earthquake 11,13 . In this way, the behaviours of earthquake cycles along the Japan Trench were recorded collectively just above the source regions. These data have been used to discuss the complicated processes involved in the earthquake cycles [13][14][15][16][17] and could also be used for simulations to predict future subduction processes.
Ref. 18 suggested that our network had recorded the inter-seismic coupling process along the Sagami Trough. Ref. 19 showed inter-seismic velocity fields in the Nankai Trough megathrust zone and calculated a slip-deficit rate distribution that suggested the interplate coupling condition. These data and results could also be used to discuss and simulate earthquake cycles, future tsunamis and future compound disasters.
In this paper, we present a body of time-series data accumulated since 2001 with description of our observation method and data processing. These data will further the understanding of not only megathrust earthquakes and earthquake disasters but also other geoscience regions. Figure 2 shows the GNSS-A seafloor geodetic observation system. It consists of a seafloor unit with multiple acoustic mirror transponders and a sea surface unit with an acoustic transducer, a GNSS antenna-receiver and a dynamic motion sensor. The sea surface equipment is set on a surface vehicle such as a survey vessel or a buoy; our group has been using a survey vessel only.

Observation System
The position of the GNSS antenna is determined continuously at a sampling rate of 2 Hz by baseline kinematic GNSS analysis using "IT" (Interferometric Translocation) software 20,21 . The positions of the onshore GEONET reference stations used in the baseline analysis are determined by the daily coordinates of the GEONET F3 solutions 22 . In addition to the GNSS observations, the dynamic motion sensor records the attitude of the survey vessel at a sampling rate of 50 Hz. By combining the position of the GNSS antenna and the attitude of the survey vessel, the position of the underwater acoustic transducer according to the coordinate system of the GEONET F3 solutions is determined. The distance between the sea surface transducer and each seafloor transponder is determined by acoustic ranging. The transponder receives an acoustic signal from the sea surface transducer and sends it back. The round-trip travel time is determined by cross correlation using "sas" software 23 . In each observation epoch, we acquire acoustic travel time data from several hundred shots for each transponder. In the data processing described in the next subsection, the absolute SP is determined by combining the data regarding absolute sea surface transducer position and acoustic travel time.
In this process, each travel time is transformed into a distance between the sea surface and seafloor units using an ocean sound speed structure (SSS) model. To construct the SSS model, sound speed profiles are acquired every few hours using temperature and salinity profilers, namely a  conductivity-temperature-depth (CTD) profiler, an expendable conductivity-temperature-depth (XCTD) profiler and expendable bathythermographs (XBT). Before July 2002, the sea surface system used a rigid 8-m-long aluminium alloy pole mounted at the stern of the vessel with a GNSS antenna at the top and an acoustic transducer at the bottom, as shown in Fig. 2a. With the pole system, the measurements were possible only when the vessel was drifting in order to avoid noise from the propellers, but they were affected by deformation of the pole by water flow. Because the vessel was drifting, the track lines were uncontrollable with this system; ideally, track lines should be spatially well balanced for improved accuracy. This situation is similar to the dilution of precision (DOP) of GNSS positioning due to bad satellite geometry. In addition, it requires time for the transfer of the vessel between track lines. Because human operations on the deck are necessary for lifting up and down the acoustic transducer for the vessel transfer, we could not conduct observations at night for safety reasons. For these reasons, it took several days to obtain the empirically required number of data. Differences in the equipment at each period were written in the data file as the equipment code. For the data from this period, the equipment code written in the data file is 'A'.
In July 2002, the sea surface system began using a more-rigid stainless steel pole to reduce the effects of pole deformation due to water flow. For the data from this period, the equipment code is 'B'.
In April 2008, the sea surface system was mounted on the vessel permanently. An acoustic transducer was mounted on the hull of the vessel together with a GNSS antenna mounted on the top of the main mast 24 , as shown in Fig. 2b. With the hull-mounted system, measurements are conducted while the vessel sails along ideally well-balanced track lines at a speed of 5-8 knots. With this rigging, the measurement accuracy was improved. In addition, measurements are conducted continuously without being interrupted by the vessel transfer between track lines and human operations on the deck are no longer necessary. Consequently, the observation time was decreased from 2-4 days (when drifting) to 16-24 h (when sailing). However, because multiple vessels were used after this period, there are differences in instrument errors between the vessels. For the data from this period, the equipment code is 'C'. The history of these changes is listed in Table 2.

Data Processing
The SP is determined using linearized inversion based on the least-squares formulation, combining the obtained data. This inversion software 'SGOBS version 3.6.3' is essentially that constructed by ref. 25.
The basic construction of this inversion can be represented using the general formulation of an observation equation using an n-dimensional observation vector Y and an m-dimensional true model parameter vector X: where e is the observation error vector that varies around zero in a Gaussian distribution characterized by the variance-covariance matrix E. To estimate via linearized inversion, this usually nonlinear relationship should be transformed to a linear formulation. Using small perturbations x around initial value x 0 , Equation (1) can be replaced with When x is sufficiently small, the right-hand side of Equation (2) can be rewritten as a Taylor expansion: By ignoring terms of order x 2 and higher, we obtain the linear observation equation where y Yf ðx 0 Þ stands for the residuals between observation value and that calculated from the initial condition and A is the n × m matrix  The most probablex is obtained by least squares as follows: We obtain the final solution by using an iterative numerical calculation in which x 0 is modified slightly. The variance-covariance matrix C for the resultantx is given as To estimate the SP accurately, the SSS must be sufficiently accurate. Because of the complex spatial and temporal variations of the SSS, it is impossible in practice to make enough observations to cover all these variations in detail. Thus, for centimetre-level positioning, it is not sufficient to use an observed sound speed data only. The observed travel-time data include not only information about the range but also about the SSS along the ray path. Taking advantage of this fact, we estimate an SSS correction function. This idea is similar to estimating the atmospheric delay in a precise GNSS data analysis. The basic flow of our inversion strategy consists of two parts. First, inversion is performed to determine the SPs using a certain SSS. Then, using the resultant SPs, an SSS correction function is estimated. We iterate this process until the SPs converge (see Fig. 3).
Inversion for SP. When we estimate the SP, the model parameter x is given by three components of small corrections to the initial SP coordinates as follows: The observation vector y is given by where q is the total number of travel-time data and ΔT i is the ith residual between the observed travel time and that calculated from the initial position of seafloor transponder and observed position of sea surface transducer. The travel time is calculated via ray tracing using a given SSS. At each site, seafloor unit usually consists of four transponders array that are set approximately on a circle of radius equal to the depth. If each transponder's positions are determined independently, the array geometry at each epoch changes due to the various error sources at each epoch. However, if there was no large earthquake and only a slip-deficit process and an aseismic slip process occurred on a plate boundary, a deformation field is same in a narrow seafloor area, so the array geometry of the seafloor transponders (in a 1-3-km square) is not distorted in about one or two decades generally.
For highly accurate estimation, each epoch's displacements are determined under the assumption that the array geometry is not distorted and array geometry is estimated using all the datasets. This method was developed by ref. 26. To determine the array geometry using all the datasets, the observation equation is represented as follows: Array is replaced at periodic intervals due to battery restrictions. Since simultaneous parallel observation among old and new transponders is performed after replacing, the same reference point can be decided from either old or new transponder group as shown in Fig. 4. The provided position is given as the time-series of the same reference point which is a centre of all the transponders installed in the past.
Inversion to correct sound-speed structure. The SSS correction is represented by a certain function of time. The correction function ΔV(t) can be represented by a linear combination of basis functions Φ i t ð Þ: where a k is the coefficient of the kth basis function. The model parameter vector x in Equation (1) is given by the coefficients of the basis functions as follows: To obtain the geometric path of an acoustic wave, we used ray tracing under the horizontally layered SSS. In general, the sound speed varies with each layer: the near-surface layer is more variable than the abyssal one, for example, so the correction function must be estimated for each layer. However, the geometric path of an acoustic wave is almost a straight line because of the smallness of the refractive index. For this reason, the positioning result depends mainly on the average sound speed of all layers. In this analysis, ΔV is not set for each layer but is set uniformly for all the layers. This process removes the effects of those SSS disturbances that contain long-period components (e.g. daily variation) and short-period components. In many cases, the short-period components include spatial variations due to the vessel's movements.

Code availability
The dataset described here was generated using three computer codes introduced above. The computer code 'IT' (Fortran) was provided by O.L. Colombo. The computer codes 'sas' (C and python) and 'SGOBS version 3.6.3' (Fortran) can be provided upon request.

Data Records
Our published data are in the form of a time series of the centre position of the seafloor transponder array derived from the above processing. Because we use the GEONET F3 solutions as reference positions in our GNSS analyses, the coordinates of our data are consistent with the International Terrestrial Reference  The data file also includes metadata, a list of which is given in Table 3. Date of updating, site name, reference coordinate system and each observation record are written in the file. Each observation record consists of observation start date, absolute positions [XYZ coordinates], equipment and vessel codes and observation end date. Since one observation may span multiple days, observation start and end days are written. The name of the survey vessel used in each epoch is also written in the data file to distinguish instrument errors, as described in the next section. Four vessels have been used. Codes T, S, M and K are allocated to survey vessels of Takuyo, Shoyo, Meiyo and Kaiyo, respectively. The data files are distributed in text format via PANGAEA (Data Citation 1). End of the file name is '_xyz.txt'.
To see crustal deformations, it is useful to transform the ECEF coordinates into local East, North and Up (ENU) coordinates. We also provide the ENU data. The time series shows the relative movements from the first epoch in ENU direction. End of the file name is '_enu.txt'. Metadata in this file is written in Table 4.    Table 4. Published ENU data format including metadata.   The site along the Japan Trench had greatly moved before and after the 2011 Tohoku-oki earthquake 10 . The array shape had changed and at the same time some transponders became unusable. Therefore, the data files are separated at the period of March 2011. Provided Tohoku-oki earthquake co-seismic movement data of the sites along the Japan Trench were calculated using only transponders that can be compared before and after the earthquake in the same way as ref. 10 ('Tohoku_EQ_ Movements_enu/xyz.txt', Data Citation 1). Site names, periods, vessel codes and movements consistent with ITRF2005 are written in these data. As above, since the time series before and after the Tohoku-oki earthquake were not observed using the same array, the difference between these time series cannot be calculated using the provided time series data.
Because arrays at the sites along the Nankai and Sagami Troughs were not distorted or broken, the 2011 Tohoku-oki earthquake co-seismic movement data is not provided for these sites. The co-seismic effects due to the earthquake are included in the time series data of these sites.

Technical Validation
The provided data are affected by three main sources of error (Fig. 2c) 28,29 . The first is the influence of the atmosphere, ionosphere and others on the GNSS positioning (GNSS error). The second is due to the vessel equipment, such as the relative position error between the GNSS antenna and the on-board acoustic transducer (G-T error). There are also instrumental errors in a motion sensor and a transducer. In order to distinguish these instrumental errors which should be different for each vessel, identifiers of vessels are allocated for each epoch. The third is an error from the SSS that cannot be removed sufficiently by the data processing (SSS error). These errors have the potential to act not only as random errors (which would be reduced by increasing the amount of acoustic-wave ranging data) but also as systematic errors (which would not be reduced by increasing the amount of acoustic-wave ranging data). The variances of the model parameters as indicated by the diagonal components of C in Equation (7) are considered to reflect the random errors. However, in many cases the calculated variances are much smaller than the scatter of the final SP time series. This indicates that the influence of the systematic errors cannot be ignored but also cannot be evaluated statistically (e.g. random-walk noise in the high-rate GNSS analysis and spatial heterogeneity of the SSS). It is difficult to evaluate individual systematic errors quantitatively. For example, because it is also impossible to know the true position of the vessel on the sea at every moment, we cannot evaluate the deviation of the estimated position from the true one. Therefore, our scheme cannot provide substantive information about the positioning error at each epoch.
Instead, we recommend evaluating the entire data set statistically, for example deriving the average crustal deformation speed by regression analysis. While the uncertainty of each epoch cannot be calculated, as described above, the SP repeatability can be evaluated by deviation from the regression line because a surface of the ground generally deforms linearly. Since the crustal deformation is believed to have been stable for a long time unless large earthquakes occur, we can consider the time-series regression line to be the maximum likelihood estimate. Therefore, any dispersion of the data from the line is deemed an indicator of positioning uncertainty. The uncertainty of the velocity vector derived from the trend of the regression line can be calculated by the ordinary mathematical method of regression analysis.
As a concrete example of time series, eastward movement of site MYGI are shown in Fig. 5. Two large earthquakes occurred near site MYGI during observation period. To avoid co-seismic steps and nonlinear post-seismic movements, linear regression was applied in three periods. Root mean square (RMS) of the residuals from regression line which is an indicator of positioning uncertainty was 2.7 cm, 3.0 cm and 1.5 cm, respectively.
The published data should not be evaluated using only two epochs before and after the event because the dispersion in the data is larger than the assumed crustal deformation in many cases. When considering a co-seismic step, we recommend evaluating the data of several epochs before and after an event statistically. For such evaluation, it should be noted that steady crustal movements cannot be ignored because of the sparse observation frequency. For example, co-seismic movement associated with the 2005 event is obtained from a difference between two regression lines. However, the obtained value (3.1 cm) is almost the same level as RMS. Therefore, according to statistical testing, there were not enough significant differences.
For estimate the positioning uncertainty of our system, we investigated the residuals from the linear regression lines of all the data after 2013. Fig. 6 shows a horizontal scatterplot and histograms for three components that indicate the residuals for all sites. Horizontal scatterplot shows that almost data are not separated by 5 cm from the regression lines. Comparing the histograms, the states of residuals of eastward and northward components are similar in the value of variances and the tail of distribution. On the other hands, the histogram for vertical components has larger variances value and shorter tail. According to the F-test when the significant level is set to 0.05, the hypothesis that the variances of vertical component have equal to ones of horizontal components is rejected, while the hypothesis that the variances of horizontal components are equal is not rejected.
The total observation uncertainty of this technique is estimated empirically as approximately 2 cm (RMS) in the horizontal component for each epoch. The uncertainty in the vertical SP component is greater than that in the horizontal one because we observe only the upper region of the seafloor. This situation is similar to that for GNSS positioning. Additionally, correlation between the vertical SP component and the SSS reduces the accuracy. There is a trade-off between the vertical SP component and www.nature.com/sdata/ SCIENTIFIC DATA | 5:180182 | DOI: 10.1038/sdata.2018.182 the SSS. For example, if the given sound speed is higher than the true value, a deeper SP is estimated without an appreciable increase in the residuals.