Quasi zenith satellite system-reflectometry for sea-level measurement and implication of machine learning methodology

The tide gauge measurements from global navigation satellite system reflectometry (GNSS-R) observables are considered to be a promising alternative to the traditional tide gauges in the present days. In the present paper, we deliver a comparative analysis of tide-gauge (TG) measurements retrieved by quasi-zenith satellite system-reflectometry (QZSS-R) and the legacy TG recordings with additional observables from other constellations viz. GPS-R and GLONASS-R. The signal-to-noise ratio data of QZSS (L1, L2, and L5 signals) retrieved at the P109 site of GNSS Earth Observation Network in Japan (37.815° N; 138.281° E; 44.70 m elevation in ellipsoidal height) during 01 October 2019 to 31 December 2019. The results from QZSS observations at L1, L2, and L5 signals show respective correlation coefficients of 0.8712, 0.6998, and 0.8763 with observed TG measurements whereas the corresponding root means square errors were 4.84 cm, 4.26 cm, and 4.24 cm. The QZSS-R signals revealed almost equivalent precise results to that of GPS-R (L1, L2, and L5 signals) and GLONASS-R (L1 and L2 signals). To reconstruct the tidal variability for QZSS-R measurements, a machine learning technique, i.e., kernel extreme learning machine (KELM) is implemented that is based on variational mode decomposition of the parameters. These KELM reconstructed outcomes from QZSS-R L1, L2, and L5 observables provide the respective correlation coefficients of 0.9252, 0.7895, and 0.9146 with TG measurements. The mean errors between the KELM reconstructed outcomes and observed TG measurements for QZSS-R, GPS-R, and GLONASS-R very often lies close to the zero line, confirming that the KELM-based estimates from GNSS-R observations can provide alternative unbiased estimations to the traditional TG measurement. The proposed method seems to be effective, foreseeing a dense tide gauge estimations with the available QZSS-R along with other GNSS-R observables.

www.nature.com/scientificreports/ the time-series are utilized along with KELM for training and validation purposes. The mathematical summary of the KELM approach has been given in "Results and discussions" section. Numerous kinds of literature have focused on investigating the TG measurements using multi-constellation signals, however, the SNR measurement based on Quasi-Zenith Satellite System (QZSS) constellation signal has never been investigated. The QZSS provides the best service over Asia-Pacific region, and it is very important to consider the performance of QZSS-R over an oceanic area located in this region. Therefore, in the present work, the QZSS-based SNR measurement with triple-frequency signals (L1, L2, and L5) are collected. To prove the validity of QZSS-TG, we also used GPS (L1, L2, and L5) and GLONASS (L1 and L2) constellations frequency signals and estimated GPS-TG and GLONASS-TG measurements, then the reflectometry results are compared with GPS and GLONASS constellation systems. A KELM based on VMD is implemented in this study and the QZSS-TG time series are also modeled. Data collection and method modeling are described in "Data collection and method of modeling" section. Results and discussion are addressed in "Results and discussions" section and finally, the conclusions of the study have been written in "Conclusion" section.

Data collection and method of modeling
GNSS data collection. The Geographical Survey Institute (GSI) of Japan established a GNSS network known as GNSS Earth Observation Network (GEONET) is utilized for the current study to examine the oceanic activities in the selected region of Japan. The GEONET consists of more than 1,300 ground stations where their GNSS sites are primarily mounted on the bedrocks for high-precision crustal measurements. Ansari and Bae 25 discussed the sea-level change around the Japanese coast by using six-tide stations by using wavelet analysis. They noticed that only the P109 site showed relative TG velocity (mm/yr) and other sites showed positive relative TG velocity (mm/yr), leading to a sample of a discrepancy between the estimated values. The negative relative velocity at the P109 site indicates subsidence. Hence, in the current study to check the applicability of QZSS-R, the GEONET data is collected from the P109 site (37.815° N; 138.281° E; 44.70 m elevation of ellipsoidal height) located in Sado Island of Japan during 01 October 2019 to 31 December 2019. Then SNR measurements with QZSS satellites are retrieved to investigate the sea-level changes in the Japanese Sea.
According to the theory of the Fresnel zone, the reflected signal received by the GNSS antenna in the first Fresnel zone can obtain the maximum signal energy of the dielectric layer. On one side, the elliptical area of the Fresnel zone is affected by the elevation angle of the satellite and on another side, the oscillation amplitude of SNR data decreases with increasing elevation angle due to the antenna gain pattern i.e., the multipath effect is more apparent at low elevation angles. It is important to choose the range of elevation and azimuth angles to range for the SNR data 26 . Therefore with the spatial domain distribution, we selected, the first Fresnel zones of the P109 site with the azimuth range of 30° to 330° projected on a Google Earth map as shown in Fig. 2. Here, the considered first Fresnel zones are built using the GNSS-R Software as suggested by Roesler and Larson 26 . Further detailed information about the first Fresnel zones can be found in Nievinski et al. 27 . As shown in Fig. 2, the higher elevation angles include the land reflection which will not be useful to determine sea level changes. Hence in this paper, observations are used for the elevation angle within the range of 5°-25° degrees because they are easily affected by the multipath at this station. Furthermore, since the sea level change is focused on in this paper, the signals reflected from the sea surface are used. Through evaluating the map around the station and the reliability of the observations, tracks just between the azimuths of 20° to 80° and 110° to 170° are used to perform the spectral analysis.
The QZSS-TG measurement based on the SNR measurement (dB-Hz) and the prior reflector height (H 0 ) can be calculated as follow 28 : where A is the amplitude that is deviated from zero (unit: m), ϕ is the phase offset, θ is the elevation angle (unit: radian), and λ is the signal wavelength (unit: m). A detailed study of the whole analytical solution can be found in our previous work 2 . Furthermore, the Lomb-Scargle periodogram (LSP) is used to estimate the dominant frequency or f m,t (no unit) as proposed by Press et al. 29 , and the f m,t is related to the effective reflector height (H eff ) with the following equation: www.nature.com/scientificreports/ Here, the unit of reflector height (H eff ) can be shown in meter (m) using the product between the f m,t and the λ.
GSI data collection. There are five common types of tide gauges which are responsible for measuring tide.
The floating mechanical tide gauges, floater of an encoder, pressure sensor, acoustic tide gauge, and radar sensor 30 . The gauge employs a mechanical system, working with a floating unit that is placed in a stilling well connected directly to the sea where the sea water enters into it 31 . At the beginning of the twenty-first century, as a result of the increasing interest in high-quality sea level data, tide gauge networks have been upgraded and traditional mechanical float tide gauges have been progressively substituted by electronic gauges provided with a floater of the encoder, acoustic, pressure and recently radar sensors 32 . The GSI of Japan has installed approximately 25 tide observation stations along their coastal regions to regularly monitor the sea-level changes. The GSI web can be reached via https:// www. gsi. go. jp/ kanshi/ tide_ furni sh. html. The structure of the Japan TG observation station at GSI along the coastal area is shown in Fig. 3, where the sea-level changes are continuously monitored. The GSI-TG measurements are measured in height (unit: m) from the reference plane observations of the value of measured sea levels always have a positive sign. This observed height is called the observed sea level, then they are translated to Tokyo Peil (TP) which is the mean sea level at Tokyo Bay. The GSI-TG measurement is obtained from the same co-located station for the validation objective. The method of calculation of the sea level observation to the TP converted tide level is as follows (https:// www. gsi. go. jp/ kanshi/ tide_ comme nt. html): The datum constant value in Eq. (3) depends upon the location of the GSI station.
(2) f m,t = 2H eff   www.nature.com/scientificreports/ KELM modeling. Let us consider N arbitrary samples of (x i , y i ) where x i ɛ R N and y i ɛ R N (i = 1, 2, …, N), the activation function of the hidden layer is given by h(x), and Y is defined as an output matrix. The single-hidden layer feedforward neural networks (SLFN) can be shown as follows 33 : where α is the output weights between the output layers and the hidden layers, w i = [ω i1 , ω i2 , …, ω iN ] T presents the input weights between the input layers and the ith hidden layers. The number of the hidden nodes and the threshold of the hidden layer is represented by l and a, respectively. The k i are the threshold of the hidden layer. Now, Eq. (4) can be written in another form: Here the output matrix of the hidden layer is represented by H, and α is an unknown parameter estimated by the least square's method. By taking Moore-Penrose generalized inverse matrix, the solution of Eq. (5) can be written as 34 : where H † presents the Moore-Penrose generalized inverse matrix of H 35,36 . Different kinds of methods can be applied to estimate the generalized Moore-Penrose inverse of a matrix such as the orthogonalization method, orthogonal projection method, singular value decomposition method, and iterative method. The method of orthogonal projection can be applied in two distinct cases, first when H T H is non-singular and If we add a decisive penalty factor 1/F in Eq. (6), the resultant solution will be stable and have a tendency for better generalization of performance. The value of α will be given like this: The orthogonal projection method α can be estimated by using the Ridge regression theory 37 . The output function expression for the extreme learning machine (ELM) will be converted as: Now, the hidden layer h(x) activation function can be substituted with the kernel function in terms of Mercer's conditions 38 , and the KELM output function can be given in the following equation 33 : In Eq. (9), the random mapping of the ELM is substituted with the kernel function, hence, the output weights become more stable.
The VMD algorithm establishes a constrained optimization problem based on the assumption that all components are narrowband signals concentrated near their respective center frequencies and decomposes the original signal into intrinsic mode functions (IMFs) with a certain number of layers at different frequencies 39 where X (t) is the original signal, u s (t) is the sth IMF component, r n (t) is the residual term and S is the number of decomposition levels. The IMF is amplitude modulation and frequency modulation signal, written as 40 : www.nature.com/scientificreports/ This is notable, although the superposition of all sub-signals u k is the SNR, the phase φ s of the sth sub-signal is not directly related to φ γ in the SNR. The IMF that is highly consistent with the original SNR is selected as the trend term, and the remaining IMFs can be reconstructed to obtain the oscillation term 41 . The VMD algorithm only replaces polynomial fitting here, and it is still necessary that LSP spectral analysis is used to extract the frequency of the oscillation term. The advantage of the VMD algorithm for fitting the trend term is that it has similar characteristics to the adaptive high-pass filter, which separates high-and low-frequency components to achieve detrend processing 41 .

Results and discussions
QZSS-TG measurement analysis. The interferences between the reflected and the directed QZSS signals were recorded by SNR interferograms. In order to obtain the error in multipath, a low-order polynomial (second degree) is utilized to attain SNR detrended time series 2,9 . After removing the SNR trend, SNR plots (linear scale units of volts/volts) against the elevation angles are studied. Hereafter, the unit "linear scale unit of volts/ volts" of SNR is shown shortly in terms of "volts/volts" throughout this paper. A typical example of J01, J02, J03, and J05 (this is official PRN notation of QZSS satellites like QZS-1, QZS-2, QZS-3 and QZS-4) satellites with triple frequencies of L1, L2, and L5 signals on the DOY 305 of November 2019 has been shown in Fig. 4. Here the P0109 site of GNSS Earth Observation Network in Japan (37.815° N; 138.281° E; 44.70 m elevation in elevation height with reference to ITRF14) consists of standard geodetic quality choke ring zenith pointing antenna TRM29659.00 (antenna height 0.83 cm) with dual-frequency carrier-phase GNSS receiver TPS NETG5 (version 5.1p2) operating at 30 samples per seconds. It is clear from Fig. 4 that the SNR observations are oscillating with the increase of elevation angle due to the multipath errors. If there are no multipath errors, the SNR measurements will smoothly rise with the increase of elevation angles 42,43 . The variation of the SNR plot shows that the strength of multipath signals varies between − 20 to 20 V/volts for L1 and L2 QZSS signals, while they are very low between − 5 to 5 V/volts for the L5 QZSS signal. The latter is caused by the lowest noise and multipath error of the L5 signal 44 . It can also be verified that before and after multipath corrections, the QZSS L5 signal provides higher precision positioning accuracy as compared to the L1 and L2 signals 45,46 . This happens due to the higher value of chipping rates (10.23 Mbps) on the L5 signal and the transmitted higher power which generates larger SNR and better rejection of multipath 47 .
The dominant multipath error frequency can be a result of the SNR data by using the Lomb-Scargle periodogram (LSP) or other spectrum analyses. The LSP method is used to convert the data into the frequency domain as well as to obtain the dominant frequency 25 . The dominant frequency for the SNR measurements used in Fig. 4 is converted to the effective reflector height by applying Eq. (2). Figure 5a-c show the SNR observations of QZSS L1, L2, and L5 signals, multipath patterns, and LSP for J01, J02, J03, and J05 satellites on 01 November 2019 (DOY 305). The frequencies of multipath patterns are affected unequally by multipath errors, so the minute difference in peak height of reflected sea-level changes will be obtained, therefore, the mean or median values of these heights can be considered as the final estimation of reflected height of sea-level changes. It has been observed from Fig. 5a-c that the reflector heights obtained from the L1 signal were 0.67 m, from the L2 signal was 0.59 m, and from the L5 signal was 0.73 m. Hence the mean value of these heights [(0.67 + 0.59 + 0.73)/3 = 0.663 m] will be the final estimation of the reflected height of sea-level changes. More details about the QZSS (L1, L2, and L5) signals including the GPS (L1, L2, and L5) and GLONASS (L1 and L2) will be discussed in the next section (Tables S1, S2, S3).

Comparison with GSI-TG measurements.
Daily sea-level changes (unit: m) obtained from QZSS L1, L2, and L5 SNR observations (namely, QZSS-TG), as well as GSI tide gauge (namely, GSI-TG) observations during the 01 October 2019 to 31 December 2019 (90 days), are plotted in Fig. 6 (Supplemenary file). The QZSS-TG observations for L1, L2, and L5 signals are derived and shown in Fig. 6 with red (L1 signal), blue (L2 signal), and green (L5 signal) color circles, respectively, while the GSI-TG observations are also depicted in Fig. 6 with a black solid line. The observation obtained from TG sea level will be relative to the TG benchmark while derived results of sea level changes from GNSS measurements will be relative to the GNSS stations. Therefore, a mean sea level is achieved by taking the difference between TG results and GNSS-derived observations. Sometimes mean value provided by QZSS-R showed very much different from the mean value provided by TG. This happened because of improper visibility of signals. For example, a signal is visible from 25° to 60°, and we have included elevation only to 30°. It means only data from 25° to 30° will be used for analysis. Such types of signals provide error. Hence, we have ignored them by using some possible limited differences. To care about the above points, first, we roughly estimated the height of the water surface, which lies in the range of 0.5-1.5 m. It is a height filter, and the MATLAB code will print only those values of water surface which lie between this range. Other unexpected heights will be excluded automatically.
The number of reflection points tracked over land will be reflected from the land surface and generated by the code as a zero value. As above we already mentioned the threshold range (0.5 to 1.5 m) hence the reflection on land will be excluded automatically. The number of reflection tracks used for retrieval from the different systems QZSS, GPS, and GLONASS was counted as shown in Table 1. The points which were lies in the range of 0.5-1.5 m were included and the other point beyond of range were excluded ( Table 1). As we can see from the table the total number of points observed by the GPS constellation was high compared to GLONASS and QZSS constellations. This happened because more GPS satellites are visible in this region compared to others. Since the QZSS constellation has only four satellites hence their recorded data points are the lowest. Although the QZSS constellation has the lowest number of observed and included points, still the constellation is able to provide good accuracy of sea-level measurements and can be used for QZSS-reflectometry, which is the main aim of the current study. In Fig. 6 www.nature.com/scientificreports/ and GSI-TG observations are presented by the vertical axis. There is a significant role of daily periodicity in sea level (black solid line) can be seen in Fig. 6, which is possibly caused by wind and tide variability, as well as the variability of sea level pressure 48 . We can also see from   Table 2. It is visible from Table 2 Table 2. In general, the number of GPS and GLONASS satellites are the largest compared to QZSS so, they should have the average highest temporal resolution estimation for sea-level changes. This means www.nature.com/scientificreports/ QZSS-Reflectometry is able to provide a new chance to monitor sea-level changes with the available frequencies and the proposed technique.

Implication of KELM modeling.
To improve the QZSS-R-based measurements and reconstruct the tidal variability, we implicate the KELM modeling method. The number of VMD components varying from 1 to 12 are used to predict the best possible results. Figure 7 shows the correlation coefficients for QZSS (L1, L2, and L5 signals), GPS (L1, L2, and L5 signals), and GLONASS (L1 and L2 signals) after implementing KELM from 1 to 12 VMD components. The vertical lines in Fig. 7 correspond to the correlation coefficients taken from Table 2. It can be seen obviously that the performance of the correlation coefficient is enhanced significantly at a particular high VMD component as compared to the low VMD component. This indicates that before implementing KELM, we need to take care of a proper VMD component. The 12th VMD component showing the best performance of correlation coefficient and RMSE have been presented in Tables 3 and 4, respectively. Moreover, it can be seen from Fig. 7 that KELM modeling for QZSS signals provides a nice improvement in terms of the correlation coefficient, for example, QZSS L1 of 0.9252, QZSS L2 of 0.7895, and QZSS L5 of 0.9146. However, the correlation improvements are not higher than GPS with an average of 0.9816 as well as GLONASS with an average of 0.9877. This indicates that the KELM model works in a proper way, but the nice improvements in QZSS  www.nature.com/scientificreports/ measurement are less because of its high variability. Although the correlation coefficients of QZSS measurement are slightly degraded as compared to those of GPS and GLONASS measurements, the correlation coefficients are increased significantly. For more analysis, we plotted the error histograms between the satellite systems (i.e., QZSS-R, GPS-R, and GLONASS-R) and KELM reconstructed TG results as shown in Fig. 8. It can be seen manifestly that the errors behave approximately normal distributions. The mean errors for QZSS-R, GPS-R, and GLONASS-R are estimated at − 2.0 mm, − 0.13 mm, and 0.02 mm, respectively, and the respective standard deviations are 0.0781 m, 0.0323 m, and 0.0389 m, respectively. Here, the mean errors are very close to zero or equivalent to zero mean, therefore according to the standardized normal distribution the estimated results can be considered as unbiased estimations. Finally, we can infer that KELM reconstructed TG errors are significantly reduced, and their distributions are more reasonable. Because the KELM modeling shows good agreements between the reconstructed and measured tidal variability, KELM is thus suitable to reconstruct the tidal variability for QZSS-R measurements with enhanced correlation coefficients.

Conclusion
In this paper, QZSS SNR data with triple frequencies of L1, L2, and L5 signals were firstly used to estimate sealevel changes at the P109 site of the Japanese GNSS Earth Observation Network (37.815° N; 138.281° E; 44.70 m elevation height) located in Sado Island of Japan. To compare with the tide gauge measurement of QZSS-R, other GNSS-R data with different frequency bands were also used to estimate the TG such as GPS (L1, L2, L5 signals) and GLONASS (L1 and L2 signals). Initial sea-level estimates were obtained through Lomb-Scargle periodogram (LSP) spectral analysis on the detrended SNR time series. A KELM technique based on VMD was applied to the QZSS, GPS, and GLONASS observations to reconstruct their tidal variability. The results show good agreements between observed and KELM-estimated TG obtained from QZSS, GPS, and GLONASS data in terms of correlation coefficients and root mean square errors. The research findings demonstrate that the  www.nature.com/scientificreports/ refined sea-level estimates based on KELM achieve significant accuracies in terms of correlation coefficients and histogram plots of errors. The estimated tide gauges obtained from QZSS measurements are comparable to those of GPS and GLONASS measurements. Furthermore, since wave dynamics of water level near the coastal areas in storm situations are very important to make research and discuss additionally, the future study will focus on proving the KELM modeling with other constellations such as Galileo and BeiDou, as well as including more stations and longer data.