A new Egyptian Grid Weighted Mean Temperature (EGWMT) model using hourly ERA5 reanalysis data in GNSS PWV retrieval

Precise modeling of weighted mean temperature (Tm) is essential for Global Navigation Satellite System (GNSS) meteorology. In retrieving precipitable water vapor (PWV) from GNSS, Tm is a crucial parameter for the conversion of zenith wet delay (ZWD) into PWV. In this study, an improved Tm model, named EGWMT, was developed to accurately estimate Tm at any site in Egypt. This new model was established using hourly ERA5 reanalysis data from the European Centre for Medium-Range Weather Forecasts (ECMWF) covering the period from 2008 to 2019 with a spatial resolution of 0.25° × 0.25°. The performance of the proposed model was evaluated using two types of data sources: hourly ERA5 reanalysis data from 2020 to 2022 and radiosonde profiles over a six-year period from 2017 to 2022. The accuracy of the EGWMT model was compared to that of four other models (Bevis, Elhaty, ANN and GGTm-Ts) using two statistical quantities: mean absolute bias (MAB) and root mean square error (RMSE). The results demonstrated that the EGWMT model outperformed the Bevis, Elhaty, ANN and GGTm-Ts models with RMSE improvements of 32.5%, 30.8%, 39% and 48.2%, respectively, in the ERA5 data comparison. In comparison with radiosonde data, the EGWMT model achieved RMSE improvements of 22.5%, 34%, 38% and 19.5% against the Bevis, Elhaty, ANN and GGTm-Ts models, respectively. To determine the significance of differences in means and variances, two statistical tests, the t-test and the F-test, were conducted. The results confirmed that there were significant differences between the EGWMT model and the four other models.

In this study, a new Egyptian Grid Weighted Mean Temperature (EGWMT) model is developed to accurately compute Tm at any location in Egypt. The new model was established using ERA5 reanalysis data from 2008 to 2019, with a spatial resolution of 0.25° × 0.25° and a high temporal resolution of 1 h. The model takes into account the vertical lapse rate of Tm, which is calculated at each grid point. To validate this new model, two types of data sources were utilized. The first is ERA5 reanalysis data from 2020 to 2022, with grids of 0.25° × 0.25° and a 1-h time interval. The second is radiosonde measurements collected from five radiosonde stations in Egypt over a six-year period from 2017 to 2022. To objectively assess the accuracy of the new model, it is compared with the Bevis, Elhaty, ANN and GGTm-Ts models under the same conditions.

ERA5 reanalysis products
ERA5 reanalysis products are a collection of global atmospheric reanalysis data provided by the European Centre for Medium-Range Weather Forecasts (ECMWF). These products give information about various meteorological parameters. The ERA5 reanalysis covers the period from 1979 to the present, and new data is continually added with a delay of 5 days. The data is presented in a gridded format with a high temporal resolution of 1 h and a spatial resolution of 0.25° × 0.25°. The pressure-level products divide the atmosphere into 37 vertical layers based on pressure, ranging from 1,000 to 1 hectopascal (hPa). These products also offer meteorological data for each layer's surface 35.
In this study, ERA5 reanalysis hourly data on pressure levels are used with a spatial resolution of 0.25° × 0.25° and a high temporal resolution of 1 h. The study area covers the entire region of Egypt, from 22°N to 32°N and from 25°E to 37°E. The ERA5 data can be downloaded from the website: https://cds.climate.copernicus.eu/cdsapp#!/home 36. The ERA5 pressure-level products from 2008 to 2022, including geopotential, temperature, relative humidity, and pressure, are utilized to obtain Tm for modeling and validation purposes. The ERA5 products from 2008 to 2019 are employed to establish the model, while the products from 2020 to 2022 are utilized for validation.

Radiosonde measurements
Radiosonde measurements play a crucial role in assessing other weather observations and model predictions. These data are collected twice daily, at 00:00 and 12:00 UTC. Radiosonde measurements provide meteorological profiles, including pressure, temperature, relative humidity, and wind speed, at specific pressure levels. In this study, the meteorological data from five radiosonde stations in Egypt were collected over a six-year period from 2017 to 2022 to validate the EGWMT model. The locations of these stations are shown in Fig. 1. The radiosonde data is from the Integrated Global Radiosonde Archive (IGRA), which is a newly released radiosonde data set from the National Oceanic and Atmospheric Administration (NOAA) National Climatic Data Centre. This data can be downloaded from the website: https://ruc.noaa.gov/raobs/ 37.

Tm calculation
The weighted mean temperature (Tm) is a crucial parameter for calculating the conversion factor (Π), which is essential for retrieving precipitable water vapor (PWV) with the GNSS technique. The precipitable water vapor and the conversion factor can be expressed as follows:
$$ PWV = \Pi \cdot ZWD \tag{1} $$
$$ \Pi = \frac{10^{6}}{\rho_w R_v \left( \frac{K_3}{T_m} + K_2 \right)} \tag{2} $$
where ρw is the liquid water density (ρw = 10³ kg/m³), Rv is the specific gas constant of water vapor (Rv = 461.5 J/(K·kg)), and K2 (22.1 K/hPa) and K3 (3.739 × 10⁵ K²/hPa) are the atmospheric refraction constants (e.g. 25, 38).
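The conversion in Eqs. (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the refraction constants are converted from per-hPa to per-Pa units (an assumption made here so that Π comes out dimensionless when the other quantities are in SI units).

```python
RHO_W = 1000.0   # liquid water density, kg/m^3 (from the text)
R_V = 461.5      # specific gas constant of water vapor, J/(K kg) (from the text)
K2 = 22.1        # refraction constant, K/hPa (from the text)
K3 = 3.739e5     # refraction constant, K^2/hPa (from the text)


def conversion_factor(tm_kelvin):
    """Dimensionless conversion factor Pi for a given Tm in kelvin.

    Assumption: K2 and K3 are converted to per-Pa units (1 hPa = 100 Pa)
    so that the 10^6 factor yields a dimensionless result.
    """
    k2_pa = K2 / 100.0
    k3_pa = K3 / 100.0
    return 1.0e6 / (RHO_W * R_V * (k3_pa / tm_kelvin + k2_pa))


def pwv_from_zwd(zwd, tm_kelvin):
    """Convert a zenith wet delay to PWV; the result has the units of zwd."""
    return conversion_factor(tm_kelvin) * zwd
```

For an illustrative Tm of 275 K the factor is about 0.157, so a ZWD of 100 mm maps to roughly 15.7 mm of PWV.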
The value of Tm at a specific location can be calculated by numerical integration, which is the most accurate method and is easy to implement. The formula to calculate Tm can be expressed as follows:
$$ T_m = \frac{\int_{h_s}^{h_t} \frac{e}{T}\,dh}{\int_{h_s}^{h_t} \frac{e}{T^{2}}\,dh} \tag{3} $$
$$ T_m \approx \frac{\sum_i \frac{e_i}{T_i}\,\Delta h_i}{\sum_i \frac{e_i}{T_i^{2}}\,\Delta h_i} \tag{4} $$
where e and T are the water vapor pressure (hPa) and the atmospheric temperature (K) along the zenith direction, respectively, h_s (m) is the height of the station and h_t (m) is the height of the tropopause 39. Δh_i is the thickness of the i-th atmospheric layer, and e_i and T_i are the water vapor pressure and temperature at the bottom of that layer, respectively. The water vapor pressure is derived from the relative humidity (e.g. 41):
$$ e = \frac{RH \cdot e_s}{100} $$
where RH is the relative humidity (%), e_s (hPa) is the saturated vapor pressure, which is a function of the atmospheric temperature, and T_d is the atmospheric temperature (℃).
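The layer-wise summation described above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the profile is given from the bottom level upward, and the saturated vapor pressure uses a common Magnus-type approximation, which is an assumption here because the exact expression used in the study is not recoverable from the text.

```python
import numpy as np


def saturation_vapor_pressure(t_celsius):
    """Saturated vapor pressure in hPa (Magnus-type approximation, assumed)."""
    return 6.11 * 10.0 ** (7.5 * t_celsius / (237.3 + t_celsius))


def vapor_pressure(rh_percent, t_celsius):
    """Water vapor pressure in hPa from relative humidity in percent."""
    return rh_percent / 100.0 * saturation_vapor_pressure(t_celsius)


def weighted_mean_temperature(height_m, temp_k, rh_percent):
    """Discrete Tm over a profile ordered from the bottom level upward.

    Each layer uses the vapor pressure and temperature at its bottom,
    weighted by the layer thickness, as in the summation form of Tm.
    """
    h = np.asarray(height_m, dtype=float)
    t = np.asarray(temp_k, dtype=float)
    rh = np.asarray(rh_percent, dtype=float)
    e = vapor_pressure(rh, t - 273.15)
    dh = np.diff(h)                  # layer thicknesses
    e_bot, t_bot = e[:-1], t[:-1]    # values at the bottom of each layer
    numerator = np.sum(e_bot / t_bot * dh)
    denominator = np.sum(e_bot / t_bot ** 2 * dh)
    return float(numerator / denominator)
```

A quick sanity check on such a routine is an isothermal profile: whatever the humidity, Tm must equal the constant temperature.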
The Tm lapse rate is a significant parameter for the vertical adjustment of Tm. Tm at the reference level can be converted to Tm at a target level through vertical adjustment using the Tm lapse rate. A linear relationship between Tm and height can be used to describe the Tm lapse rate (e.g. 9, 42). With β taken as positive when Tm decreases with height, this relationship can be expressed as follows:
$$ T_m = T_{m0} - \beta \,(h - h_0) $$
where β is the Tm vertical lapse rate (K/km), h denotes the target height (km), h_0 is the reference height (km), and T_m0 represents the weighted mean temperature at the reference level. This formula is used in this study to calculate the vertical lapse rate at each grid point.
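A lapse rate of this kind can be estimated with a straight-line fit of Tm against height and then used for vertical adjustment. The sketch below assumes the sign convention in which β is positive when Tm decreases with height (consistent with the positive K/km values reported later in the paper); the function names are hypothetical.

```python
import numpy as np


def tm_lapse_rate(height_km, tm_k):
    """Estimate the Tm lapse rate (K/km) from a linear fit of Tm vs. height.

    Assumed convention: the returned value is positive when Tm
    decreases with height.
    """
    slope, _intercept = np.polyfit(height_km, tm_k, 1)
    return float(-slope)


def adjust_tm(tm0_k, h0_km, h_km, beta):
    """Move Tm from the reference height h0 to the target height h."""
    return tm0_k - beta * (h_km - h0_km)
```

For example, with a lapse rate of 6 K/km, a Tm of 290 K at sea level corresponds to 287 K at a height of 0.5 km.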

Tm calculated with empirical models
The value of Tm at any site can be calculated using empirical Tm models, which makes Tm a very important parameter in the real-time retrieval of GNSS-PWV from GNSS-ZWD. In recent years, many scholars have conducted extensive research on Tm modeling to establish empirical Tm models. Most of the previous Tm models can be described by the following equation 41:
$$ T_m = T_1(T_s) + T_2(doy) + T_3(doy) + T_4(hod) $$
where T_1(T_s) represents the Tm calculated from surface meteorological data, and T_2(doy), T_3(doy), and T_4(hod) represent the annual, semiannual, and diurnal Tm variation components, respectively.
The abbreviations in Table 1 represent different measurements: ϕ is the latitude, λ is the longitude, H is the geoid height, H_ell is the ellipsoidal height, doy is the day of year, hod is the hour of day, mjd is the modified Julian date and T_s is the surface temperature.
Empirical Tm models, which rely only on site coordinates and time, are commonly used to estimate global Tm values in real time. However, these models have been found to be less accurate than linear regression models 44. Linear regression is a better option for achieving highly accurate Tm estimates. However, most linear regression models are developed for specific regions and are only suitable for local use. Existing global research on Tm estimation is either limited to sparse stations or is based on different latitude ranges 18. Empirical models are unable to accurately estimate global Tm values and establish a reliable Tm-Ts relationship. Additionally, linear regression models are not suitable for real-time applications when temperature sensors are absent at GNSS stations or when in situ surface temperature cannot be transmitted in real time to a data processing center. This limitation can negatively impact the continuous operation of a real-time GPS-PWV remote sensing system 18.

Model establishment
The new Egyptian Grid Weighted Mean Temperature (EGWMT) model is developed based on a 12-year period from 2008 to 2019, with a 1-h interval and in grids of 0.25° × 0.25° in the region of Egypt. The ERA5 reanalysis products from the European Centre for Medium-Range Weather Forecasts (ECMWF), which include geopotential, temperature, relative humidity, and pressure at 37 pressure levels, were utilized to obtain the Tm values for establishing the new model. The Tm value at each level is calculated as the Tm from that level to the topmost level, as shown in Eq. (4).
Tm can be interpolated or extrapolated to the Earth's surface by considering the Tm lapse rate. To compute the surface values of Tm, a linear relationship between Tm and height is assumed. The vertical lapse rate of Tm is calculated using the data from only the four bottom levels from the ground, which are selected to cover most of the troposphere 44.
The model coefficients are determined at each grid point throughout the study area. The proposed Tm model, EGWMT, can compute the Tm value at any site in Egypt using latitude (ϕ), longitude (λ), surface temperature (T_s), day of year (doy), and hour of day (hod). The model equation, Eq. (13), expresses Tm as a function of T_s, doy, and hod through eight model coefficients, α1, α2, α3, α4, α5, α6, α7, and α8. These coefficients are computed on the 0.25° × 0.25° grid to account for spatial variations. At each grid point, the eight coefficients are determined using a least squares adjustment, with known values of T_s, doy, hod, and Tm.
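The least squares adjustment at a single grid point can be sketched as below. The explicit functional form is an assumption made for illustration only: a linear term in T_s plus annual, semiannual and diurnal harmonics, which gives eight coefficients and follows the general structure of the empirical models described earlier. The ordering of the terms, the 365.25-day period and all function names are hypothetical and may differ from the published Eq. (13).

```python
import numpy as np


def design_matrix(ts, doy, hod):
    """Build the 8-column design matrix for the assumed model form:
    Tm = a1 + a2*Ts + annual (cos, sin) + semiannual (cos, sin) + diurnal (cos, sin).
    """
    ts, doy, hod = (np.asarray(x, dtype=float) for x in (ts, doy, hod))
    annual = 2.0 * np.pi * doy / 365.25
    diurnal = 2.0 * np.pi * hod / 24.0
    return np.column_stack([
        np.ones_like(ts), ts,
        np.cos(annual), np.sin(annual),
        np.cos(2.0 * annual), np.sin(2.0 * annual),
        np.cos(diurnal), np.sin(diurnal),
    ])


def fit_coefficients(ts, doy, hod, tm):
    """Least squares estimate of the eight coefficients at one grid point."""
    a = design_matrix(ts, doy, hod)
    coeffs, *_ = np.linalg.lstsq(a, np.asarray(tm, dtype=float), rcond=None)
    return coeffs


def predict_tm(coeffs, ts, doy, hod):
    """Evaluate the assumed model for given Ts, day of year and hour of day."""
    return design_matrix(ts, doy, hod) @ coeffs
```

In a gridded setting, `fit_coefficients` would be called once per grid point with that point's hourly time series, and the resulting eight values stored per node.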
Figure 2 shows the model coefficients α1, α2, α3, α4, α5, α6, α7, and α8 in grids of 0.25° × 0.25° in the region of Egypt. These eight coefficients are stored in grid format. It is evident that the coefficients vary with location, and the latitude dependence is the most evident pattern in all eight. α1 and α2 are the most essential coefficients. Generally, when α1 is large, α2 is small. The value of the weighted mean temperature at any site can be estimated using the coefficients, the surface temperature and the observation time. Using the coordinates of the site, the values of the eight coefficients can be estimated and used as inputs in Eq. (13).
Based on the coordinates (ϕ and λ) of a location and the observation time, the four nearest grid points are selected, and their Tm values are calculated. Bilinear interpolation is then applied to estimate the Tm value at the desired location.
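The bilinear interpolation step can be illustrated as follows. This is a sketch, not the authors' code: `tm_at` is a hypothetical callback that returns the model Tm at a grid node for the observation time of interest, and the grid is assumed to be aligned to multiples of 0.25°.

```python
import math


def bilinear_tm(lat, lon, tm_at, step=0.25):
    """Bilinearly interpolate Tm from the four grid nodes surrounding a site.

    tm_at(lat_node, lon_node) is a hypothetical callback giving Tm at a node;
    the grid is assumed to be aligned to multiples of `step` degrees.
    """
    lat0 = math.floor(lat / step) * step
    lon0 = math.floor(lon / step) * step
    lat1, lon1 = lat0 + step, lon0 + step
    u = (lon - lon0) / step   # fractional position in longitude
    v = (lat - lat0) / step   # fractional position in latitude
    return ((1 - u) * (1 - v) * tm_at(lat0, lon0)
            + u * (1 - v) * tm_at(lat0, lon1)
            + (1 - u) * v * tm_at(lat1, lon0)
            + u * v * tm_at(lat1, lon1))
```

Because the weights sum to one, a field that varies linearly in latitude and longitude is reproduced exactly, which makes a convenient check. A production version would also need to handle sites on the outer edge of the grid, where the upper neighbours may not exist.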

Statistical methods to evaluate the EGWMT model
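To evaluate the performance of the newly developed Tm model (EGWMT), Tm values obtained from both ERA5 reanalysis data and radiosonde stations are selected as references. Two statistical quantities, mean absolute bias (MAB) and root mean square error (RMSE), are used to measure the accuracy of the EGWMT model results. These quantities are defined as follows (e.g. 31):
$$ MAB = \frac{1}{N}\sum_{i=1}^{N}\left| T_{m,i}^{model} - T_{m,i}^{ref} \right| \tag{14} $$
$$ RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left( T_{m,i}^{model} - T_{m,i}^{ref} \right)^{2}} \tag{15} $$
where N is the number of samples, and T_{m,i}^{model} and T_{m,i}^{ref} are the i-th modeled and reference Tm values, respectively.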
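Both accuracy measures are straightforward to compute from paired model and reference values. The helper names below are hypothetical, and the snippet is only meant to make the definitions concrete.

```python
import numpy as np


def mab(model, reference):
    """Mean absolute bias between modeled and reference Tm values."""
    diff = np.asarray(model, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(diff)))


def rmse(model, reference):
    """Root mean square error between modeled and reference Tm values."""
    diff = np.asarray(model, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

Since RMSE penalizes large deviations more strongly than MAB, it is never smaller than MAB for the same sample.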

Test of hypothesis for two sample means (t-test)
The t-test is a statistical test used to compare the means of two samples. In this study, the test is applied to the results of model validation, specifically comparing the EGWMT model with the four other models. Two types of data sources were used: ERA5 reanalysis data and radiosonde measurements. The null hypothesis (H_0) and the alternative hypothesis (H_a) are specified as follows:
$$ H_0: \mu_1 = \mu_2, \qquad H_a: \mu_1 \neq \mu_2 \tag{16} $$
The test statistic is:
$$ t = \frac{\mu_1 - \mu_2}{\sqrt{\frac{S_1^{2}}{N_1} + \frac{S_2^{2}}{N_2}}} \tag{17} $$
The null hypothesis is rejected when:
$$ |t| > t_{\alpha/2} \tag{18} $$
where µ1 and µ2 are the sample means, N1 and N2 are the sizes of the samples, S_1^2 and S_2^2 are the sample variances, and t_{α/2} is the tabulated t-value at the 95% confidence level (e.g. 45).
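The decision rule of the t-test can be expressed compactly in code. The sketch below assumes the unpooled (large-sample) form of the statistic built from the quantities named in the text; the function names and the example inputs are hypothetical, while the critical value 1.96 is the tabulated value quoted in this paper.

```python
import math


def two_sample_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t statistic from sample means, variances and sizes."""
    return (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)


def reject_equal_means(t_value, t_critical=1.96):
    """Two-sided test: reject H0 when |t| exceeds the tabulated value."""
    return abs(t_value) > t_critical
```

With this rule, a computed value such as 10.51 leads to rejection of the null hypothesis, whereas a value such as 0.92 does not.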

Test of hypothesis for the ratio of two sample variances (F-test)
The F-test compares the variances of two samples. This test is applied to the results of model validation. The null hypothesis (H_0) and the alternative hypothesis (H_a) are stipulated as follows:
$$ H_0: \frac{S_1^{2}}{S_2^{2}} = 1, \qquad H_a: \frac{S_1^{2}}{S_2^{2}} \neq 1 \tag{19} $$
The test statistic is:
$$ F = \frac{S_1^{2}}{S_2^{2}} \tag{20} $$
The null hypothesis is rejected in the region:
$$ F > F_{\alpha/2} \tag{21} $$
where S_1^2 is the larger variance, S_2^2 is the smaller variance, F is the test statistic value and F_{α/2} is the tabulated F-value at the 95% confidence level (e.g. 45, 46).
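The variance-ratio test follows the same pattern. In this sketch the larger variance is always placed in the numerator, as stated in the text; the function names are hypothetical, and the default critical value of 1.43 is the tabulated value quoted in this paper for its own sample sizes, so it should not be reused for other data sets.

```python
def f_statistic(var_a, var_b):
    """Ratio of the larger sample variance to the smaller one."""
    larger, smaller = max(var_a, var_b), min(var_a, var_b)
    return larger / smaller


def reject_equal_variances(f_value, f_critical=1.43):
    """Reject H0 when F exceeds the tabulated value."""
    return f_value > f_critical
```

For instance, an F-value of 2.20 leads to rejection, while 1.33 does not.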

Performance analysis of the EGWMT model
The ERA5 products from 2008 to 2019 are employed to assess the performance of the EGWMT model, using two statistical quantities, MAB and RMSE, to measure its accuracy at each grid point, as shown in Fig. 3. The average value of the model MAB is 2.12 K, while the minimum and maximum values of MAB are 1.71 K and 2.60 K, respectively. The minimum and maximum values of RMSE are 2.20 K and 3.29 K, respectively, with an average value of 2.71 K.
The average values of the vertical Tm lapse rates are calculated at all grid points in the region of Egypt, as shown in Fig. 4. The average value of the Tm lapse rate is 6.05 K/km. The minimum and maximum values of the vertical Tm lapse rates are 5.35 K/km and 7.66 K/km, respectively.

Model validation using ERA5 reanalysis data
To objectively assess the performance of the EGWMT model, the Tm reference values used for comparison were derived from ERA5 reanalysis grid data over a three-year period from 2020 to 2022. This new model was compared against four other models: Bevis, Elhaty, ANN, and GGTm-Ts. The Elhaty and ANN models are the only available local models for Egypt. The Bevis formula is considered the first proposed Tm model and is regarded as the reference model for validating other regional and global models. The GGTm-Ts model is the latest available global model that covers the region of Egypt. Two statistical quantities, MAB and RMSE, are used to measure the accuracy of the EGWMT model at each grid point.
The MAB and RMSE values of Tm were calculated at each grid point for the five Tm models. These values were stored in grids, as shown in Fig. 5. It is clear that the new model (EGWMT) performs well and achieves the smallest MAB and RMSE values. Among the four other models, the GGTm-Ts model performs the worst. This may be because it is a global model that is not specifically tailored to the region of Egypt. The ANN model performs better than the GGTm-Ts model. Both the Elhaty model and the Bevis formula achieve better results than the other two models, and the Elhaty model slightly outperforms the Bevis formula.
Table 2 lists the statistical results, including the minimum, maximum and mean values of MAB and RMSE for the five models, derived from ERA5 reanalysis data over the three-year period from 2020 to 2022. For the EGWMT model, the average value of MAB is 2.14 K, with minimum and maximum values of 1.68 K and 2.58 K, respectively, while the RMSE averages 2.70 K and ranges from 2.12 to 3.24 K.
To determine the significance of the differences between the means of the EGWMT model and the four other models derived from ERA5 reanalysis data, a t-test is conducted. The test statistic is given in Eq. 17, and the tabulated t-value is 1.96 at the 95% confidence level. The null hypothesis states that there is no significant difference between the Tm means of the models; it is accepted when the computed t-value is smaller than the tabulated t-value. The alternative hypothesis states that there is a significant difference between the Tm means of the models; it is accepted when the computed t-value is greater than the tabulated value. The computed t-values when comparing the EGWMT model with the Bevis formula and the Elhaty, ANN, and GGTm-Ts models are 10.51, 8.20, 13.17 and 17.67, respectively. All of these values are greater than the tabulated t-value, which means that the null hypotheses can be rejected. This indicates that there are significant differences between the EGWMT model and the other models. Combined with its smallest MAB and RMSE values, it can be concluded that the EGWMT model has the best performance.
The F-test is implemented to determine the significance of the differences in variance between the EGWMT model and the four other models obtained from ERA5 reanalysis data. The F-test statistic is given in Eq. 20. The tabulated value of F is 1.43 at the 95% confidence level. The null hypothesis is accepted when the computed F-value is smaller than the tabulated value, while the alternative hypothesis is accepted when the computed F-value is larger than the tabulated value. The estimated F-values when comparing the EGWMT model with the Bevis formula and the Elhaty, ANN, and GGTm-Ts models are 2.20, 2.09, 2.69 and 3.73, respectively. The null hypotheses are rejected in all cases because all F-values are greater than the tabulated F-value. This means that there are significant differences between the EGWMT model and the other models.
The seasonal variations in Tm were derived from ERA5 reanalysis grid data from 2020 to 2022 in order to assess the seasonal performance of the EGWMT model. The MAB and RMSE values for the five models were calculated at each grid point for the four seasons to assess the EGWMT model and compare it to the four other models.
The bar chart in Fig. 6 shows the average MAB of Tm for the five models for each season. The results clearly indicate that the EGWMT model achieved the lowest values of Tm MAB for all seasons. On the other hand, the GGTm-Ts model performs the worst, with the largest values of Tm MAB. The t-test (Eq. 17) is implemented here to determine the significance of the differences between the means of the EGWMT model and the four other models. The tabulated t-value is 1.96 at the 95% confidence level. All computed t-values when comparing the EGWMT model with the other models are greater than the tabulated t-value, except in the case of the Bevis formula in winter, where the computed t-value is 0.92. Therefore, the null hypotheses can be rejected for all cases except the Bevis formula in winter. This means that there are significant differences between the EGWMT model and the other models, except for the Bevis formula in winter.
The bar chart in Fig. 7 illustrates the average RMSE for the five models for each season. The EGWMT model consistently demonstrates the best performance, achieving the smallest values of RMSE for all seasons.

Model validation using radiosonde data
The radiosonde measurements at any given site provide meteorological observations, including temperature, pressure, relative humidity, and wind speed at the surface, as well as at various pressure levels. The radiosonde observations from five radiosonde stations in Egypt were used over a six-year period from 2017 to 2022 to assess the EGWMT model and compare it to the four other models. The Tm values of each radiosonde station over the six-year period were calculated using the integration method and were considered as the reference values to validate the EGWMT model. The performance of the new model is assessed using two statistical quantities, MAB and RMSE, with the station-wise MAB shown in Fig. 8.
The t-test is implemented here to determine the significance of the differences between the means of the EGWMT model and the four other models derived from radiosonde measurements. The t-test statistic is given in Eq. 17. The tabulated value of t at the 95% confidence level is 1.96. When the computed t-value is smaller than the tabulated t-value, the null hypothesis is accepted; when it is greater than the tabulated value, the null hypothesis is rejected. When the EGWMT model is compared with the Bevis formula and the Elhaty, ANN, and GGTm-Ts models, the estimated t-values are 10.37, 30.06, 23.60 and 4.08, respectively. The t-values for all of these cases are larger than the tabulated value, indicating that the null hypotheses are rejected. This implies that there are significant differences between the EGWMT model and the four other models.
The bar chart in Fig. 9 shows the average RMSE for the five models at each radiosonde station. At most stations, the EGWMT model has the smallest RMSE, and the ANN model has the largest RMSE. The GGTm-Ts model has a smaller RMSE than the Bevis formula at most stations. The mean values of RMSE over all radiosonde stations are 3.33 K for Bevis, 3.95 K for Elhaty, 4.39 K for ANN, 3.05 K for GGTm-Ts, and 2.65 K for the EGWMT model. Generally, the EGWMT model shows the highest accuracy with the smallest value of mean RMSE, and its ranges are smaller than those of the other models.
To determine the significance of the differences in variance between the EGWMT model and the four other models derived from radiosonde measurements, the F-test is conducted. The tabulated F-value is 1.43 at the 95% confidence level. The null hypothesis is accepted when the computed F-value is smaller than the tabulated F-value. The estimated F-values when comparing the EGWMT model with the Bevis formula and the Elhaty, ANN, and GGTm-Ts models are 1.58, 2.22, 2.74 and 1.33, respectively. The F-values for the Bevis, Elhaty and ANN comparisons are greater than the tabulated F-value, so the null hypotheses are rejected in these cases. For the GGTm-Ts model, the estimated F-value is smaller than the tabulated value, which means that the null hypothesis can be accepted, and there is no significant difference between the variances of the EGWMT and GGTm-Ts models.

Impact of Tm on PWV retrieval
The goal of estimating Tm is to convert the ZWD into GNSS-PWV by applying a conversion factor (Π). Most GNSS stations are not equipped with meteorological sensors; thus, it is difficult to conduct a comprehensive assessment of the impact of Tm on GNSS-PWV. The errors in Tm indirectly affect the accuracy of GNSS-PWV estimation (e.g. 44). In this study, the impact of the Tm values from the EGWMT model and the four other models on PWV is analyzed using the same data as utilized in section "Model validation using ERA5 reanalysis data". The commonly used relationship between the RMS of Tm and the RMS of PWV can be expressed as:
$$ \frac{RMS_{PWV}}{PWV} = \frac{RMS_{\Pi}}{\Pi} = \frac{K_3 \, RMS_{T_m}}{\left( \frac{K_3}{T_m} + K_2 \right) T_m^{2}} $$
where RMS_PWV represents the error of PWV, RMS_Π is the error of the conversion factor (Π), and RMS_Tm denotes the error of Tm. K2 (22.1 K/hPa) and K3 (3.739 × 10⁵ K²/hPa) are the atmospheric refraction constants. RMS_PWV/PWV denotes the relative error of PWV 48.
Table 3 presents the statistical results (Min, Max, and Average) of the errors in PWV (RMS_PWV) and the relative errors in PWV (RMS_PWV/PWV) for the five models. The results clearly indicate that the EGWMT model outperforms the four other models in terms of both RMS_PWV and RMS_PWV/PWV. The average values of RMS_PWV and RMS_PWV/PWV obtained from the EGWMT model are 0.16 mm and 0.88%, respectively.
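The propagation of a Tm error into PWV can be evaluated numerically as sketched below. The relation used is the one obtained by differentiating the conversion factor with respect to Tm; the function names and the example values of Tm and PWV are hypothetical. Because K2 and K3 appear only as a ratio, their per-hPa units cancel and no unit conversion is needed.

```python
K2 = 22.1      # refraction constant, K/hPa (from the text)
K3 = 3.739e5   # refraction constant, K^2/hPa (from the text)


def relative_pwv_error(tm_k, rms_tm_k):
    """Relative PWV error (dimensionless) caused by an RMS error in Tm."""
    return K3 * rms_tm_k / ((K3 / tm_k + K2) * tm_k ** 2)


def pwv_error(pwv, tm_k, rms_tm_k):
    """Absolute PWV error, in the same units as pwv."""
    return pwv * relative_pwv_error(tm_k, rms_tm_k)
```

As an illustration with assumed inputs, a Tm error of 2.70 K at a Tm of 285 K gives a relative PWV error slightly below 1%, which is always a little smaller than the relative Tm error itself.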

Figure 1. The distribution of radiosonde stations in Egypt.

Figure 2. The eight coefficients for the EGWMT model.
With a maximum RMSE of 3.24 K, this new model performs well and achieves the lowest values of MAB and RMSE, likely because the EGWMT model is a regional model specialized for Egypt and because of the abundance of data used to establish it in the Egypt region. It is developed based on hourly ERA5 reanalysis data for 12 years, taking into account temporal variations (day of year and hour of day) and spatial variations (in grids of 0.25° × 0.25°). The GGTm-Ts model performs worst and achieves the largest values of MAB and RMSE. Its MAB averages 4.45 K and ranges from 3.57 to 6.10 K.

Figure 4. The average values of vertical Tm lapse rates.

Figure 5. The distribution of MAB and RMSE for different models tested by ERA5 reanalysis data over a three-year period from 2020 to 2022.

Figure 6. The average MAB of Tm derived from the five models for the four seasons.
The bar chart in Fig. 8 illustrates the average Tm MAB for the five models at each radiosonde station. It is clear that the new model (EGWMT) achieved the lowest values of MAB at most radiosonde stations, except at station 62306 (Mersa-Matrouh). At this station, the Bevis formula achieved the lowest MAB value of 4.07 K. The MAB values of the GGTm-Ts model are closest to those of the EGWMT model. The performance of the GGTm-Ts model is better than that of the Bevis formula at most stations. The mean values of MAB over all radiosonde stations are 4.55 K for Bevis, 6.03 K for Elhaty, 5.71 K for ANN, 4.14 K for GGTm-Ts, and 3.89 K for the EGWMT model. The EGWMT model achieves the lowest MAB value. Additionally, the performance of the Bevis formula is better than that of the ANN and Elhaty models.

Figure 7. The average RMSE derived from the five models for the four seasons.

Figure 8. The average MAB of Tm derived from the five models at five radiosonde stations in Egypt.

Figure 9. The average RMSE for the five models at five radiosonde stations in Egypt.
Some previous Tm models are listed in Table 1.

Table 1. Main differences between some previous Tm models.

Table 2. Statistics of MAB and RMSE for different Tm models derived from ERA5 reanalysis data.

Table 3. Statistics of errors and relative errors in PWV for different models.