Deep learning downscaled high-resolution daily near-surface meteorological datasets over East Asia

UNet, a deep-learning convolutional neural network, is used to downscale coarse meteorological data. Based on 19 models from the Coupled Model Intercomparison Project Phase 6 (CMIP6) and the Multi-Source Weather (MSWX) dataset, bias correction and UNet downscaling are combined to develop a high-resolution dataset over East Asia, referred to as the Climate Change for East Asia with Bias corrected UNet Dataset (CLIMEA-BCUD). CLIMEA-BCUD provides nine meteorological variables (2-m air temperature, 2-m daily maximum air temperature, 2-m daily minimum air temperature, precipitation, 10-m wind speed, 2-m relative humidity, 2-m specific humidity, downward shortwave radiation, and downward longwave radiation) at 0.1° horizontal resolution and daily intervals for the historical period 1950–2014 and for three future scenarios (SSP1-2.6, SSP2-4.5, and SSP5-8.5) over 2015–2100. Validation against MSWX indicates that CLIMEA-BCUD performs reasonably well in terms of climatology, reproduces seasonal cycles well, and preserves the projected future changes of the raw CMIP6 models. CLIMEA-BCUD is expected to promote the application of deep learning in climate research, for example in studies of climate change and hydrology.

Statistical downscaling (SD) establishes statistical relationships between large-scale, coarse-resolution GCM outputs and local-scale, fine-resolution observations during a training period, and then applies these relationships to obtain fine-scale information for the projection period. It is computationally inexpensive and easy to implement 19. SD approaches include regression, weather classification, and weather generators. Regression approaches are very popular, for example multiple linear regression (MLR) 20, the generalized linear model (GLM) 21, and machine learning (ML) methods including support vector machines (SVM) 22, random forests (RF) 23, and artificial neural networks (ANN) 24,25. Many studies have compared the performance of different regression approaches [26][27][28][29].
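To make the train-then-apply idea of regression-based SD concrete, here is a minimal multiple linear regression (MLR) sketch in Python with NumPy. It is not part of the CLIMEA-BCUD workflow (which uses QDM and UNet); the synthetic predictors, the single target grid point, and the helper names `fit_mlr` and `apply_mlr` are illustrative assumptions.

```python
import numpy as np


def fit_mlr(predictors, predictand):
    """Fit predictand = b0 + predictors @ b by ordinary least squares.

    predictors: (n_days, n_features) coarse-scale predictors (training period)
    predictand: (n_days,) local observations at one fine-resolution grid point
    Returns the coefficient vector [b0, b1, ..., bk].
    """
    X = np.column_stack([np.ones(len(predictors)), predictors])
    coef, *_ = np.linalg.lstsq(X, predictand, rcond=None)
    return coef


def apply_mlr(coef, predictors):
    """Apply the trained relationship to predictors from another (e.g. future) period."""
    X = np.column_stack([np.ones(len(predictors)), predictors])
    return X @ coef


# Synthetic example: a known linear relationship learned in the training period ...
rng = np.random.default_rng(0)
x_train = rng.normal(size=(365, 2))
y_train = 2.0 + 3.0 * x_train[:, 0] - 1.0 * x_train[:, 1]
coef = fit_mlr(x_train, y_train)
# ... is then applied to predictors from a shifted "projection" period.
x_proj = rng.normal(loc=1.0, size=(100, 2))
y_proj = apply_mlr(coef, x_proj)
```

The key assumption of all such methods is that the relationship fitted in the training period remains valid in the projection period.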
Deep learning (DL) has proved effective at capturing complex and abstract features from large amounts of data 30. Many studies have applied DL-based super-resolution (SR) approaches to downscaling [31][32][33]. Among DL approaches, UNet performs particularly well in SR and has been used in statistical downscaling. Sha et al. 34,35 developed new UNet architectures, named UNet-AE and Nest-UNet, for temperature and precipitation downscaling, respectively, and found that the UNet-based models outperform spatial disaggregation. Adewoyin et al. 36 applied the Temporal Recurrent UNet (TRU-NET) to downscale precipitation and showed that TRU-NET outperformed both a DL model widely used in precipitation downscaling and a dynamical downscaling method.

Data acquisition.
The MSWX gridded, high-resolution, bias-corrected meteorological dataset serves as the observational reference. Built on ERA5, MSWX provides 10 widely used near-surface meteorological variables at 0.1° horizontal resolution and 3-hourly temporal resolution. The study area covers East Asia from 4.95°N to 60.05°N and from 64.75°E to 150.25°E (Fig. 1). To build the bias-correction and UNet downscaling models, the high-resolution MSWX data are aggregated to a coarse 1.0° × 1.0° grid, referred to as MSWX_LR, using area averaging.
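The aggregation of MSWX from 0.1° to 1.0° can be sketched as a block average over 10 × 10 fine cells. The text only states that an area-average method is used, so the cos(latitude) area weighting, the trimming of incomplete edge blocks, and the hypothetical helper `block_area_mean` below are assumptions rather than the authors' exact procedure.

```python
import numpy as np


def block_area_mean(field, lat, factor=10):
    """Aggregate a fine lat-lon field to a coarser grid by area-weighted block means.

    field:  (n_lat, n_lon) array on a regular grid (e.g. 0.1 degree)
    lat:    (n_lat,) cell-centre latitudes in degrees
    factor: fine cells per coarse cell along each axis (0.1 -> 1.0 degree is 10)
    Rows/columns that do not fill a complete coarse cell are trimmed (assumption).
    """
    ny = (field.shape[0] // factor) * factor
    nx = (field.shape[1] // factor) * factor
    f = field[:ny, :nx]
    # On a regular lat-lon grid, cell area is proportional to cos(latitude).
    w = np.cos(np.deg2rad(lat[:ny]))[:, None] * np.ones((1, nx))
    # Split both axes into (n_blocks, factor) and average within each block.
    fb = f.reshape(ny // factor, factor, nx // factor, factor)
    wb = w.reshape(ny // factor, factor, nx // factor, factor)
    return (fb * wb).sum(axis=(1, 3)) / wb.sum(axis=(1, 3))
```

How partial edge blocks are handled in the actual dataset is not stated; trimming is only one option.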
For climate change downscaling, we use CMIP6 data, which provide the latest GCM simulations, including a large volume of global gridded model output for the historical period (of which 1950–2014 is used here) and for four Shared Socioeconomic Pathway (SSP) scenarios over 2015–2100. We use outputs from 19 GCMs for the historical simulation and three representative future scenarios (SSP1-2.6, SSP2-4.5, and SSP5-8.5) (Table 2). As shown in Table 2, the original CMIP6 GCM outputs have coarse spatial resolution. All CMIP6 data can be downloaded from https://esgf-node.llnl.gov/projects/cmip6/.

BC-UNet.
The framework used to construct CLIMEA-BCUD, called BC-UNet, is shown in Fig. 2. BC-UNet takes GCM simulations and observations as input and consists of two main steps, (1) bias correction and (2) UNet downscaling, which are detailed below.
In the first step, bias correction is performed with quantile delta mapping (QDM), which reduces the biases between observations and GCM outputs while preserving the model-projected changes in each quantile 39,40. Before bias correction, the GCM outputs are bilinearly interpolated to a 1° × 1° grid to match MSWX_LR. QDM is then applied at this coarse resolution to correct the biases of the GCMs relative to MSWX_LR, yielding the bias-corrected GCM results (GCM_BC).
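A single-grid-cell sketch of QDM, following the general additive/multiplicative formulation of Cannon et al. (2015), is given below. It assumes the GCM series has already been interpolated to the 1° MSWX_LR grid. Other assumptions not stated in the text: empirical quantiles from `numpy.quantile` with plotting positions (rank + 0.5)/n, one distribution for the whole correction window rather than moving windows, and no special treatment of dry days for precipitation. `qdm` is a hypothetical helper name.

```python
import numpy as np


def qdm(obs_hist, mod_hist, mod_proj, kind="additive"):
    """Quantile delta mapping at a single grid cell (sketch).

    obs_hist: observations in the reference period (e.g. MSWX_LR)
    mod_hist: GCM values in the same reference period
    mod_proj: GCM values to correct (historical or future period)
    kind: "additive" (e.g. temperature) or "multiplicative" (e.g. precipitation)
    """
    mod_proj = np.asarray(mod_proj, dtype=float)
    # Non-exceedance probability of each value within its own (projected) distribution.
    ranks = np.argsort(np.argsort(mod_proj))
    tau = (ranks + 0.5) / mod_proj.size
    q_obs = np.quantile(obs_hist, tau)  # observed value at the same quantile
    q_mod = np.quantile(mod_hist, tau)  # modelled reference value at that quantile
    if kind == "additive":
        # Observed quantile plus the model's absolute change at that quantile.
        return q_obs + (mod_proj - q_mod)
    if kind == "multiplicative":
        # Observed quantile times the model's relative change at that quantile.
        return q_obs * (mod_proj / np.maximum(q_mod, 1e-12))
    raise ValueError("kind must be 'additive' or 'multiplicative'")
```

With a uniform +2 model bias and a +1 projected shift, the additive case returns the observed distribution shifted by +1: the bias is removed while the projected change is kept.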
In the second step, a three-layer UNet, which is known for its strong performance in super-resolution and downscaling tasks, is used for climate downscaling 41. Each convolution and downsampling operation produces a feature map that captures spatial features. The UNet with 3 layers represents 3 … (Fig. 3).
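How the three downsampling levels shrink the feature maps can be illustrated with 2 × 2 max pooling in NumPy. This sketch assumes a 128 × 128 input patch with two channels (the interpolated variable and elevation) and omits the convolution, batch normalization, ReLU, and upsampling layers; the patch size and the helper `max_pool_2x2` are illustrative assumptions, not the network's actual configuration.

```python
import numpy as np


def max_pool_2x2(x):
    """2 x 2 max pooling with stride 2 on a (channels, height, width) array."""
    c, h, w = x.shape
    x = x[:, : h - h % 2, : w - w % 2]  # drop an odd trailing row/column
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


# Two input channels: the interpolated variable and elevation (assumed patch size).
x = np.random.default_rng(0).normal(size=(2, 128, 128))
shapes = [x.shape]
for _ in range(3):  # three downsampling levels
    x = max_pool_2x2(x)
    shapes.append(x.shape)
# shapes: (2, 128, 128) -> (2, 64, 64) -> (2, 32, 32) -> (2, 16, 16)
```

In a standard UNet decoder, upsampling reverses these halvings (16 → 32 → 64 → 128) and skip connections concatenate each upsampled map with the encoder map of the same size, which is why input sizes divisible by 2³ are convenient.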
As the objective of the training stage, the loss function 42 plays an important role in directing the updates of the network parameters. The network minimizes the loss by continuously updating its parameters during training, and training of the UNet is complete when the loss converges to its minimum. This study proposes a new loss function based on the mean absolute error (MAE), which enhances the UNet's ability to reproduce extreme precipitation events and mitigates the underestimation of variables. The loss function is as follows:  where i denotes grid points with values less than the mean, j denotes grid points with values greater than the mean, and the weight w is set to 5 to reduce the underestimation of the downscaling model. To capture the fine features of the MSWX dataset in different seasons, four UNets are trained for each variable, one for each season (MAM, JJA, SON, and DJF for spring, summer, autumn, and winter, respectively). For each season, the inputs consist of MSWX_LR data interpolated to 0.1° × 0.1° (a factor of 10) with bilinear interpolation and static elevation (z), coarsened to 0.1° spacing from the Global 30 Arc-Second Elevation 43 (GTOPO30) dataset; the original MSWX data serve as labels. The UNet is fed a univariate image together with the terrain data and outputs a single image. It uses max pooling for downsampling, deconvolution (transposed convolution) for upsampling, and long skip connections that concatenate feature maps of the same resolution. All inputs and
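The following NumPy sketch assumes one plausible form of the weighted MAE described above: absolute errors at grid points above the mean (the j set) are multiplied by w = 5, errors at the remaining points (the i set) keep weight 1, and the sum is divided by the number of grid points. Using the label's spatial mean as the threshold, the normalization, and the helper name `weighted_mae` are assumptions, not the authors' published formula.

```python
import numpy as np


def weighted_mae(y_true, y_pred, w=5.0):
    """Hypothetical weighted MAE for one training sample.

    Assumed form: L = (sum_i |y_i - p_i| + w * sum_j |y_j - p_j|) / N, where
    j are grid points whose label exceeds the spatial mean of the label and
    i are all remaining points (points equal to the mean are put in i here).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Weight w for points above the label mean, 1 elsewhere.
    weights = np.where(y_true > y_true.mean(), w, 1.0)
    return float(np.sum(weights * np.abs(y_true - y_pred)) / y_true.size)


# Errors at the high values weigh five times as much as equal-sized
# errors at the low values.
label = np.array([[0.0, 0.0], [10.0, 10.0]])
pred = np.array([[1.0, 1.0], [8.0, 8.0]])
loss = weighted_mae(label, pred)  # (1 + 1 + 5*2 + 5*2) / 4 = 5.5
```

In an actual training loop the same expression would be written with the DL framework's tensor operations so that gradients can be computed; NumPy is used here only to show the arithmetic.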

Technical Validation
To comprehensively assess the accuracy of CLIMEA-BCUD, the spatial distribution of the climatological mean, the variation of the annual mean, and the root mean square error (RMSE) are calculated against the MSWX dataset for 1979–2014 (Table 3). The corresponding RMSEs between the raw GCMs and MSWX are also listed for comparison. Notably, INM-CM5-0, MPI-ESM1-2-HR, and MPI-ESM1-2-LR in CLIMEA-BCUD show better skill, with relatively low RMSEs for surface air temperature. Tasmax in CLIMEA-BCUD performs best, with RMSEs below 0.58 °C and mean biases (MBs) between −0.52 °C and −0.27 °C, better than the raw GCMs, whose RMSEs exceed 2.31 °C and whose MBs range from −1.27 °C to 1.13 °C. Tasmin in CLIMEA-BCUD has a lower RMSE (0.78 °C) than the raw GCMs, whose lowest RMSE is 2.32 °C. For precipitation, most CMIP6 models in CLIMEA-BCUD reproduce the distribution of mean precipitation with RMSEs below 0.37 mm/day, outperforming the raw GCMs, whose RMSEs are around 1.00 mm/day. In CLIMEA-BCUD, surface wind speed has RMSEs ranging from 0.13 m/s to 0.15 m/s, and surface relative humidity has RMSEs between 0.97% and 1.60%. By comparison, the raw GCMs have RMSEs of 0.92–1.41 m/s for surface wind speed and 6.50–11.87% for surface relative humidity. However, CLIMEA-BCUD has RMSEs larger than 3.0 W/m² for the surface downward radiative fluxes, especially for downward longwave radiation.
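The evaluation metrics can be sketched in NumPy as follows. The text does not spell out their exact definitions, so this sketch assumes that MB and the spatial correlation coefficient (SCC) are computed on the 1979–2014 climatology map and RMSE on the domain-averaged annual-mean series, without area weighting; Table 3 may use different conventions, and the helper names are illustrative.

```python
import numpy as np


def mean_bias(model, obs):
    """Mean bias (MB): average of model minus observation."""
    return float(np.mean(np.asarray(model, dtype=float) - np.asarray(obs, dtype=float)))


def rmse(model, obs):
    """Root mean square error between two arrays of the same shape."""
    d = np.asarray(model, dtype=float) - np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))


def spatial_corr(model_map, obs_map):
    """Spatial correlation coefficient (SCC): Pearson correlation over grid cells."""
    return float(np.corrcoef(np.ravel(model_map), np.ravel(obs_map))[0, 1])


def evaluate(model_daily, obs_daily, years):
    """Score one model against observations.

    model_daily, obs_daily: (time, lat, lon) daily fields on the same grid
    years: (time,) calendar year of each time step
    """
    # Climatological mean maps over the whole evaluation period.
    clim_m, clim_o = model_daily.mean(axis=0), obs_daily.mean(axis=0)
    # Domain-averaged annual means (no area weighting, an assumption).
    uniq = np.unique(years)
    ann_m = np.array([model_daily[years == y].mean() for y in uniq])
    ann_o = np.array([obs_daily[years == y].mean() for y in uniq])
    return {
        "MB": mean_bias(clim_m, clim_o),
        "SCC": spatial_corr(clim_m, clim_o),
        "RMSE": rmse(ann_m, ann_o),
    }
```

A model that differs from the observations by a constant offset scores SCC = 1 while MB and RMSE both equal the offset, which is why the three metrics are reported together.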
Figure 4 shows the spatial distribution of the multi-model ensemble mean bias relative to MSWX for the raw GCMs and for CLIMEA-BCUD. Over flat terrain, tas in CLIMEA-BCUD is comparable to MSWX, a clear improvement over the raw GCMs, which have much larger biases. Even over high-altitude regions such as the Qinghai-Tibet Plateau, the multi-model ensemble mean of CLIMEA-BCUD captures the key features, including the variation of annual mean tas and the spatial pattern of the tas climatological mean, whereas tas in the raw GCMs is significantly underestimated over the Plateau. For precipitation, the bias of the CLIMEA-BCUD multi-model ensemble mean ranges from −0.6 mm/day to 0.6 mm/day over most of East Asia, while precipitation in the raw GCMs is significantly overestimated over East Asia by around 1.0 mm/day. Relatively large CLIMEA-BCUD biases above 1.0 mm/day are found on the southeastern side of the Qinghai-Tibet Plateau and the west coast of Africa, although they are smaller than those of the raw GCMs (above 1.4 mm/day). Overall, compared with the raw GCMs, CLIMEA-BCUD reproduces, for all variables, the spatial distribution of the 1979–2014 climatological average with higher spatial correlation coefficients (SCCs) and lower MBs, and the variation of the annual mean with much lower RMSEs.
Seasonality. Figure 5 shows the seasonal cycles of all variables from the raw GCM output. Figure 6 shows that CLIMEA-BCUD reproduces the seasonal cycle of surface air temperature well, with a correlation of 1.0, but shows large uncertainties and warm biases in summer. Compared with the raw GCMs, the multi-model ensemble mean of CLIMEA-BCUD reproduces the seasonal cycle of surface air temperature with a correlation of 1.0 and lower uncertainties, although significant cold biases are found in spring and winter. Because of the normalization, surface air temperature in CLIMEA-BCUD retains the advantage of the QDM outputs, representing the time series with a higher correlation coefficient (CC) and lower uncertainties than the raw GCMs. For the seasonal cycle of precipitation, the multi-model ensemble mean of CLIMEA-BCUD shows a high correlation (0.99) and a low RMSE of 0.2 mm/day, but relatively large uncertainties, particularly in summer, when precipitation has strong spatio-temporal variability.
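The seasonal-cycle diagnostics in Figs. 5 and 6 can be sketched as below: a 12-month climatology per model, the multi-model ensemble mean with a one-standard-deviation spread, and the correlation and RMSE of the ensemble mean against MSWX. Averaging over the domain before computing the cycle, the population standard deviation (NumPy's default `ddof=0`), and the helper names are assumptions.

```python
import numpy as np


def monthly_cycle(series, months):
    """12-value mean seasonal cycle from a domain-averaged daily series."""
    return np.array([series[months == m].mean() for m in range(1, 13)])


def ensemble_cycle_stats(model_series, obs_series, months):
    """Ensemble seasonal cycle and its agreement with observations.

    model_series: (n_models, n_days) domain-averaged daily values
    obs_series:   (n_days,) domain-averaged observations (e.g. MSWX)
    months:       (n_days,) month number 1-12 of each day
    """
    cycles = np.array([monthly_cycle(s, months) for s in model_series])
    ens_mean = cycles.mean(axis=0)
    spread = cycles.std(axis=0)  # one standard deviation across models (shaded band)
    obs_cycle = monthly_cycle(obs_series, months)
    corr = float(np.corrcoef(ens_mean, obs_cycle)[0, 1])
    err = float(np.sqrt(np.mean((ens_mean - obs_cycle) ** 2)))
    return ens_mean, spread, corr, err
```

Note that a high correlation only measures the shape of the cycle; constant seasonal offsets, such as the cold biases mentioned above, show up in the RMSE instead.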
The multi-model ensemble mean surface wind speed of the raw GCMs corresponds well with MSWX (correlation 0.98, RMSE 0.33 m/s), but it clearly overestimates wind speed and exhibits a large uncertainty. The multi-model ensemble mean surface wind speed of CLIMEA-BCUD also agrees well with MSWX (correlation 0.98), with a lower RMSE of 0.15 m/s and a much smaller uncertainty, although it clearly overestimates wind speed in winter and underestimates it in summer. Surface relative humidity in CLIMEA-BCUD shows weaker seasonal variation than MSWX, leading to a rather low correlation of 0.63, which is still higher than that of the raw GCMs (0.43). The multi-model ensemble mean of CLIMEA-BCUD reproduces downward longwave radiation, downward shortwave radiation, and surface specific humidity well. In general, compared with the raw GCMs, the multi-model ensemble mean of CLIMEA-BCUD reduces the uncertainties and achieves higher correlations and lower RMSEs.
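The extreme-event evaluation that follows counts, at each grid cell, how often daily precipitation falls into each CMA intensity class. A minimal NumPy sketch is shown below; expressing the frequency as a percentage of all days (rather than of wet days) and the hypothetical helper `event_frequency` are assumptions.

```python
import numpy as np

# CMA daily precipitation classes as used in the text (mm/day), half-open [lower, upper).
CLASSES = {
    "light rain": (1.0, 10.0),
    "moderate rain": (10.0, 25.0),
    "heavy rain": (25.0, 50.0),
    "rainstorm": (50.0, np.inf),
}


def event_frequency(pr):
    """Percentage of days falling in each class at every grid cell.

    pr: (time, lat, lon) daily precipitation in mm/day.
    Returns a dict mapping class name to a (lat, lon) array in percent.
    """
    pr = np.asarray(pr, dtype=float)
    return {
        name: 100.0 * np.mean((pr >= lo) & (pr < hi), axis=0)
        for name, (lo, hi) in CLASSES.items()
    }
```

Days below 1 mm/day fall into no class, so the four frequencies at a grid cell sum to less than 100%.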
Extreme events. Following the China Meteorological Administration 45 (CMA), precipitation events are divided into four classes: light rain (1 ≤ pr < 10 mm/day), moderate rain (10 ≤ pr < 25 mm/day), heavy rain (25 ≤ pr < 50 mm/day), and rainstorm (pr ≥ 50 mm/day). The performance of CLIMEA-BCUD in generating precipitation events is assessed by counting the frequency of each class at every grid point and comparing it with that of the raw GCMs (Fig. 7). For light rain, CLIMEA-BCUD captures the overall pattern of MSWX and shows more detail than the raw GCMs. QDM preserves daily precipitation extremes well, and these are also preserved by CLIMEA-BCUD. Over the eastern Pacific, however, CLIMEA-BCUD has a light-rain frequency of 60–70%, higher than that of MSWX (below 60%). For moderate rain, an over-shift of the rain belt is found for the raw GCMs over the eastern Pacific and the Qinghai-Tibet Plateau, and the raw GCMs also overestimate the frequency over southern China. CLIMEA-BCUD better reproduces the frequency distribution, with two main rain belts over the Pacific, but slightly underestimates the frequency over land, especially over southeastern China. For heavy rain, the GCMs overestimate the frequency over the southeastern Pacific and southern China, whereas CLIMEA-BCUD captures the spatial distribution of frequency with slight underestimation over most of East Asia and shows more detail than the raw GCMs. For rainstorms, the raw GCMs cannot reproduce the distribution over East Asia, while CLIMEA-BCUD reproduces the distribution over oceanic areas. Notably, CLIMEA-BCUD narrows the areas where the rainstorm frequency is between 1% and 2%, especially over the Kyushu region of Japan. In general, CLIMEA-BCUD captures precipitation events of different intensity classes well, especially moderate rain, but shows some obvious biases over the eastern Pacific.
Projected changes. Based on the evaluation of downscaled daily precipitation and surface air temperature, projected changes in surface air temperature and precipitation at the end of the 21st century (2070–2100) can be estimated from CLIMEA-BCUD for all scenarios (SSP1-2.6, SSP2-4.5, and SSP5-8.5).

Usage Notes
In this study, we describe the CLIMEA-BCUD dataset for East Asia, which provides daily time series of nine meteorological variables at 0.1° … climate, the high resolution (0.1°) of the gridded data is critical for developing regional and global assessments and for aiding decision- and policy-making. CLIMEA-BCUD is distributed in netCDF format (.nc) and is freely available from the Science Data Bank (https://doi.org/10.57760/sciencedb.07718) 44. Although CLIMEA-BCUD performs well in reproducing the overall patterns of the climatological mean, seasonal cycle, event frequency, and future changes, some limitations must be acknowledged. First, users should be aware that CLIMEA-BCUD tends to underestimate the observations. Second, despite its good performance in reproducing seasonal variability and extreme events, the bias-corrected product may contain inherent uncertainties and may obscure some fundamental deficiencies of the climate models. Numerous studies have investigated methods to improve model performance in super-resolution, and these advances are expected to apply to downscaling tasks as well. Among them, image enhancement techniques such as adaptive gamma correction with weighting distribution 46 (AGCWD), adaptive gamma correction with color preserving framework 47 (AGCCPF), range limited bi-histogram equalization 48,49 (RLBHE), and region adaptive contrast limited adaptive histogram equalization 50 (RACLAHE) are common and powerful tools for improving the performance of DL models, and their effectiveness in climate downscaling is worth exploring. Furthermore, several improved models based on UNet, such as UNet++ 51, UNet3+ 52, ResUNet 53, and USE-NET 54, have demonstrated significant potential in various applications. Additionally, models that incorporate technologies such as generative adversarial networks 55 (GAN) and Transformers 56 have also shown great potential for further improvement.

Fig. 3 UNet framework in this study. Blue blocks indicate the convolution, batch normalization, and ReLU operations. Yellow blocks correspond to the max pooling operation for downsampling. Red blocks denote the transposed convolution operation for upsampling. Grey arrows and grey blocks denote the skip connections that merge the feature maps.

Figures 8 (raw GCMs) and 9 (CLIMEA-BCUD) show the changes in multi-model ensemble mean surface air temperature and precipitation at the end of the 21st century for all scenarios (SSP1-2.6, SSP2-4.5, and SSP5-8.5). Surface air temperature is projected to rise across East Asia, with greater warming in northern China, especially under SSP5-8.5, a distribution similar to that of the raw GCMs. The ensemble mean median change in tas from CLIMEA-BCUD is projected to be an increase of 1.57 °C under SSP1-2.6, 2.53 °C under SSP2-4.5, and 4.52 °C under SSP5-8.5, similar to the raw GCMs (1.63 °C, 2.59 °C, and 4.58 °C, respectively). The projected ensemble mean tasmax and tasmin from CLIMEA-BCUD behave similarly to tas, with warming increasing from south to north across East Asia, indicating that CLIMEA-BCUD preserves the climate trend of the raw GCMs. For precipitation, CLIMEA-BCUD generally projects an increase over most of East Asia, with ensemble mean median changes of 0.19 mm/day under SSP1-2.6, 0.22 mm/day under SSP2-4.5, and 0.34 mm/day under SSP5-8.5. The raw GCMs project similar changes, with ensemble mean median increases of 0.20 mm/day, 0.24 mm/day, and 0.37 mm/day, respectively. The raw GCMs show a significant increase in precipitation over the Indian Ocean and the western Pacific Ocean. In CLIMEA-BCUD, precipitation increases significantly in eastern China and decreases slightly in the northwestern regions. It also increases over India, especially under SSP5-8.5. The increase in precipitation is more pronounced over the ocean, mainly over the Indian Ocean and the western Pacific Ocean.
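The "ensemble mean median change" quoted above can be sketched as the time-mean change per model between a historical baseline and 2070–2100, averaged across models, followed by the median over grid cells. The baseline window is not stated in this section, and the unweighted median and the hypothetical helper `ensemble_median_change` are assumptions.

```python
import numpy as np


def ensemble_median_change(hist_runs, future_runs):
    """Ensemble-mean change map and its median over grid cells.

    hist_runs, future_runs: lists (one entry per model) of (time, lat, lon)
    arrays for the baseline window and for 2070-2100, respectively.
    """
    # Time-mean change at every grid cell for each model.
    per_model = [f.mean(axis=0) - h.mean(axis=0) for h, f in zip(hist_runs, future_runs)]
    # Multi-model ensemble mean of the change maps.
    ens_mean = np.mean(per_model, axis=0)
    # Median over grid cells; NaNs (e.g. masked cells) are ignored.
    return ens_mean, float(np.nanmedian(ens_mean))
```

The same function applies to tas (°C) and precipitation (mm/day), since both changes are expressed as absolute differences.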

Fig. 5 Seasonal cycles of the nine variables during 1979–2014. The red line is the MSWX seasonal cycle. The blue line is the multi-model ensemble mean seasonal cycle of the GCMs, and the shaded area represents the uncertainty across all models (one standard deviation).

Fig. 6 Seasonal cycles of the nine variables during 1979–2014. The red line is the MSWX seasonal cycle. The blue line is the multi-model ensemble mean seasonal cycle of CLIMEA-BCUD, and the shaded area represents the uncertainty across all models (one standard deviation).

Fig. 8 Future change of surface air temperature and precipitation at the end of the 21st century (2070–2100) from the raw GCMs.

Fig. 9 Future change of surface air temperature and precipitation at the end of the 21st century (2070–2100) from CLIMEA-BCUD.