Deep generative model super-resolves spatially correlated multiregional climate data

Super-resolving the coarse outputs of global climate simulations, termed downscaling, is crucial in making political and social decisions on systems requiring long-term climate change projections. Existing fast super-resolution techniques, however, have yet to preserve the spatially correlated nature of climatological data, which is particularly important when we address systems with spatial expanse, such as the development of transportation infrastructure. Herein, we show an adversarial network-based machine learning enables us to correctly reconstruct the inter-regional spatial correlations in downscaling with high magnification of up to 50 while maintaining pixel-wise statistical consistency. Direct comparison with the measured meteorological data of temperature and precipitation distributions reveals that integrating climatologically important physical information improves the downscaling performance, which prompts us to call this approach \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\pi$$\end{document}πSRGAN (Physics Informed Super-Resolution Generative Adversarial Network). The proposed method has a potential application to the inter-regionally consistent assessment of the climate change impact. Additionally, we present the outcomes of another variant of the deep generative model-based downscaling approach in which the low-resolution precipitation field is substituted with the pressure field, referred to as \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\psi$$\end{document}ψSRGAN (Precipitation Source Inaccessible SRGAN). Remarkably, this method demonstrates unexpectedly good downscaling performance for the precipitation field.


INTRODUCTION
The increase of greenhouse gases in the air composition due to human activities is now believed to have led to the rise in the frequency of unusual disasters [1][2][3][4].
To prevent an irreversible collapse of the current ecosystem and resulting impoverishment of human lives, many countries have set specific medium-and long-term goals for the reduction of greenhouse gas emissions, and similar paradigm shifts in decision making have occurred even at the private sector level.
Numerical approaches are regarded as the most powerful and reliable scientific option at the moment in quantitatively evaluating the efficacy of political or management plans that aim to tackle climatological issues.The Global Climate Model (GCM) is the prime example, which has accurately reproduced past and current climate changes, and its reliability of quantitative future estimates is sufficiently high [5].Such future projections with high accuracies rely on the overall consideration of the global atmospheric and oceanic circulation (and even still more complicated ingredients such as chemical [6] and biological [7] processes) [8][9][10][11][12], and thus, the horizontal spatial resolution is sacrificed by the required computational costs; the typical resolution of the GCMs is only down to the order of one degree in longitude and latitude, corresponding to a grid size of more than a hundred kilometers on the equator.Therefore, to exploit the GCM outputs to assess the impact of climate change and to make proper decisions, it is obviously vital to superresolve the coarse grid spacing of simulations and to reach * Norihiro.Oyama.vb@mosk.tytlabs.co.jp the fine resolution of interest.Here, special attention should be given to reproducing the inherent spatial correlation of the meteorological variables, as well as the local statistics, in decision making by integrating multiregional information [13][14][15][16][17], such as transportation infrastructure development and sustainable energy networks, future urbanization, and agricultural intensification.
A variety of techniques to super-resolve GCM outcomes, which are referred to as the downscaling (DS) methods in meteorology and climatology, have been developed [18][19][20][21][22][23][24][25].They are categorized roughly into two groups: dynamical [18][19][20][21] and statistical DS methods [22][23][24][25].The dynamical downscaling method is based on physical footings: several coupled differential equations are numerically integrated with the results of the GCM (or any other crude-resolution simulation results) being used as the boundary conditions.However, the computational cost again creates a trade-off between the accuracy and the feasibility.In contrast, in the statistical approaches, we turn a blind eye to the physical laws behind the data.Instead, empirical links between the largeand local-scale climates are identified and applied to the crude-resolution climate model outputs.Since the systematic errors of the naively interpolated GCM output (referred to as the bias) are locally corrected such that the statistical properties are precisely reproduced, the spatial correlation, i.e., the information on the events occurring at distant places, is discarded [26? , 27].The statistical downscaling methods overcoming the latter problem remain to be developed.
In this paper, we propose a machine learning approach that super-resolves the GCM outputs and reproduces both the local statistics and the instantaneous spatial correlations between distant regions.Among several options for improving the resolution of geophysical or cli- matological data [28][29][30][31], our method is based on the generative adversarial network (GAN) approach, which has been proven to be a very powerful downscaling tool through several previous studies [32,33].To accurately reproduce the physical nature, we use auxiliary but climatologically important data, sea-level pressure distribution and topographic information, in addition to the target variables, temperature and precipitation distributions (see Fig. 1A and the next section for more details).Since this method falls within the criteria of the firstlevel physics informed super-resolution methods [34], we name our method πSRGAN (Physics Informed Super-Resolution Generative Adversarial Network).The direct comparison with the measured meteorological data shows that the local statistical properties are obtained using the practical output from the GCM simulations as accurately as the conventional statistical downscaling method that is focused on matching these properties.We then highlight that the spatial correlation of variables is accurately reproduced, which could not be achieved with conventional downscaling methods (see Fig. 1B).The present method is therefore the next generation down- scaling method that has a potential application in climate change assessment considering both local-scale and interregional events.We also considered another variant of the SRGAN that projects the high-resolution temperature and precipitation field from the low-resolution information about only temperature and pressure (we call this variant the ψSRGAN: Precipitation-Source-Inaccessible SRGAN).With this special variant, we demonstrate the surprisingly robust ability of the SRGAN-based methods to express natural results.

Super-Resolution Generative Adversarial Networks with various data
We employ a super-resolution method based on generative adversarial networks (Super-Resolution Generative Adversarial Networks: SRGAN, see Methods section for details) as the basic machine learning algorithm, which was proven to have potential in DS with a scale factor up to fifty [32].Although the original SRGAN was able to restore physical consistency in the turbulent wind velocity field, which was shown in terms of the well-known Kolmogorov 5/3 power-law [38], it was also reported that it showed a worse performance in reproducing the basic statistics, such as the pixelwise consistency like mean squared error, than a less sophisticated deep learning approach [32].In this work, considering two distinct variants in addition to the standard SRGAN, we show that the integration of the low-resolution input with auxiliary information enables to overcome the drawback of relatively poor reproducibility of simple statistical properties and that the ability of SRGAN-based methods to downscale in a "physically natural" manner is quite robust against the change in the input low-resolution information.
There are a vast variety of LR information, as seen in several similar recent attempts [33,[39][40][41].Among them, we employed the sea-level pressure, one of the fundamental hydrodynamic (or aerodynamic) variables on which the various quantities of sub-models of GCMs are based, as a piece of key auxiliary information.Also, this variable is described with fewer assumptions in the models than other meteorological variables such as humidity.In the literature, strong links between synoptic-scale horizontal circulation and vertical motion are discussed in terms of the sea-level pressure [42][43][44][45].In the first variant, we incorporate the low-resolution pressure field as an auxiliary physical information (Fig. 1A), which serves as guidance for the DS of the target variables, namely temperature and precipitation.In this method, moreover, we introduced the topographic information as another auxiliary information since it can be utilized in a high-resolution format only if we assume it is identical over the time window of interest (order of tens to a hundred years).The topographic information is indirectly supplied as a part of teacher data during the training by adding to one of the output channels.In this way, we can provide both low-resolution and high-resolution auxiliary data in an unambiguous manner without any artificial operation (like resolution matching by interpolation or pooling).Since the use of supplemental physical information dur-ing learning is regarded as primary-level physics-informed machine learning [34], we call this method the Physics-Informed SRGAN (πSRGAN for short).
The second variant of SRGAN is designed to generate high-resolution temperature and precipitation fields using solely low-resolution data pertaining to the temperature and pressure fields.
This variant is referred to as the Precipitation-Source-Inaccessible SR-GAN (ψSRGAN) and demonstrates the surprisingly robust capability of SRGAN-based methods to describe "physically natural" precipitation fields.
The performances of three variants of SRGAN (standard SRGAN, ψSRGAN, and πSRGAN) are evaluated via direct comparisons among them and with a non-machine learning-based method: we summarize these methods in Table I.The cumulative distribution function-based downscaling method (CDFDM) is the widely used conventional statistical DS method (see the method section for the details), and the SRGAN refers to the original SRGAN-based method presented in Ref [32].

Data sets
We use the climate model simulation outputs for the low-resolution input and the real observation data for the high-resolution ground truths in the case studies.As the low-resolution data, we used the Japanese 55-year reanalysis (JRA-55) data from 1980 to 2018 [36] with data assimilation.The grid spacing is 1.25 degrees.The daily data corresponding to the reference data (in Japanese local time) were created from 3-hourly simulation data.Specifically, data at 0Z, 3Z, 6Z, 9Z and 12Z on the target date and data at 15Z, 18Z and 21Z on the previous day of the target date were averaged to obtain the daily data in JST.The reference high-resolution data were the Agro-Meteorological Grid Square Data (AMGSD) [35].The 1 km-meshed daily data over Japan are constructed using the in-situ observation network system of the Japan Meteorological Agency, which covers the entire land area over Japan from 122 to 146 degrees east and from 24 to 46 degrees north.Upon being fed into the networks, all of the data undergo a process of normalization and concatenation.For further information regarding the technical aspects of these procedures, please refer to the SI Appendix.
We use the data from 1980 to 2018 (14245 days in total).These data are split into training, validation, and test datasets in a time-series manner as summarized in Table II for both low-resolution (JRA-55) and highresolution (AMGSD) data.We emphasize that this timeseries partitioning, characterized by a substantial volume of test data, represents a challenging task for downscaling mid-term future projections, and consequently, necessitates the incorporation of the climate change trend.The AMGSD data were adjusted such that the grid spacing was 0.025 degrees/grid both in latitude and longitude.We extracted the data for the region from 130.625 to 140.625 degrees east and from 30.625 to 40.625 degrees north, which results in a 400 × 400 pixels square.The JRA-55 data of the corresponding region are 8 × 8 pixel squares, and thus, the scale factor for the DS tasks is 50.

Qualitative visualization
We first present typical qualitative visualizations for the temperature and the precipitation fields of one day in Fig. 2, which highlights the ambitious downscaling with the present large scaling factor of fifty.Here, the highresolution information of 2500 pixels is extracted from one single pixel in the low-resolution counterpart.We compare the results of different protocols (summarized in Table I), along with the visualization of the original low-resolution JRA-55 and the high-resolution AMGSD data.
The difference in the downscaled temperature from the ground truth is not very large (the upper row of Fig. 2), and it is difficult to find any superiority or inferiority in performance from these qualitative plots.In contrast, the results for precipitation demonstrate rich information on the features of DS protocols (the lower row of Fig. 2).The CDFDM result shows an overly smoothed profile compared to the GT: high precipitation values (represented by red colors) are observed in a vaster area.On the other hand, SRGAN family finely reproduce the localized nature of the high precipitation areas, which the CDFDM fails to describe.Remarkably, even ψSRGAN also succeeded in reproducing the localized heavy rain event, although, in this method, the low-resolution precipitation field is not supplied as an input.The GANbased methods [13][14][15][16][17] are recognized to be advantageous in reproducing such fine structures.The maximum precipitation values of the DS results are all very close to that of the GT.Please refer to Fig. S2 in the SI Appendix for the graphical depictions of the differences between the GT and DS outcomes, which offer a more direct and intuitive insight into the distinctions among the performances of different methods.We note that although the results for πSRGAN were excluded from Fig. 2 due to their substantial similarity with those for SRGAN and space limitations, they are included in Figure S2 of the SI Appendix.

Single-site statistics
Here and in the following subsections, we discuss the statistical features of downscaling results, focusing on the precipitation p, which is generally considered to be difficult to downscale accurately.In particular, we carefully examine the statistical consistency with the ground truth, which is crucial in actual usage of the DS results, e.g., in impact assessment of climate change in the future.Although the results presented in the main text are climatologically oriented indicators and not standard measures used in the field of image processing, we provide the values of pixel-wise mean squared error and corresponding peak signal-to-noise ratio in the SI Appendix.
We first measure the probability distribution functions (PDFs) of the precipitation data at 12 representative sites, P S (p).Here, the PDFs are calculated using the set {p k (l)|l ∈ S and k ∈ D test }, where S stands for the site of interest (each site includes 100 grid points: see Table S3 in the SI Appendix), D test is the set of dates that are used for the test data (year span of 2001-2018; Table II), and p k (l) is the value of the precipitation at the pixel l and for the date k (we omit the subscript unless necessary below).The results are shown in Fig. 3  (A-L).The 12 sites in Fig. 3 are chosen from the seaside areas within the system boundary of this study, as depicted in Fig. 3(M).Table S3 in the SI Appendix provides more precise information (latitude, longitude, etc.) about these sites.
Overall, Fig. 3(A-L) shows that all methods express the regional dependence.Regarding each method, the CDFDM provides results matching the GT very well, including the heavy rainfall regime where p > 50 ∼mm/day up to the values at which P GT (p) becomes around 10 −4 .This is expected because in the CDFDM the data are processed such that the resulting PDFs become completely consistent with the training data.If we shift our attention to the results of SRGAN family, we first notice that SRGAN and πSRGAN are as accurate as the CDFDM for most sites and most values of p.Moreover, surprisingly, even ψSRGAN succeeded in the projection of precipitation in the range P GT (p) > 10 −3 at most sites although it was not provided with any direct information about the precipitation.In particular, we would like to stress that an extremely high accuracy has been successfully obtained for Shizuoka, a representative site on the Pacific Ocean side (south side), where pressure-dominated summer-type precipitation events occur frequently.This indicates that the pressure field effectively serves as crucial information for the precipitation projection, such as the location of the typhoons.On the other hand, the accuracy is significantly lower at sites on the Sea of Japan side (north  side), Akita, Niigata, and Kanazawa (A, C, F), which are less directly affected by typhoons.These trends are interestingly consistent with our knowledge, and it appears as if SRGAN is extracting physical laws from the data and making predictions, just as humans do.Then it is natural that this success of projection of the highresolution precipitation from the low-resolution pressure drove us to believe the integration of the input information employed in πSRGAN would further improve the downscaling performance of SRGAN.However, since all SRGAN, πSRGAN, and CDFDM offer highly accurate results, it is difficult to visually judge from the graphs which one is better than the others: we make a quantitative comparison in the next paragraph.Before moving forward to the quantitative analysis,we remark on the discrepancies observed for tails in the large precipitation (small probability of P GT (p) < 10 −4 ) regime even in the cases of the CDFDM.These rare events corresponding to disaster-level torrential rains are very important from the perspective of disaster prevention but are beyond the limit of the current statistical DS methods, on which we provide an overview in the Discussion section.
To investigate the difference in the performance of πSRGAN and SRGAN, we quantify the accuracy of each method using the Kullback-Leibler divergence D KL : where P GT (p) is the PDF of the GT and P DS (p) is that calculated using the downscaling results (DS ∈ {πSRGAN, SRGAN, ψSRGAN, CDFDM}).Generally, the more different P GT (p) and P DS (p) are, the larger D KL becomes; D KL vanishes when the two PDFs are exactly identical.Since the difference between two PDFs, P GT (p) and P DS (p), is weighted by the ground truth distribution, the KL divergence places more importance on the frequently occurring events than on rare events.Technical details such as the data preprocessing employed are provided in SI Appendix.The KL divergence between the GT and DS results using distinct methods are shown by bar plots in Fig. 3(N) and summarized in Table III, where the values averaged over the 12 sites are presented.The precise values of D KL for each single site are provided in Table S4 in the SI Appendix.As expected from the fact that the CDFDM concentrates on matching these statistics for the training data, it gives the best values for most cases.However, it should be noted that, at Hiroshima (denoted by K), πSRGAN marks a better score than CDFDM.This result evidences the remarkable performance of πSRGAN concerning the basic statistical characteristics that the standard SRGAN can handle relatively inadequately.Indeed, among SRGAN family, πSRGAN marks the best performance if we compare them by the average value over 12 sites: DKL (P GT ||P πSRGAN ) is smaller than DKL (P GT ||P SRGAN ) by approximately 40% (the bars signify that the presented values represent the mean across 12 sites.).However, πSRGAN is not always better than SRGAN and it shows worse results than SRGAN at Niigata, Kanazawa, and Oita (C, F, L).It is noteworthy that these particular locations are precisely where the performance of ψSRGAN is significantly lacking.This observation suggests that the inclusion of low-resolution pressure fields may have led to undesired effects.We also note that, on the other hand, ψSRGAN exhibits a lower value of D KL than that of the standard SR-GAN at Shizuoka (Fig. 3(D)) where the pressure field is expected to play a crucial role in the determination of rainfall events.These findings about the effects of the introduction of auxiliary fields should be utilized for the future refinement of the method.To give a conclu-sion for this section, remarkably, even the standard SR-GAN shows the same order of values of D KL as those of CDFDM.Moreover, the provision of climatologically important auxiliary information can further improve the precision by 40%, evidenced by the results of πSRGAN.

Statistics over all sites
As another meteorologically important statistical point of view, we further measure the statistics over all sites: the PDFs of the mean µ p and the standard deviation σ p of the precipitation calculated over all test data on each pixel l: where k ∈ D test is again the sample index, and N test is the number of samples in D test .The probability distribution of µ p and σ p , denoted by P (µ p ) and P (σ p ), are shown in Fig. 3(O,P).Note that here the values calculated on each pixel serve as samples for these PDFs.The corresponding KL divergence D KL for P (µ p ) and P (σ p ) are presented in Fig. 3(Q,R) as well.
Remarkably, regarding the statistics of pixelwise average over all dates in the test dataset P (µ p ), πSRGAN (and moreover, SRGAN as well) achieves a better score than CDFDM.However, on the other hand, regarding P (σ p ), CDFDM is the best and it shows almost identical results as GT.The small shifts of the whole curve of P (σ p ) to the left of SRGAN-based methods are consequences of the underestimation of the high-precipitation events shown in Fig. 3(A-L).These results suggest that SRGAN-based methods exhibit a bias towards typical values in downscaling results, as opposed to presenting bold projections of extreme events, compared to CDFDM.This is actually an anticipated tendency considering the design of the standard training scheme employed in machine learning-based methods.

Spatial correlation
Next, we examine in detail the spatial correlation of the downscaled results.The importance of the spatial correlation of the meteorological variables, i.e., the relation between two distant sites, has been realized very recently [13][14][15][16][17], e.g., in the context of impact assessment of climate change.However, conventional DS methods such as CDFDM have proven to overestimate the correlation even though the statistical consistency with the GT is maintained [26? , 27].Such a tendency is actually seen in the qualitative visualizations in Fig. 2, where the overly smoothed profiles are obtained.We thus systematically evaluate the accuracy in expressing the spatial correlation of the precipitation by measuring the Pearson's correlation coefficients of the precipitation C R M (l, l ) between two sites, l and l , which is defined as: is the deviation of the k-th sample at site l from its reference average value pR M (l).The subscript M indicates that the average is taken over the data of month M , the superscript R ∈ {GT, πSRGAN, SRGAN, ψSRGAN, CDFDM} distinguishes the datasets and N M represents the total number of test data samples belonging to month M .Since the distribution of the correlation coefficients is known to have features specific to each month, we measure the monthly values of the coefficients.Below, we focus on the results for M = January, for which a previous work has pointed out the existence of a distinguished spatial pattern of precipitation correlation [46].
Figures 4(A-C) show the spatial distribution of the correlation coefficients C R Jan (l, l ), with Nagoya, Niigata, and Hiroshima being the reference points l (the locations of the reference points are marked by the star symbols).The correlations measured for the CDFDM are too high compared to the GT at almost all sites, as shown in Fig. 4(A).This is mainly because the 2500 grid points extracted from the corresponding single low-resolution pixel tend to have similar values.In contrast, the results of the SRGAN family exhibit much sharper spatial contrast, e.g., the contrast between the north and south sides of the Chugoku area (around [36 • N, 135 • E]) is well captured.The differences in performance among these SRGAN-based methods are very subtle and a precise quantification is necessary to rank them: we will get back to this issue in the next paragraph.In Fig. 4(B,C), we qualitatively observe the same difference in the accuracy among the methods.In particular, the SRGAN family, even including ψSRGAN, successfully reproduce the nonmonotonic nature of the correlation as a function of the distance from the reference site: e.g., in the results of the GT and SRGAN-based methods in Fig. 4(B), along the north side coastline (see the arrow in the figure), the correlation decays quickly near the reference point and then grows again around the Noto peninsula (around [38 • N, 137.5 • E]).The CDFDM, on the other hand, merely exhibits the monotonic decay of the correlation along the same line.Please see also the SI Appendix for the difference plots between the GT and DS results.
To quantify the accuracy of C DS Jan (l, l ) for the different methods, we measure the mean square error (MSE) of the spatial distribution of the correlation coefficient defined as:  where l is the reference site and N OS ≡ 630 is the number of observation stations (see SI Appendix for a detailed explanation).The values of M SE DS Jan. measured based on each reference site are compared in Fig. 4(D), and the average values are listed in Table IV (the values for each site are shown in Table S4 in the SI Appendix).All SRGAN family exhibit much better results than those of the CDFDM for all sites considered here and even ψSRGAN offers twice better results.Specifically, the best one, πSRGAN, achieves 3.6 times better accuracy than the CDFDM for the average value over 12 sites.This result of the SRGAN-based methods being advantageous in achieving the "naturalness" of the spatial pattern is consistent with the report in ref. [32].If we further compare the results of SRGAN-based methods, although πSRGAN offers the best performance in terms of the mean value over all 12 sites, the standard SRGAN has the best values at the majority of locations, albeit by only small margins as shown in Figure 4D (and Table S4 in the SI Appendix).We interpret this result as meaning that both πSRGAN and SRGAN demonstrate comparable performance in relation to the statistical characteristics of spatial correlation.Together with the discussion in the previous subsections, the results presented in this section enable us to conclude that in the present πSRGAN, the auxiliary fields enhance the reproducibility of the simple statistics (such as P (p)) while maintaining the expression ability of the natural spatial expanse.Such a strong downscaling ability highlights the applicability to localscale and interregional assessments of climate change.

DISCUSSION
We have developed a machine learning-based statistical downscaling (DS) method with a large scale-factor of fifty, while maintaining both the basic statistical properties and the spatial correlation.We employed a physicsinformed type approach [34] on the basis of the SRGANbased method, and specifically, we developed a framework to use the proper auxiliary physical information along with the low-resolution input to attain large improvements in the DS performance as summarized in Fig. 1 and Tables III, IV.High accuracy comparable to the CDFDM, a conventional method in actual use, was demonstrated by directly comparing the climatological statistical properties with the real data.More importantly, our approach exhibited the highly accurate reconstruction given in Fig. 4 of the natural spatial distribution of the precipitation correlation coefficient, which was a serious issue for the conventional statistical DS methods, including CDFDM [26? , 27].Since the importance of the multiregional spatial correlation has recently been recognized [13][14][15][16][17], the present method is a promising new-generation alternative to conventional statistical DS methods, particularly in situations where the integration of the multiregional information is necessary.
The detection and prediction of rare events are vital issues inter alia in the context of climate change assessments.The methods including the present πSRGAN indeed have yet to accurately capture the low probability but significant rainfalls, as shown in Fig. 3. Here, we discuss possible directions to ameliorate the problem.First, we could raise the level of physics-informed machine learning in terms of the classification proposed in ref. [34].If we succeeded in directly incorporating some part of the governing equations into the learning process while maintaining the computational efficiency, local phenomena such as heavy rains would be predicted with high reliability.Another direction is to take measures to reform the basic machine learning architecture itself.Following the GAN-based approach, flow-based and diffusion model-based methods have attracted public attentions as powerful next-generation tools for general super-resolution tasks [47,48].The main feature of these approaches is to generate multiple image candidates from a single input.Therefore, probabilistic information is expected to be drawn from the multiple super-resolved images, which would enable us to tackle the rare event predictions.
Another perspective concerns the use of machine learning techniques to improve the efficiency of dynamical downscaling, i.e., developing a high-speed machinelearning-based solver for the governing equations of climate models.Here we refer to an example of a speed up of multiscale simulations; in ref. [49] the Gaussian process is used to reduce the computational burden of multiscale simulation for polymeric liquid to achieve a reduction by a factor of 30-100 without loss of accuracy.Breakthroughs driven by similar approaches are expected once the complexity of the governing equations for the climate models is overcome.
Finally, we refer to the generalization ability of SRGAN.Here, we have selected SRGAN instead of πSRGAN due to the anticipated lack of high generalization ability of the latter (πSRGAN relies on topo-graphic information that is specific to the training area).In the SI Appendix, we present the results of the generalization test, in which we tried to execute downscaling computations for samples derived from a different area than the one employed for training.Specifically, the test area encompasses the region spanning from 135.625 to 145.625 degrees east and from 35.625 to 45.625 degrees north, with a five-degree shift in both the eastward and northward directions from the original region used for the training.The findings of the examination demonstrate considerably inferior performance compared to those reported in the main text, exposing the deficient generalization capability.This suboptimal performance of the generalization ability is a somewhat predictable attribute since the training data are all from a specific same region.Even though we did not explicitly provide information about the topography in SRGAN, it is plausible that the network learned it indirectly through the temperature field, which exhibits a strong correlation with topography.We stress that we observe large errors even for Niigata and Kanazawa, which were part of the original computational domain.To enhance the generalization ability, we would need to incorporate samples from a more extensive range of areas.The exploration of such an approach is left for future research.

CDFDM
Among a variety of statistical methods, we use, as a reference, the cumulative distribution function-based downscaling method (CDFDM) with quantile mapping that is in actual use.
If we simply map the low-resolution GCM simulation results onto the point at which the observations are available, we generally see a systematic difference, defined as bias, which comes from the systematic error of the model prediction and/or from the interpolation error.Removing this inherent bias is especially important in applying the downscaling results to the impact assessments.In the CDFDM, bias is corrected via an empirical transfer function constructed in advance using measured data of distributional variables and the corresponding simulation results.The detailed procedure of constructing the transfer function is described as follows [37].
The crude low-resolution data obtained from the GCM are first mapped onto a 2 km mesh using simple bilinear interpolation.At each mesh point, an empirical cumulative distribution function (CDF) is then constructed using the interpolated data of the variable of interest over a specified time window.The transfer function is defined as a map of a variable onto the one at which the corresponding CDF of the observation falls within the same quantile level.This preconstructed transfer function is applied under the assumption that the error-percentile relation is conserved over time.In the study, time window of a month is while the original time window is over a half-year [37], to more sensitively capture the seasonal trend [50, Note that while this CDFDM is a nonparametric method, the corrected perfectly matches the corresponding CDF of the observation (for the training data); the statistical properties of the downscaling results are expected to reproduce the observation well.The biascorrected climate scenario obtained with this method has been widely used in climate change impact studies [50,52,53].

SRGAN
We employ a generative adversarial networks-based (GAN-based) method as the basic machine learning architecture, which is called Super-Resolution Generative Adversarial networks (SRGAN) [54].The terminology super-resolution (SR; or, in particular, single-image super-resolution) refers to a method of restoring a highresolution image from the corresponding low-resolution data and is the counterpart of the downscaling in the realm of the general image processing.The GAN-based methods are capable of generating realistic images by pitting a discriminator network against a generator network that generates samples (see Fig. 1A).The discriminator network real data (ground truths) and the fake data (output of the generator network) as inputs and identifies the authenticity of the input samples.The generator network tries to deceive the discriminator while the discriminator tries to judge with high accuracy.As a result, both networks spontaneously learn the "realistic" information.The SRGAN can reproduce fine textures that cannot be achieved by normal convolutional neural network-based variants and offers substantially improved realistic super-resolution images.
Such network-based super-resolution techniques have recently been used for the DS tasks of climatological data.In a representative report by Stengel and coworkers, [32], the authors compared performances of SRGAN-based downscaling methods with previous methods (SRCNN: Super-Resolution Convolutional Neural-Networks).Although the SRCNN-based method appeared to be superior in evaluating the performance in terms of the simple pixelwise MSE, the SRGAN-based method provided realistic results satisfying the important physical requirements, e.g., the energy spectrum of the wind velocity field satisfied the Kolmogorov 5/3 scaling law [38] with remarkable accuracy.The network architecture in our πSRGAN is mostly the same as the original SRGAN introduced in ref. [54], although the batch normalization layers are removed obeying ref. [32]: the explanation of the precise architecture is presented in SI Appendix.We also summarize other technical details, such as the precise learning protocol, hyperparameter tuning, and the normalization of the data there.We note that the representative method compared to the πSRGAN referred to as "SRGAN" in our implementation is a slightly upgraded version including the highresolution topography, which makes possible the decomposition of elements producing the improvement.

I. TECHNICAL DETAILS A. Architectures of networks
The network architecture is depicted in Fig. S1.In this figure, Conv stands for the convolution layer, Dense represents the fully connected layer, and ReLU means Rectified Linear Unit.The numbers and alphabets above the Convs represent the hyperparameters: the numbers after k, n, and s are the kernel size, the number of kernels, and the stride, respectively.The number of kernels in the pixel shuffle layers R is determined by the scale factor r as R = 64 × r 2 .There can be multiple pixel shuffle layers if the scale factor can be factorizable (in that case, each pixel shuffle layer has a factorized prime number value as the scale factor, e.g., 2 and 5 if the scale factor is 10).All Leaky ReLU layers in the discriminator network employ the slope of 0.2 for negative inputs.
Following the conventional architectural design used in SRGAN for image processing, our generator network comprises of a pre-processing stage consisting of a convolution operation with ReLU activation, followed by 16 residual blocks equipped with skip connections, and pixel-shuffling units, culminating with a final convolution layer.It is widely acknowledged that the use of residual blocks enables us to increase the depth of our network without encountering issues such as gradient loss, gradient explosion, or degradation.Additionally, the implementation of pixel-shuffling units is preferred over simple upsampling techniques like deconvolution due to their ability to eliminate undesired checkerboard pattern artifacts.In this section, we explain the details of the learning protocol.The protocol basically obeys ref. [? ].The fiftyfold downscaling is composed of two stages: the low-resolution to medium-resolution (LR-to-MR) and the mediumresolution to high-resolution (MR-to-HR) downscaling stages.The scale factors for these two stages are ten-and fivefold, respectively, and a fifty-fold scaling factor is achieved overall.The MR ground truths for the learning are generated by coarse-graining the HR image (average pooling with the kernel of size 5 and the stride is set to 5).We also introduced pretraining for the Generator network as usual.In total, there are four stages of learning: pretraining and GAN training for both LR-to-MR and MR-to-HR downscaling.We summarize the number of epochs and the batch size for each training stage in Table S1.The same value of the initial learning rate, r l = 10 −4 , is employed for all learning stages, and the learning rate is multiplied by 0.99 every epoch.Under this exponentially decreasing-learningrate protocol, the learning rate becomes approximately one tenth every 230 steps.Comparing the loss values for training and validation data during the training process, we confirmed that these combinations of the epoch numbers and the learning rate prevent the system from overtraining.Other hyperparameters, such as the ones for Adam algorithm, are set to standard values used in most studies.Regarding the loss function, again, we obeyed ref.
[? ].We employed the simple MSE loss L MSE for the pretraining regardless of the resolution stages.For the loss of the Generator network in the GAN part L G , we use the linear combination of L MSE and the adversarial loss L Adv : where α is the weight of the adversarial contribution and we set it as α = 0.001 in this work.The adversarial loss represents the (in)accuracy of the discriminator classification of the outputs of the Generator (into fake and true).For L Adv , ref.
[? ] provides the detailed definition.It is considered that the information can be extracted most efficiently from the Discriminator when the discriminator loss L D is around 0.5.In this work, to maintain the value of L D in the vicinity of 0.5, adaptive training is carried out.In this special training protocol, the training for the discriminator is repeated if L D is larger than 0.6 and that for the generator recurs if L D is less than 0.45.Although in many cases such adaptive loops are performed for each minibatch, in this work, the adaptive loops are over the whole dataset.We have employed these precise protocols because we found them to be better than other options, after trials and errors.

C. Normalization of the data
All the data used for the training are normalized so that most (not all: see below) resulting pixels are in the interval [0, 1].This is simply done by applying the following formula: where x denotes the variable of interest (one of the temperature, precipitation, sea-level pressure, or topography), x is the normalized value, and x lb and x scale are the lower bound and the scale of the variable x, respectively.The precise values of x lb and x scale for each variable are summarized in Table S2.Such normalization is known to allow the network to handle variables with different physical dimensions (in this study, temperature, precipitation, pressure, and altitude) in a unified manner and enhance training efficiency.We stress that for precipitation, this normalization does not guarantee that all the resulting values are less than unity since rare events exceed the threshold value of 100 mm/day (the maximum value in the test samples is 912 mm/day).We employed this normalization factor because this choice gave the best performance among the values that we investigated (1000, 100, and 10 [mm/day]).Technically, D KL tends to infinity when P GT (x) = 0 and P SR (x) = 0 holds for a value x (or more technically speaking, for a bin including x), and vice versa.Problematically, this usually happens in real situations because of the limited number of samples: the probabilities of finding rare events are regarded as zero when the number of samples is finite even though they should be small but nonzero in the "true" distribution that is expected to be obtained when the sample number becomes infinite.This results in an undesired infinite value of D KL .To avoid such a trivial artifact, we applied the Gaussian smoothing function f as: where w 2 is the variance of the Gaussian and determines the smoothing width, Z is the normalizing factor that guarantees P (x)dx = 1, ∆ is the bin width, and N is the number of data.P (x) gives the probability of finding a sample in the interval [x − ∆ 2 , x + ∆ 2 ] and x i is the value of the sample i.We fixed this hyperparameter w 2 and ∆ to be w 2 = 16 and ∆ = 2 to obtain the results presented in Fig. 3 and Table 2 in the main text and Table S4.We confirmed that a change in the value of w by a factor of 4 does not change the qualitative results.

E. Locations where the correlation coefficients are evaluated
If we calculate the correlations between all the 400 × 400 grid points, the calculation cost becomes very expensive.Therefore, we extracted only the grid points where the observation stations of the Automated Meteorological Data Acquisition System reside.There are N OS = 630 stations within the system boundary of our study.

F. Topographic information
The precise topographic information about the locations considered in Fig. 3 is summarized in Table S3.The pixels within the region specified by the min/max of the latitude/longitude compose each site.Since, in our case studies, a single pixel in the high-resolution data has a linear dimension of 0.025 degrees in terms of both latitude and longitude, the regions of sites in Table S3 are all composed of 100 pixels.

A. Precise values of statistical indicators
We summarize the precise values of D KL and M SE Jan for each site in Table S4.In this subsection, we present customary statistical measures that are widely employed in the domain of image processing.The first indicator is the mean squared error (MSE) of the pixel-based results MSE pixel defined as: where ∆ k is the mean squared error of the kth sample and N l is the total number of pixels that represent the locations on land.The remaining variables adhere to the definitions introduced in the main text.Following the discussion in the main text, we consider only the precipitation here.Using this pixel-wise measure MSE pixel , the peak signal-to-noise ratio (PSNR), which is usually used as a quantitative measure of the performance of single-image super-resolution tasks, can be defined as: where P stands for the maximum signal intensity.In the case of image processing tasks, P is trivially determined by the possible maximum intensity, e.g., it should be 255 if the pixel is expressed by 8-bit information.However, in the current situation, the peak signal intensity is not known in advance and thus we employed the maximum values observed in the test samples.There are "two different peak signals": the one observed in the downscaling results and the one in the ground truth.In Table S5, we compare the values of PSNR that are measured using the peak signal of the ground truth P GT = 9.12 and the ones in the downscaling results P DS .Despite a very large scale factor of fiftyfold, the results for all methods are remarkably favorable when compared to the benchmarks established in the image processing field.This is simply because the precipitation field exhibits very small values in most sites in most samples compared to the peak signal, and so are the errors.To offer a more intuitive understanding of the difference in the accuracy of each approach (πSRGAN, SRGAN, ψSRGAN, and CDFDM), we have included Fig. S2, which illustrates the difference plots for the single-sample visualization (corresponding to Fig. 2 in the main text), and Fig. S3, which depicts the spatial distribution of the correlation coefficient (Fig. 4 in the main text).These figures substantiate the assertions made in the main text.
In Fig. S2 (A), the disparities between the temperature fields of the ground truth and those generated by the downscaling techniques are illustrated.Interestingly, the magnitude of the error is minimal in CDFDM and maximal in ψSRGAN in accordance with the statistical metrics of precipitation.However, since ψSRGAN incorporates the LR information about the temperature field like other SRGAN-based methods, this inferior performance for the temperature field is unexpected.We need to conduct a comprehensive statistical analysis to draw a definitive conclusion: such analysis falls outside the scope of this paper.
Fig. S2 (B) shows the differences in the precipitation field between the ground truth and the downscaling results.Here, while large errors are relatively widely distributed in the cases of ψSRGAN and CDFDM, only small and localized errors are seen in the cases of πSRGAN and SRGAN.
In Fig. S3, the errors in the spatial distribution of the correlation coefficient (again, defined as the simple root mean squared errors from the ground truth) are displayed.Notably, in the instances of CDFDM, we note extensive regions exhibiting dark colors, which denote significant discrepancies.Specifically, with respect to CDFDM, the areas of dark color align with prominent ridgelines.This is indicative of the spatial layout of the correlations, originating from the topography, being inadequately represented by CDFDM.On the other hand, in the columns of SRGAN-based methods, again even including ψSRGAN, we observe light hues across nearly all locations.This outcome serves to demonstrate the exceptional precision of SRGAN-based methods and, in particular, our approach πSRGAN.

III. GENERALIZATION ABILITY TEST
In this section, we present the outcomes of the generalization capability test of SRGAN.Specifically, we performed downscaling computations on the test samples covering the region spanning from 135.625 to 145.625 degrees east and from 35.625 to 45.625 degrees north, utilizing the Generator network featured in the main text.The network was initially trained on the area illustrated in Fig. 3M in the main text, ranging from 130.625 to 140.625 degrees east and 30.625 to 40.625 degrees north.Thus, the samples employed in this generalization test were shifted by five degrees both in the north and east directions, relative to those utilized in the original analysis.We have calculated the simple probability distribution function of the precipitation P (p) and D KL between P GT (p) and P SRGAN (p) (the information presented in Fig. 3) for 8 representative locations depicted in Fig. S4 (I).Notice that two out of these eight sites are the same ones considered in Fig. 3 in the main text (Niigata and Kanazawa).The results of P (p) are shown in Fig. S4 (A-H) and D KL for each site is shown in Fig. S4(J).We note that, unlike Fig. 3(P) in the main text, we needed to employ the logarithmic scale for the ordinate of Fig. S4(J) because the accuracy becomes worse by orders of magnitude for several sites (e.g., Akita and Kanazawa).These results indicate that the precision of our method deteriorates significantly when we alter the target location for the downscaling (quantitatively speaking, the value of the KL divergence becomes more than ten times higher).Errors are observed not only in the high precipitation regime, where P GT (p) attains very small values, but also in the low precipitation regime, e.g., in Fig. S4(B,E,G,H).The absence of generalization capacity is an anticipated characteristic as our network has been optimized for a specific geographical region.

FIG. 1 .
FIG. 1. Schematic diagram of πSRGAN and the distribution of spatial correlation coefficients.(A) High-resolution topography and low-resolution sea level pressure, in addition to the low resolution data corresponding to the output, are supplied to the generative adversarial networks.(B) The reconstructed distributions of spatial correlation coefficients, indicating the correlation strength from the reference site at Tokyo [35.735 • N, 139.6683 • E], obtained with πSRGAN and a conventional CDFDM are compared to the ground truth (GT).

FIG. 2 .
FIG. 2. Downscaling results.Distributions of temperature (A) and precipitation (B) obtained with SRGAN, ψSRGAN, and CDFDM are compared with the corresponding low-resolution (8 × 8) inputs (LR) and the high-resolution (400 × 400) ground truth (GT).The resolution of the downscaled images is the same as that of the GT.The data from January 24, 2008, are displayed.

FIG. 3 .
FIG. 3. Statistics of precipitation.(A-L)The probability distribution functions (PDFs) P (p) as a function of the precipitation p at each site.12 representative sites are chosen from all over Japan.The different ranges of the abscissa reflect the regional characteristics.See SI Appendix for the technical details in processing the PDFs.(M) The normalized topographic information and the locations of 12 sites of panels (A-L).(N) Bar plot of the values of the Kullback-Leibler divergence of DS methods for P (p).(O,P) The PDFs of the mean µp and standard deviation σp of the precipitation over all the test data at each site; here the values of µp and σp at different sites serve as samples of the PDFs.(Q, R)Bar plot of the values of the Kullback-Leibler divergence for P (µp) and P (σp).

FIG. 4 .
FIG. 4. Spatial distribution of the correlation coefficients for precipitation.The distributions of January obtained with the πSRGAN, SRGAN, ψSRGAN, and CDFDM are compared against the ground truth (GT) in the case of the reference point of correlation at Nagoya [35.1667 • N, 136.965 • E] (A), Niigata [37.9133 • N, 139.0483 • E] (B), and Hiroshima [34.365 • N, 132.4333 • E] (C).The dot color indicates the values of C R Jan (l, l ) between the location of the dots and the reference site.The reference points are represented by star symbols.(D) The mean square error (MSE) of the correlation coefficients of the downscaled precipitations from those of the ground truth.Although in panels (A-C), the longitude and latitude values are omitted for reasons of space, they correspond to the same values as in Figure 3M.

FIG. S4 .
FIG. S4.Results of generalization ability test: statistics of precipitation calculated for a region different from ones used for training.(A-H) The probability distribution functions (PDFs) P (p) as a function of the precipitation p at each site.8 representative sites are chosen from the entire computational domain of the test.The different ranges of the abscissa are employed for different sites.(I) The normalized topographic information and the locations of 8 sites of panels (A-H).(J) Bar plot of the Kullback-Leibler divergence DKL.

TABLE I .
Summary of protocols compared in this study (PRC: precipitation, TMP: temperature, SLP: sea level pressure, TOPO: topography) †: In πSRGAN, TOPO is used as a hint, not targeted

TABLE II .
Year span for each data set

TABLE S1 .
Epoch number and batch size for each training state

TABLE S2 .
The parameters used for the data normalization

TABLE S3 .
Precise information of the location of each site

TABLE S5 .
MSE, PSNR, and PDS of machine learning-based methods