Introduction

The increase of greenhouse gases in the air composition due to human activities is now believed to have led to the rise in the frequency of unusual disasters1,2,3,4. To prevent an irreversible collapse of the current ecosystem and resulting impoverishment of human lives, many countries have set specific medium- and long-term goals for the reduction of greenhouse gas emissions, and similar paradigm shifts in decision making have occurred even at the private sector level.

Numerical approaches are regarded as the most powerful and reliable scientific option at the moment in quantitatively evaluating the efficacy of political or management plans that aim to tackle climatological issues. The Global Climate Model (GCM) is the prime example, which has accurately reproduced past and current climate changes, and its reliability of quantitative future estimates is sufficiently high5. Such future projections with high accuracies rely on the overall consideration of the global atmospheric and oceanic circulation (and even still more complicated ingredients such as chemical6 and biological7 processes)8,9,10,11,12, and thus, the horizontal spatial resolution is sacrificed by the required computational costs; the typical resolution of the GCMs is only down to the order of 1\(^{\circ }\) in longitude and latitude, corresponding to a grid size of more than a hundred kilometers on the equator. Therefore, to exploit the GCM outputs to assess the impact of climate change and to make proper decisions, it is obviously vital to super-resolve the coarse grid spacing of simulations and to reach the fine resolution of interest. Here, special attention should be given to reproducing the inherent spatial correlation of the meteorological variables, as well as the local statistics, in decision making by integrating multiregional information13,14,15,16,17, such as transportation infrastructure development and sustainable energy networks, future urbanization, and agricultural intensification.

Figure 1
figure 1

Schematic diagram of \(\pi\)SRGAN and the distribution of spatial correlation coefficients. (A) High-resolution topography and low-resolution sea level pressure, in addition to the low resolution data corresponding to the output, are supplied to the generative adversarial networks. (B) The reconstructed distributions of spatial correlation coefficients, indicating the correlation strength from the reference site at Tokyo [35.735\(^\circ\) N, 139.6683\(^\circ\) E], obtained with \(\pi\)SRGAN and a conventional CDFDM are compared to the ground truth (GT).

A variety of techniques to super-resolve GCM outcomes, which are referred to as the downscaling (DS) methods in meteorology and climatology, have been developed18,19,20,21,22,23,24,25. They are categorized roughly into two groups: dynamical18,19,20,21 and statistical DS methods22,23,24,25. The dynamical downscaling method is based on physical footings: several coupled differential equations are numerically integrated with the results of the GCM (or any other crude-resolution simulation results) being used as the boundary conditions. However, the computational cost again creates a trade-off between the accuracy and the feasibility. In contrast, in the statistical approaches, we turn a blind eye to the physical laws behind the data. Instead, empirical links between the large- and local-scale climates are identified and applied to the crude-resolution climate model outputs. Since the systematic errors of the naively interpolated GCM output (referred to as the bias) are locally corrected such that the statistical properties are precisely reproduced, the spatial correlation, i.e., the information on the events occurring at distant places, is discarded26,27,28. The statistical downscaling methods overcoming the latter problem remain to be developed.

In this paper, we propose a machine learning approach that super-resolves the GCM outputs and reproduces both the local statistics and the instantaneous spatial correlations between distant regions. Among several options for improving the resolution of geophysical or climatological data29,30,31,32, our method is based on the generative adversarial network (GAN) approach, which has been proven to be a very powerful downscaling tool through several previous studies33,34. To accurately reproduce the physical nature, we use auxiliary but climatologically important data, sea-level pressure distribution and topographic information, in addition to the target variables, temperature and precipitation distributions (see Fig. 1A and the next section for more details). Since this method falls within the criteria of the first-level physics informed super-resolution methods35, we name our method \(\pi\)SRGAN (Physics Informed Super-Resolution Generative Adversarial Network). The direct comparison with the measured meteorological data shows that the local statistical properties are obtained using the practical output from the GCM simulations as accurately as the conventional statistical downscaling method that is focused on matching these properties. We then highlight that the spatial correlation of variables is accurately reproduced, which could not be achieved with conventional downscaling methods (see Fig. 1B). The present method is therefore the next generation downscaling method that has a potential application in climate change assessment considering both local-scale and interregional events. We also considered another variant of the SRGAN that projects the high-resolution temperature and precipitation field from the low-resolution information about only temperature and pressure (we call this variant the \(\psi\)SRGAN: Precipitation-Source-Inaccessible SRGAN). With this special variant, we demonstrate the surprisingly robust ability of the SRGAN-based methods to express natural results.

Table 1 Summary of protocols compared in this study.

Results

Super-Resolution Generative Adversarial Networks with various data

We employ a super-resolution method based on generative adversarial networks (Super-Resolution Generative Adversarial Networks: SRGAN, see Methods section for details) as the basic machine learning algorithm, which was proven to have potential in DS with a scale factor up to 5033. Although the original SRGAN was able to restore physical consistency in the turbulent wind velocity field, which was shown in terms of the well-known Kolmogorov 5/3 power-law39, it was also reported that it showed a worse performance in reproducing the basic statistics, such as the pixelwise consistency like mean squared error, than a less sophisticated deep learning approach33. In this work, considering two distinct variants in addition to the standard SRGAN, we show that the integration of the low-resolution input with auxiliary information enables to overcome the drawback of relatively poor reproducibility of simple statistical properties and that the ability of SRGAN-based methods to downscale in a “physically natural” manner is quite robust against the change in the input low-resolution information.

There are a vast variety of LR information, as seen in several similar recent attempts34,40,41,42. Among them, we employed the sea-level pressure, one of the fundamental hydrodynamic (or aerodynamic) variables on which the various quantities of sub-models of GCMs are based, as a piece of key auxiliary information. Also, this variable is described with fewer assumptions in the models than other meteorological variables such as humidity. In the literature, strong links between synoptic-scale horizontal circulation and vertical motion are discussed in terms of the sea-level pressure22,43,44,45. In the first variant, we incorporate the low-resolution pressure field as an auxiliary physical information (Fig. 1A), which serves as guidance for the DS of the target variables, namely temperature and precipitation. In this method, moreover, we introduced the topographic information as another auxiliary information since it can be utilized in a high-resolution format only if we assume it is identical over the time window of interest (order of 10 to a 100 years). The topographic information is indirectly supplied as a part of teacher data during the training by adding to one of the output channels. In this way, we can provide both low-resolution and high-resolution auxiliary data in an unambiguous manner without any artificial operation (like resolution matching by interpolation or pooling). Since the use of supplemental physical information during learning is regarded as primary-level physics-informed machine learning35, we call this method the Physics-Informed SRGAN (\(\pi\)SRGAN for short).

The second variant of SRGAN is designed to generate high-resolution temperature and precipitation fields using solely low-resolution data pertaining to the temperature and pressure fields. This variant is referred to as the Precipitation-Source-Inaccessible SRGAN (\(\psi\)SRGAN) and demonstrates the surprisingly robust capability of SRGAN-based methods to describe “physically natural” precipitation fields.

The performances of three variants of SRGAN (standard SRGAN, \(\psi\)SRGAN, and \(\pi\)SRGAN) are evaluated via direct comparisons among them and with a non-machine learning-based method: we summarize these methods in Table 1. The cumulative distribution function-based downscaling method (CDFDM) is the widely used conventional statistical DS method (see the method section for the details), and the SRGAN refers to the original SRGAN-based method presented in Ref.33.

Data sets

We use the climate model simulation outputs for the low-resolution input and the real observation data for the high-resolution ground truths in the case studies. As the low-resolution data, we used the Japanese 55-year reanalysis (JRA-55) data from 1980 to 201837 with data assimilation. The grid spacing is \(1.25^{\circ }\). The daily data corresponding to the reference data (in Japanese local time) were created from 3-h simulation data. Specifically, data at 0Z, 3Z, 6Z, 9Z and 12Z on the target date and data at 15Z, 18Z and 21Z on the previous day of the target date were averaged to obtain the daily data in JST. The reference high-resolution data were the Agro-Meteorological Grid Square Data (AMGSD)36. The 1 km-meshed daily data over Japan are constructed using the in-situ observation network system of the Japan Meteorological Agency, which covers the entire land area over Japan from \(122^{\circ }\) to \(146^{\circ }\) east and from \(24^{\circ }\) to \(46^{\circ }\) north. Upon being fed into the networks, all of the data undergo a process of normalization and concatenation. For further information regarding the technical aspects of these procedures, please refer to the SI Appendix.

Table 2 Year span for each data set.

We use the data from 1980 to 2018 (14,245 days in total). These data are split into training, validation, and test datasets in a time-series manner as summarized in Table 2 for both low-resolution (JRA-55) and high-resolution (AMGSD) data. We emphasize that this time-series partitioning, characterized by a substantial volume of test data, represents a challenging task for downscaling mid-term future projections, and consequently, necessitates the incorporation of the climate change trend. The AMGSD data were adjusted such that the grid spacing was \(0.025^{\circ }\)/grid both in latitude and longitude. We extracted the data for the region from \(130.625^{\circ }\) to \(140.625^{\circ }\) east and from \(30.625^{\circ }\) to \(40.625^{\circ }\) north, which results in a \(400\times 400\) pixels square. The JRA-55 data of the corresponding region are \(8\times 8\) pixel squares, and thus, the scale factor for the DS tasks is 50.

Figure 2
figure 2

Downscaling results. Distributions of temperature (A) and precipitation (B) obtained with SRGAN, \(\psi\)SRGAN, and CDFDM are compared with the corresponding low-resolution (\(8\times 8\)) inputs (LR) and the high-resolution (\(400\times 400\)) ground truth (GT). The resolution of the downscaled images is the same as that of the GT. The data from January 24, 2008, are displayed.

Qualitative visualization

We first present typical qualitative visualizations for the temperature and the precipitation fields of one day in Fig. 2, which highlights the ambitious downscaling with the present large scaling factor of 50. Here, the high-resolution information of 2500 pixels is extracted from one single pixel in the low-resolution counterpart. We compare the results of different protocols (summarized in Table 1), along with the visualization of the original low-resolution JRA-55 and the high-resolution AMGSD data.

The difference in the downscaled temperature from the ground truth is not very large (the upper row of Fig. 2), and it is difficult to find any superiority or inferiority in performance from these qualitative plots. In contrast, the results for precipitation demonstrate rich information on the features of DS protocols (the lower row of Fig. 2). The CDFDM result shows an overly smoothed profile compared to the GT: high precipitation values (represented by red colors) are observed in a vaster area. On the other hand, SRGAN family finely reproduce the localized nature of the high precipitation areas, which the CDFDM fails to describe. Remarkably, even \(\psi\)SRGAN also succeeded in reproducing the localized heavy rain event, although, in this method, the low-resolution precipitation field is not supplied as an input. The GAN-based methods13,14,15,16,17 are recognized to be advantageous in reproducing such fine structures. The maximum precipitation values of the DS results are all very close to that of the GT. Please refer to Fig. S2 in the SI Appendix for the graphical depictions of the differences between the GT and DS outcomes, which offer a more direct and intuitive insight into the distinctions among the performances of different methods. We note that although the results for \(\pi\)SRGAN were excluded from Fig. 2 due to their substantial similarity with those for SRGAN and space limitations, they are included in Figure S2 of the SI Appendix.

Single-site statistics

Here and in the following subsections, we discuss the statistical features of downscaling results, focusing on the precipitation p, which is generally considered to be difficult to downscale accurately. In particular, we carefully examine the statistical consistency with the ground truth, which is crucial in actual usage of the DS results, e.g., in impact assessment of climate change in the future. Although the results presented in the main text are climatologically oriented indicators and not standard measures used in the field of image processing, we provide the values of pixel-wise mean squared error and corresponding peak signal-to-noise ratio in the SI Appendix.

Figure 3
figure 3

Statistics of precipitation. (AL) The probability distribution functions (PDFs) P(p) as a function of the precipitation p at each site. 12 representative sites are chosen from all over Japan. The different ranges of the abscissa reflect the regional characteristics. See SI Appendix for the technical details in processing the PDFs. (M) The normalized topographic information and the locations of 12 sites of panels (AL). (N) Bar plot of the values of the Kullback–Leibler divergence of DS methods for P(p). (O,P) The PDFs of the mean \(\mu _p\) and standard deviation \(\sigma _p\) of the precipitation over all the test data at each site; here the values of \(\mu _p\) and \(\sigma _p\) at different sites serve as samples of the PDFs. (Q,R) Bar plot of the values of the Kullback–Leibler divergence for \(P(\mu _p)\) and \(P(\sigma _p)\).

We first measure the probability distribution functions (PDFs) of the precipitation data at 12 representative sites, \(P_\mathcal{S}(p)\). Here, the PDFs are calculated using the set \(\lbrace p_k(l)|l\in \mathcal{S}\ \text{and}\ k\in \mathcal{D}_{\text{test}}\rbrace\), where \(\mathcal{S}\) stands for the site of interest (each site includes 100 grid points: see Table S3 in the SI Appendix), \(\mathcal{D}_{\text{test}}\) is the set of dates that are used for the test data (year span of 2001–2018; Table 2), and \(p_k(l)\) is the value of the precipitation at the pixel l and for the date k (we omit the subscript unless necessary below). The results are shown in Fig. 3A–L. The 12 sites in Fig. 3 are chosen from the seaside areas within the system boundary of this study, as depicted in Fig. 3M. Table S3 in the SI Appendix provides more precise information (latitude, longitude, etc.) about these sites.

Overall, Fig. 3A–L shows that all methods express the regional dependence. Regarding each method, the CDFDM provides results matching the GT very well, including the heavy rainfall regime where \(p>50\)mm per day up to the values at which \(P^{\text{GT}}(p)\) becomes around \(10^{-4}\). This is expected because in the CDFDM the data are processed such that the resulting PDFs become completely consistent with the training data. If we shift our attention to the results of SRGAN family, we first notice that SRGAN and \(\pi\)SRGAN are as accurate as the CDFDM for most sites and most values of p. Moreover, surprisingly, even \(\psi\)SRGAN succeeded in the projection of precipitation in the range \(P^{\text{GT}}(p)>10^{-3}\) at most sites although it was not provided with any direct information about the precipitation. In particular, we would like to stress that an extremely high accuracy has been successfully obtained for Shizuoka (D), a representative site on the Pacific Ocean side (south side), where pressure-dominated summer-type precipitation events occur frequently. This indicates that the pressure field effectively serves as crucial information for the precipitation projection, such as the location of the typhoons. On the other hand, the accuracy is significantly lower at sites on the Sea of Japan side (north side), Akita, Niigata, and Kanazawa (A, C, F), which are less directly affected by typhoons. These trends are interestingly consistent with our knowledge, and it appears as if SRGAN is extracting physical laws from the data and making predictions, just as humans do. Then it is natural that this success of projection of the high-resolution precipitation from the low-resolution pressure drove us to believe the integration of the input information employed in \(\pi\)SRGAN would further improve the downscaling performance of SRGAN. However, since all SRGAN, \(\pi\)SRGAN, and CDFDM offer highly accurate results, it is difficult to visually judge from the graphs which one is better than the others: we make a quantitative comparison in the next paragraph. Before moving forward to the quantitative analysis,we remark on the discrepancies observed for tails in the large precipitation (small probability of \(P^{\text{GT}}(p)<10^{-4}\)) regime even in the cases of the CDFDM. These rare events corresponding to disaster-level torrential rains are very important from the perspective of disaster prevention but are beyond the limit of the current statistical DS methods, on which we provide an overview in Discussion section.

To investigate the difference in the performance of \(\pi\)SRGAN and SRGAN, we quantify the accuracy of each method using the Kullback–Leibler divergence \(D_{\text{KL}}\):

$$\begin{aligned} D_{\text{KL}}(P^{\text{GT}}||P^{DS})\equiv \int dp P^{\text{GT}}(p)\log \frac{P^{\text{GT}}(p)}{P^{DS}(p)}, \end{aligned}$$
(1)

where \(P^{\text{GT}}(p)\) is the PDF of the GT and \(P^{DS}(p)\) is that calculated using the downscaling results (\(DS\in \lbrace \pi\)SRGAN, SRGAN, \(\psi\)SRGAN, CDFDM\(\rbrace\)). Generally, the more different \(P^{\text{GT}}(p)\) and \(P^{DS}(p)\) are, the larger \(D_{\text{KL}}\) becomes; \(D_{\text{KL}}\) vanishes when the two PDFs are exactly identical. Since the difference between two PDFs, \(P^{\text{GT}}(p)\) and \(P^{DS}(p)\), is weighted by the ground truth distribution, the KL divergence places more importance on the frequently occurring events than on rare events. Technical details such as the data preprocessing employed are provided in SI Appendix. The KL divergence between the GT and DS results using distinct methods are shown by bar plots in Fig. 3N and summarized in Table 3, where the values averaged over the 12 sites are presented. The precise values of \(D_{\text{KL}}\) for each single site are provided in Table S4 in the SI Appendix. As expected from the fact that the CDFDM concentrates on matching these statistics for the training data, it gives the best values for most cases. However, it should be noted that, at Hiroshima (denoted by K), \(\pi\)SRGAN marks a better score than CDFDM. This result evidences the remarkable performance of \(\pi\)SRGAN concerning the basic statistical characteristics that the standard SRGAN can handle relatively inadequately. Indeed, among SRGAN family, \(\pi\)SRGAN marks the best performance if we compare them by the average value over 12 sites: \(\bar{D}_{\text{KL}}(P^{\text{GT}}||P^{\pi \text{SRGAN}})\) is smaller than \(\bar{D}_{\text{KL}}(P^{\text{GT}}||P^{\text{SRGAN}})\) by approximately 40% (the bars signify that the presented values represent the mean across 12 sites.). However, \(\pi\)SRGAN is not always better than SRGAN and it shows worse results than SRGAN at Niigata, Kanazawa, and Oita (C, F, L). It is noteworthy that these particular locations are precisely where the performance of \(\psi\)SRGAN is significantly lacking. This observation suggests that the inclusion of low-resolution pressure fields may have led to undesired effects. We also note that, on the other hand, \(\psi\)SRGAN exhibits a lower value of \(D_{\text{KL}}\) than that of the standard SRGAN at Shizuoka (Fig. 3D) where the pressure field is expected to play a crucial role in the determination of rainfall events. These findings about the effects of the introduction of auxiliary fields should be utilized for the future refinement of the method. To give a conclusion for this section, remarkably, even the standard SRGAN shows the same order of values of \(D_{\text{KL}}\) as those of CDFDM. Moreover, the provision of climatologically important auxiliary information can further improve the precision by \(40\%\), evidenced by the results of \(\pi\)SRGAN.

Table 3 Average KL divergence of PDFs.

Statistics over all sites

As another meteorologically important statistical point of view, we further measure the statistics over all sites: the PDFs of the mean \(\mu _p\) and the standard deviation \(\sigma _p\) of the precipitation calculated over all test data on each pixel l:

$$\begin{aligned} \mu _p(l)&\equiv \frac{1}{N_{\text{test}}}\sum _k^{N_{\text{test}}}p_k(l),\end{aligned}$$
(2)
$$\begin{aligned} \sigma _p(l)&\equiv \sqrt{\frac{1}{N_{\text{test}}}\sum _k^{N_{\text{test}}}\left( p_k(l)-\mu _p(l)\right) ^2}, \end{aligned}$$
(3)

where \(k\in \mathscr{D}_{\text{test}}\) is again the sample index, and \(N_{\text{test}}\) is the number of samples in \(\mathscr{D}_{\text{test}}\). The probability distribution of \(\mu _p\) and \(\sigma _p\), denoted by \(P(\mu _p)\) and \(P(\sigma _p)\), are shown in Fig. 3O,P. Note that here the values calculated on each pixel serve as samples for these PDFs. The corresponding KL divergence \(D_{\text{KL}}\) for \(P(\mu _p)\) and \(P(\sigma _p)\) are presented in Fig. 3Q,R as well.

Remarkably, regarding the statistics of pixelwise average over all dates in the test dataset \(P(\mu _p)\), \(\pi\)SRGAN (and moreover, SRGAN as well) achieves a better score than CDFDM. However, on the other hand, regarding \(P(\sigma _p)\), CDFDM is the best and it shows almost identical results as GT. The small shifts of the whole curve of \(P(\sigma _p)\) to the left of SRGAN-based methods are consequences of the underestimation of the high-precipitation events shown in Fig. 3A–L. These results suggest that SRGAN-based methods exhibit a bias towards typical values in downscaling results, as opposed to presenting bold projections of extreme events, compared to CDFDM. This is actually an anticipated tendency considering the design of the standard training scheme employed in machine learning-based methods.

Spatial correlation

Next, we examine in detail the spatial correlation of the downscaled results. The importance of the spatial correlation of the meteorological variables, i.e., the relation between two distant sites, has been realized very recently13,14,15,16,17, e.g., in the context of impact assessment of climate change. However, conventional DS methods such as CDFDM have proven to overestimate the correlation even though the statistical consistency with the GT is maintained26,27,28. Such a tendency is actually seen in the qualitative visualizations in Fig. 2, where the overly smoothed profiles are obtained. We thus systematically evaluate the accuracy in expressing the spatial correlation of the precipitation by measuring the Pearson’s correlation coefficients of the precipitation \(C_M^R(l,l^\prime )\) between two sites, l and \(l^\prime\), which is defined as:

$$\begin{aligned} C^R_{M}(l,l^\prime )=\left\langle \frac{\frac{1}{N_M}\sum _k^{N_M} (\delta p^R_k(l)\delta p^R_k(l^\prime ))}{\sqrt{\frac{1}{N_M}\sum _k^{N_M} (\delta p^R_k(l) )^2}\sqrt{\frac{1}{N_M}\sum _k^{N_M} (\delta p^R_k(l^\prime ))^2}}\right\rangle _M \end{aligned}$$
(4)

where \(\delta p_k^{R}(l)\equiv p_k^{R}(l)-\bar{p}_M^{R}(l)\) is the deviation of the k-th sample at site l from its reference average value \(\bar{p}_M^R(l)\). The subscript M indicates that the average is taken over the data of month M, the superscript \(R\in \lbrace \text{GT},\pi \text{SRGAN},\text{SRGAN},\psi \text{SRGAN},\text{CDFDM}\rbrace\) distinguishes the datasets and \(N_M\) represents the total number of test data samples belonging to month M. Since the distribution of the correlation coefficients is known to have features specific to each month, we measure the monthly values of the coefficients. Below, we focus on the results for \(M=\text{January}\), for which a previous work has pointed out the existence of a distinguished spatial pattern of precipitation correlation28.

Figure 4
figure 4

Spatial distribution of the correlation coefficients for precipitation. The distributions of January obtained with the \(\pi\)SRGAN, SRGAN, \(\psi\)SRGAN, and CDFDM are compared against the ground truth (GT) in the case of the reference point of correlation at Nagoya [35.1667\(^\circ\) N, 136.965\(^\circ\) E] (A), Niigata [37.9133\(^\circ\) N, 139.0483\(^\circ\) E] (B), and Hiroshima [34.365\(^\circ\) N, 132.4333\(^\circ\) E] (C). The dot color indicates the values of \(C_{\text{Jan}}^R(l,l^\prime )\) between the location of the dots and the reference site. The reference points are represented by star symbols. (D) The mean square error (MSE) of the correlation coefficients of the downscaled precipitations from those of the ground truth. Although in panels (AC), the longitude and latitude values are omitted for reasons of space, they correspond to the same values as in Fig. 3M.

Figure 4A–C show the spatial distribution of the correlation coefficients \(C_{\text{Jan}}^{R}(l,l^\prime )\), with Nagoya, Niigata, and Hiroshima being the reference points l (the locations of the reference points are marked by the star symbols). The correlations measured for the CDFDM are too high compared to the GT at almost all sites, as shown in Fig. 4A. This is mainly because the 2500 grid points extracted from the corresponding single low-resolution pixel tend to have similar values. In contrast, the results of the SRGAN family exhibit much sharper spatial contrast, e.g., the contrast between the north and south sides of the Chugoku area (around [36\(^\circ\) N, 135\(^\circ\) E]) is well captured. The differences in performance among these SRGAN-based methods are very subtle and a precise quantification is necessary to rank them: we will get back to this issue in the next paragraph. In Fig. 4B,C, we qualitatively observe the same difference in the accuracy among the methods. In particular, the SRGAN family, even including \(\psi\)SRGAN, successfully reproduce the nonmonotonic nature of the correlation as a function of the distance from the reference site: e.g., in the results of the GT and SRGAN-based methods in Fig. 4B, along the north side coastline (see the arrow in the figure), the correlation decays quickly near the reference point and then grows again around the Noto peninsula (around [38\(^\circ\) N, 137.5\(^\circ\) E]). The CDFDM, on the other hand, merely exhibits the monotonic decay of the correlation along the same line. Please see also the SI Appendix for the difference plots between the GT and DS results.

Table 4 Average MSE of the correlation coefficients.

To quantify the accuracy of \(C_{\text{Jan}}^{DS}(l,l^\prime )\) for the different methods, we measure the mean square error (MSE) of the spatial distribution of the correlation coefficient defined as:

$$\begin{aligned} MSE_M^{DS}(l)=\frac{1}{N_{\text{OS}}}\sum _{l^\prime }^{N_{\text{OS}}}(C_M^{DS}(l,l^\prime )-C_M^{\text{GT}}(l,l^\prime ))^2, \end{aligned}$$
(5)

where l is the reference site and \(N_{OS}\equiv 630\) is the number of observation stations (see SI Appendix for a detailed explanation). The values of \(MSE_\mathrm{Jan.}^{DS}\) measured based on each reference site are compared in Fig. 4D, and the average values are listed in Table 4 (the values for each site are shown in Table S4 in the SI Appendix). All SRGAN family exhibit much better results than those of the CDFDM for all sites considered here and even \(\psi\)SRGAN offers twice better results. Specifically, the best one, \(\pi\)SRGAN, achieves 3.6 times better accuracy than the CDFDM for the average value over 12 sites. This result of the SRGAN-based methods being advantageous in achieving the “naturalness” of the spatial pattern is consistent with the report in Ref.33. If we further compare the results of SRGAN-based methods, although \(\pi\)SRGAN offers the best performance in terms of the mean value over all 12 sites, the standard SRGAN has the best values at the majority of locations, albeit by only small margins as shown in Fig. 4D (and Table S4 in the SI Appendix). We interpret this result as meaning that both \(\pi\)SRGAN and SRGAN demonstrate comparable performance in relation to the statistical characteristics of spatial correlation. Together with the discussion in the previous subsections, the results presented in this section enable us to conclude that in the present \(\pi\)SRGAN, the auxiliary fields enhance the reproducibility of the simple statistics (such as P(p)) while maintaining the expression ability of the natural spatial expanse. Such a strong downscaling ability highlights the applicability to local-scale and interregional assessments of climate change.

Discussion

We have developed a machine learning-based statistical downscaling (DS) method with a large scale-factor of 50, while maintaining both the basic statistical properties and the spatial correlation. We employed a physics-informed type approach35 on the basis of the SRGAN-based method, and specifically, we developed a framework to use the proper auxiliary physical information along with the low-resolution input to attain large improvements in the DS performance as summarized in Fig. 1 and Tables 3, 4. High accuracy comparable to the CDFDM, a conventional method in actual use, was demonstrated by directly comparing the climatological statistical properties with the real data. More importantly, our approach exhibited the highly accurate reconstruction given in Fig. 4 of the natural spatial distribution of the precipitation correlation coefficient, which was a serious issue for the conventional statistical DS methods, including CDFDM26,27,28. Since the importance of the multiregional spatial correlation has recently been recognized13,14,15,16,17, the present method is a promising new-generation alternative to conventional statistical DS methods, particularly in situations where the integration of the multiregional information is necessary.

The detection and prediction of rare events are vital issues inter alia in the context of climate change assessments. The methods including the present \(\pi\)SRGAN indeed have yet to accurately capture the low probability but significant rainfalls, as shown in Fig. 3. Here, we discuss possible directions to ameliorate the problem. First, we could raise the level of physics-informed machine learning in terms of the classification proposed in Ref.35. If we succeeded in directly incorporating some part of the governing equations into the learning process while maintaining the computational efficiency, local phenomena such as heavy rains would be predicted with high reliability. Another direction is to take measures to reform the basic machine learning architecture itself. Following the GAN-based approach, flow-based and diffusion model-based methods have attracted public attentions as powerful next-generation tools for general super-resolution tasks46,47. The main feature of these approaches is to generate multiple image candidates from a single input. Therefore, probabilistic information is expected to be drawn from the multiple super-resolved images, which would enable us to tackle the rare event predictions.

Another perspective concerns the use of machine learning techniques to improve the efficiency of dynamical downscaling, i.e., developing a high-speed machine-learning-based solver for the governing equations of climate models. Here we refer to an example of a speed up of multiscale simulations; in Ref.48 the Gaussian process is used to reduce the computational burden of multiscale simulation for polymeric liquid to achieve a reduction by a factor of 30-100 without loss of accuracy. Breakthroughs driven by similar approaches are expected once the complexity of the governing equations for the climate models is overcome.

Finally, we refer to the generalization ability of SRGAN. Here, we have selected SRGAN instead of \(\pi\)SRGAN due to the anticipated lack of high generalization ability of the latter (\(\pi\)SRGAN relies on topographic information that is specific to the training area). In the SI Appendix, we present the results of the generalization test, in which we tried to execute downscaling computations for samples derived from a different area than the one employed for training. Specifically, the test area encompasses the region spanning from 135.625\(^{\circ }\) to 145.625\(^{\circ }\) east and from 35.625\(^{\circ }\) to 45.625\(^{\circ }\) north, with a 5\(^{\circ }\) shift in both the eastward and northward directions from the original region used for the training. The findings of the examination demonstrate considerably inferior performance compared to those reported in the main text, exposing the deficient generalization capability. This suboptimal performance of the generalization ability is a somewhat predictable attribute since the training data are all from a specific same region. Even though we did not explicitly provide information about the topography in SRGAN, it is plausible that the network learned it indirectly through the temperature field, which exhibits a strong correlation with topography. We stress that we observe large errors even for Niigata and Kanazawa, which were part of the original computational domain. To enhance the generalization ability, we would need to incorporate samples from a more extensive range of areas. The exploration of such an approach is left for future research.

Methods

CDFDM

Among a variety of statistical methods, we use, as a reference, the cumulative distribution function-based downscaling method (CDFDM) with quantile mapping that is in actual use.

If we simply map the low-resolution GCM simulation results onto the point at which the observations are available, we generally see a systematic difference, defined as bias, which comes from the systematic error of the model prediction and/or from the interpolation error. Removing this inherent bias is especially important in applying the downscaling results to the impact assessments. In the CDFDM, bias is corrected via an empirical transfer function constructed in advance using measured data of distributional variables and the corresponding simulation results. The detailed procedure of constructing the transfer function is described as follows38.

The crude low-resolution data obtained from the GCM are first mapped onto a 2 km mesh using simple bilinear interpolation. At each mesh point, an empirical cumulative distribution function (CDF) is then constructed using the interpolated data of the variable of interest over a specified time window. The transfer function is defined as a map of a variable onto the one at which the corresponding CDF of the observation falls within the same quantile level. This preconstructed transfer function is applied under the assumption that the error-percentile relation is conserved over time. In the present study, the time window of a month is employed, while the original time window is over a half-year38, to more sensitively capture the seasonal trend49,50.

Note that while this CDFDM is a nonparametric method, the corrected CDF perfectly matches the corresponding CDF of the observation (for the training data); the statistical properties of the downscaling results are expected to reproduce the observation well. The bias-corrected climate scenario obtained with this method has been widely used in climate change impact studies49,51,52.

SRGAN

We employ a generative adversarial networks-based (GAN-based) method as the basic machine learning architecture, which is called Super-Resolution Generative Adversarial networks (SRGAN)53. The terminology super-resolution (SR; or, in particular, single-image super-resolution) refers to a method of restoring a high-resolution image from the corresponding low-resolution data and is the counterpart of the downscaling in the realm of the general image processing. The GAN-based methods are capable of generating realistic images by pitting a discriminator network against a generator network that generates samples (see Fig. 1A). The discriminator network takes the real data (ground truths) and the fake data (output of the generator network) as inputs and identifies the authenticity of the input samples. The generator network tries to deceive the discriminator while the discriminator tries to judge with high accuracy. As a result, both networks spontaneously learn the “realistic” information. The SRGAN can reproduce fine textures that cannot be achieved by normal convolutional neural network-based variants and offers substantially improved realistic super-resolution images.

Such network-based super-resolution techniques have recently been used for the DS tasks of climatological data. In a representative report by Stengel and coworkers, Ref.33, the authors compared the performances of SRGAN-based downscaling methods with previous methods (SRCNN: Super-Resolution Convolutional Neural-Networks). Although the SRCNN-based method appeared to be superior in evaluating the performance in terms of the simple pixelwise MSE, the SRGAN-based method provided realistic results satisfying the important physical requirements, e.g., the energy spectrum of the wind velocity field satisfied the Kolmogorov 5/3 scaling law39 with remarkable accuracy. The network architecture in our \(\pi\)SRGAN is mostly the same as the original SRGAN introduced in Ref.53, although the batch normalization layers are removed obeying Ref.33: the explanation of the precise architecture is presented in SI Appendix. We also summarize other technical details, such as the precise learning protocol, hyperparameter tuning, and the normalization of the data there. We note that the representative method compared to the \(\pi\)SRGAN referred to as “SRGAN” in our implementation is a slightly upgraded version including the high-resolution topography, which makes possible the decomposition of elements producing the improvement.