Evaluation of Reanalysis Surface Incident Solar Radiation Data in China

Surface incident solar radiation (Rs) of reanalysis products is widely used in ecological conservation, agricultural production, civil engineering and various solar energy applications. It is of great importance to have a good knowledge of the uncertainty of reanalysis Rs products. In this study, we evaluated the Rs estimates from two representative global reanalysis (ERA-Interim and MERRA-2) using quality- controlled surface measurements from China Meteorological Administration (CMA) and Multi-layer Simulation and Data Assimilation Center of the Tibetan Plateau (DAM) from 2000 to 2009. Error causes are further analyzed in combination radiation products from the Earth’s Radiant Energy System (CERES) EBAF through time series estimation, hotspot selection and Geodetector methods. Both the ERA-Interim and MERRA-2 products overestimate the Rs in China, and the MERRA-2 overestimation is more pronounced. The errors of the ERA-Interim are greater in spring and winter, while that of the MERRA-2 are almost the same in all seasons. As more quality-controlled measurements were used for validation, the conclusions seem more reliable, thereby providing scientific reference for rational use of these datasets. It was also found that the main causes of errors are the cloud coverage in the southeast coastal area, aerosol optical depth (AOD) and water vapor content in the Sichuan Basin, and cloud coverage and AOD in the northeast and middle east of China.

Surface incident solar radiation (R s ) is the basic energy of biological, physical and chemical processes, and the essential input parameters of biological physics models and hydrological simulation mathematical models 1,2 . Ground-based stations provide the best estimate of R s , but it is still insufficient for estimation in remote areas, especially in high latitudes, and in plateaus or mountainous areas, due to the sparsity and heterogeneity of stations [3][4][5][6] . Currently, there exists a range of gridded global R s products with higher spatial resolution exist from remote sensing 7,8 and reanalysis 9,10 . Satellite remote sensing is one of the most practical ways to derive R s with relatively higher accuracy, but temporal coverage is limited by transit time of satellite 11,12 .
In addition to satellite-based products, scientists have been developing reanalysis R s products to restore long-term historical climate records for numerical weather forecasting using data assimilation techniques since the late 1980s 13 . Reanalysis methods provide dynamically consistent global analysis of the global atmospheric characteristics by combining the geophysical fluid-dynamic model of the atmosphere and measurements. The model contains important physical processes, such as radiative transfer and convection. Observations are used to constrain the dynamic model to optimize the properties of complete coverage and accuracy 14 . Reanalysis data provides global and effective R s of long time series, which alleviate the deficiency in radiation data and greatly promote the development of modern atmospheric science. As reanalysis data are obtained by numerical simulation, they cannot completely replace the observed data for describing the real three-dimensional state of the atmosphere 15 . Due to the heterogeneity of various data sources and difference of data assimilation schemes, there exist errors in radiation reanalysis products 16 . Thus, understanding the uncertainty and deviation of reanalysis data is a prerequisite for the rational use of reanalysis data 17 .
Since the mid-1990s, the United States, European Union, and Japan have organized and implemented a series of global reanalysis projects on atmospheric data to restore and reconstruct historical records of climate change. There are six representative reanalysis products: ERA-Interim of the European Centre for Medium-Range of MERRA-2 and ERA-Interim. Meanwhile, the MERRA-2 and ERA-Interim products are compared to determine the difference in their spatial distributions and seasonal variations. The Clouds and Earth's Radiant Energy Systems (CERES) Energy Balanced and Filled (EBAF) R s dataset 8 is used to identify regions with large radiation deviations in different seasons as hotspots through comparison to reanalysis radiation data. Considering the influence of atmospheric factors on the reanalysis R s products and the spatial heterogeneity of the distribution of atmospheric factors, we introduce the Geodetector 32 to quantitatively analyze the causes of the spatial-temporal errors of R s in the hotspots and utilize CERES-EBAF atmospheric products to verify the results of the dominant atmospheric influence factors. The results are useful for the proper application and accurate data correction about the two representative global reanalysis data.
This paper is organized as follows. The reanalysis products (ERA-Interim and MERRA-2) and station measurements (CMA and DAM stations) used for this study region, the description of the quality control of the stations and the rationale of using the Geodetector are detailed in Section 2. The errors of reanalysis are validated, and the domain impact factor based on the Geodetector is analyzed in Section 3. A short summary and conclusions are presented in Section 4.

Experiments
Data. Five R s data sources are used in this study: two reanalysis products, one satellite remote sensing based product, one ground measurement and one model simulation dataset.
Reanalysis products. Two reanalysis products, ERA-Interim and MERRA-2, are used in this study. These products are different in many aspects, such as physical parameterizations of numerical models, numerical schemes, observational data used for assimilation, and the assimilation schemes 17,24 . The monthly mean R s data of these two reanalysis datasets from 2000 to 2009 are evaluated in this research. The details of these datasets are described in the following paragraphs.
The ERA-Interim 9 is provided by the European Centre for Medium Range Weather Forecasts (ECMWF), which is one of the most important reanalysis centers in the world. The spatial resolution of the ERA-Interim is 0.75° (approximately 80 km) in the horizontal direction, which is divided into 60 levels from the ground surface to the 0.1hPa altitude in the vertical direction. The temporal resolution of the ERA-Interim is 3 h. An improved 3DVar assimilation technology is used as the assimilation scheme. The data sources used in the assimilation scheme include ground measurements and satellite remote sensing data. The radiation scheme is based on the Rapid Radiation Transfer Model (RRTM) 33 . The prognostic cloud variables (cloud cover, cloud condensed water) and water vapor from the meteorological model and climatologic values for aerosols, carbon dioxide, trace gases and ozone are used in the radiative transfer model 30,34 .
The MERRA-2 10 is provided by the GSFC's Global Modeling and Assimilation Office (GMAO). It was introduced to replace the original MERRA dataset because of the advances made in the assimilation system that enable assimilation of modern hyperspectral radiance and microwave observations, along with GPS-Radio Occultation datasets. It also uses NASA's ozone profile observations that began in late 2004. Additional advances in both the GEOS (Goddard Earth Observing System Data Assimilation System) model and the GSI (Gridpoint Statistical Interpolation) assimilation system are included in MERRA-2. Spatial resolution remains about 50 km in the latitudinal direction. Along with the enhancements in the meteorological assimilation, MERRA-2 takes some significant steps towards GMAO's target of an Earth System reanalysis. The radiation scheme is based on the method proposed by Chou and Suarez 35 . In the MERRA-2 reanalysis, the meteorological and aerosol observations are simultaneously assimilated within the GEOS-5. The MODIS Neural Net Retrieval and AVHRR Neural Net Retrieval at 550 nm are used to assimilate the AOD observations 30 .
CMA ground measurements. The CMA provides ground radiation measurements of 122 stations in China (except Taiwan, Hong Kong, and Macao). The CMA surface observations include three basic physical variables: the total surface solar radiation R s , direct solar radiation R dir , and diffuse radiation R dif from 1957 to the present. The instruments of stations used are: DFY-4 and TBQ-2 total radiation meter; DFY-3 and TBS-2 direct radiation meter (both with solar tracking frame); DFP-1 shading ring; and RYJ-4 automatic radiation recorder. The above-mentioned radiation meters are all electrothermal type, which consists of two parts: the induction surface and the thermopile 36,37 . The radiation meter and measuring instrument (voltmeter, ammeter) constitute a set of radiation instruments. Detail information of instruments for CMA stations are available on the website: http:// data.cma.cn/site/index.html. The world radiation reference (WRR) is used as the standard to carry out the value transmission in these stations, ensuring the comparability of solar radiation measurements worldwide. The distribution of these radiation stations is shown in Fig. 1 (red triangles). Total  1  2  3  4  5  6  7  8  9  10  11  12   Upper-limit error days  75  60  216  307  477  330  276  171  123  138  100  55  2328   Lower-limit error days  695  516  397  305  323  298  201  259  364  436  510  539   CERES-EBAF product. The CERES-EBAF product is derived from the clouds, AOD, and earth radiant energy systems detected by the TRMM, Terra and Aqua satellites. It provides a reasonable inversion of atmospheric parameters such as clouds, water vapor and AOD and R s estimation. Atmospheric parameters are derived from A-train Constellation (the Cloud Aerosol Lidar with Orthogonal Polarization (CALIOP) on the Cloud Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) satellite, the CloudSat Cloud Profiling Radar (CPR), and the Aqua Moderate Resolution Imaging Spectrometer (MODIS)). R s are derived from a radiative transfer model of CERES with k-distribution and correlated-k for radiation (FLCKKR) with a two stream approximation, which is consistent with the radiative flux from the surface to the top of the atmosphere 41 . The CERES-EBAF product contains 1° regional, zonal and global monthly means of Top-of-Atmosphere (TOA) and surface (SFC) longwave (LW), shortwave (SW), and net (NET) fluxes under clear and all-sky conditions. The CERES-EBAF has higher accuracy than the other grid R s products like CMIP5, NCEP-NCAR, NCEP-DOE, CFSR, ERA-Interim, MERRA and JRA-55 et al. 17,41,42 . Zhang et al. 17 found that the CERES-EBAF R s data are more consistent with ground measurements than the reanalysis data, and provide more accurate atmospheric products such as clouds and aerosols. Li et al. 43   The results show that the water vapor has high correlation coefficient (more than 0.91) with station measurements, yielding a mean bias and RMSE less than 2.58 mm and 6.02 mm respectively. The EBAF R s , AOD, Cloud Coverage and Water Vapor Content of CERES are used in this study. Due to the lack of higher spatiotemporal resolution ground measurements, in this paper, the CERES data were averaged in a long time series for subsequent analysis and comparisons, which greatly reduced the uncertainty of the data and enhanced the credibility of the experiment.

Methodology. Station quality control.
There are some quality problems in the CMA ground measurements, resulting from the accuracy and calibration of radiation instruments, the human factors in the operation of instruments, and the positional changes in stations 46 . When using CMA radiation data as validation data, a quality inspection should be carried out first. The original station data have been checked by a simple physical quality control of the daily R s data; that is, the R s is equal to the sum of diffuse radiation and the direct radiation (Original threshold control: ). Furthermore, an upper limit is set for further validation. Referring to Shi's research 37 , the upper limit is the solar radiation G 0 received by the earth's top atmosphere per day: cos cos sin 180 sin sin (A1) where I 0 is the solar constant (approximately 1367 W/m 2 ), δ is the solar declination angle, ϕ is the station latitude, w s is the sunset angle, and k is the earth orbit eccentricity correction factor. The daily earth orbit eccentricity correction factor k is calculated using the following formula 47 : where k is the ordinal day of the year and d n represents the day of the year. The daily solar declination angle δ and sunset angle w s are calculated using the following formulas 48,49 : From a physical point of view, it is difficult to set a strict lower limit for the R s . Due to the existence of scattered radiation; the R s must be greater than zero. Geiger et al. 50 based on the statistical characteristics of the radiation data, the lower limit is set to 3% of the daily solar radiation received by the top of the earth's atmosphere every day (see Eq. (A5)). Such simple quality control is proved to be effective can eliminate most of the error [51][52][53][54] . Moreover, CMA data is checked for errors at daily scale. The final used data is the monthly average data calculated from daily R s . Table 1 summarizes the results of the physical threshold test. From the table, we can see that the incidence of www.nature.com/scientificreports www.nature.com/scientificreports/ errors is lower in summer. The average error rate of the daily R s observations from 2000 to 2009 is 1.98%, and the winter error rate is higher than that of the summer. In summary, only ground measurements that pass the following physical criterion are used for validation.
Considering that the CMA stations are sparse and unevenly distributed, the 716 R s data of the DAM is also used in this study. The daily surface solar radiation dataset was produced by merging two data sets. One is the hybrid model estimate at 716 CMA stations and the other is the ANN-based model estimate at 96 radiation stations. The latter, which has higher accuracy, was used to correct the hybrid model estimate dynamically at a monthly scale. Tang et al. 39,40 verified the accuracy of the assimilated radiation data set And showed that the bias and RMSE of the hybrid model at the 96 CMA radiation stations are 0.7 and 2 MJ/m 2 respectively, While that of the corrected hybrid model are −0.1 and 1.8 MJ/m 2 respectively. The accuracy of the corrected radiation dataset is significantly higher than that of the traditional locally calibrated model. Therefore, the 716 R s data can be used as valid data for evaluating the reanalysis data. In this study, both CMA and DAM are used to evaluate the reanalysis products for a better comparison. But only DAM is used to analyze the error causes of reanalysis because it is highly consistent with CMA and has more data than CMA.
Time series estimation. In this study, the R s data from 2000 to 2009 by ERA-Interim and MERRA-2 are compared with the CMA and DAM measurements, and the data error of reanalysis at different time scales is analyzed. The daily R s estimates of ERA-Interim and MERRA-2 are horizontally interpolated using a bilinear interpolation technique with inverse distance weights for four most closest surrounding grid cells 30 . www.nature.com/scientificreports www.nature.com/scientificreports/ To investigate the error distribution of the reanalysis data at the different stations, the seasonal distribution of the average relative bias (RB) for the reanalysis and measured R s from 2000 to 2009 is calculated. The expression of the RB is as follows: where R _ s Reanalysis and R _ s sites represent the R s of reanalysis products and stations, respectively. Where RB indicates relative bias, and positive values, indicate that the reanalysis data is higher than the ground observation data, while negative values indicate that the reanalysis data is lower than the ground observation data.
The expression of the monthly anomalies is as follows:  www.nature.com/scientificreports www.nature.com/scientificreports/ where SSR i and SSR are the average monthly R s and the average monthly R s for all years of that month, respectively.
The expression of the RMSE is as follows: Geodetector. The Geodetector is a method used to explore the influence mechanism of geographical spatial zoning factors on disease risk in infancy 55 . It can effectively identify the relationship between multiple factors and geographical phenomena, so it has been gradually applied to the study of geography and humanities 56,57 . The factor detector of the Geodetector can verify the spatial heterogeneity of a single variable. Atmospheric factors such as AOD, cloud coverage and water vapor content are typical category variables and have an important influence on the reanalysis of R s . Therefore, it is suitable to use the Geodetector method to reveal the influence of the regional atmospheric factors on the reanalysis of the surface radiative error. The Geodetector software is divided into four parts: the risk detector (superposition of the related data, then comparison of whether the difference is significant, and the major role of the significant factor in the risk is determined), the factor detector (using the q value 55 to test the association between two variables Y and X, according to the coupling between their spatial distributions, without assumption of linearity), the ecological detector (using the variance to compare), and the interaction detectors (including synergy, antagonism, double synergy, single antagonism and mutual independence). The factor detector section is used in this study.
To analyze the significance of the atmospheric influence factors on the R s errors in the different regions of China, we actually analyze 10 factors at first, including the reanalysis ozone concentration, surface albedo, content of ice and water clouds, cloud coverage, AOD and water vapor content, etc. It is found that the influence of AOD, cloud coverage and water vapor content on the R s is obvious, while other factors are not significant. This is basically consistent with the results of sensitivity analysis of radiation transmission models 47 . Therefore, this paper focuses on the quantitative influence level of AOD, cloud coverage and water vapor content on the R s error in the different regions. Taking the RB of R s at each regional station as the spatial stratification variable Y, the influence of atmospheric factors on the RB of reanalysis R s is measured based on the power of determinant (PD) value of the Geodetector model. The region used to analyze the cause of error must satisfy the following conditions: the station distribution in the region is even and as many stations as possible, and error analysis is conducted on the region where the reanalysis R s deviation is large. These regions are called hotspots in this study. Studies 3,8,17,42 have shown that CERES-EBAF has a higher accuracy than the other grid R s products; thus, the CERES-EBAF R s data are used as verification data. Zhang et al. 17 proved that almost all the reanalysis R s products showed www.nature.com/scientificreports www.nature.com/scientificreports/ better accuracy in summer (June, July, and August) than in winter (December, January, and February), and the difference between summer and winter is more prominent than other seasons. Section 3.1 of this study further confirmed that both ERA-Interim and MERRA-2 have lager frequency distribution with smaller RB in summer than winter (Fig. 2). So, the reanalysis monthly mean R s products were divided into summer and winter seasons to assess the seasonal error causes (Fig. 3). To perform the comparison, a bilinear interpolation of a weighted average of pixel in the nearest 2-by-2 neighborhood was used to unify the resolution of the reanalysis products to CERES-EBAF. According to the map, this study selects the large deviation regions of the reanalysis R s in summer and winter as a hotspot area. Two hotspots in each season are selected, and a total of 8 hotspots are numbered 1-8. For the ERA-Interim, the southwestern region (hotspot 1) and the eastern region (hotspot 2) are selected as the hotspots during the summer, while the southern coastal region (hotspot 3) and the northeastern region (hotspot 4) are selected in winter. For the MERRA-2, the southern coastal region (hotspot 5) and the Sichuan Basin region (hotspot 6) are selected in summer, and the southern coastal region (hotspot 7) and the north-eastern region (hotspot 8) are selected in winter. The same region but different hotspot numbers was defined when the seasons or reanalysis product are different for better comparison, such as the hotspots 3, 5 and 7, and the hotspots 4 and 8.
Based on the reanalysis seasonal differences, we use Geodetector to analyze the causes of the errors in the hotspots. As shown in Fig. 4, K-means clustering analysis is first used to divide the hotspot into k sub-regions with atmospheric factor X. The number of R s stations and the variance of the RB of the R s of the stations in the sub-regions are recorded as  www.nature.com/scientificreports www.nature.com/scientificreports/ RB of the R s over the whole hotspot's station. The study area is divided into k sub-regions based on atmospheric factors. When the atmospheric factor X has a decisive force on the RB of the station, the gap between the atmospheric factors in each sub-region is small while the gap between the sub-regions is large. The discrete variance σ di 2 of the station is small in each sub-region, while the variance between the sub-regions is very large. When σ di 2 is close to 0, the PD value tends to be 1, which is the ideal state indicating that the RB changes are completely determined by atmospheric factor X. If the RB of the R s is irrelevant to the atmospheric factor, then the weighted sum of the discrete variance σ di 2 and the number of stations in each sub-region N di is closer to 2 , thus PD = 0. Therefore, the range of the PD value is [0, 1]. A larger PD value indicates a greater correlation between the reanalysis atmospheric factors and RB of the reanalysis R s .

Results and Discussion
Correlation and trend analysis.  www.nature.com/scientificreports www.nature.com/scientificreports/ data. In addition, the experiment shows that the reanalysis of the R s data by ERA-Interim and MERRA-2 is higher than the ground observation station data, which is consistent with the previous conclusions 17, 58,59 21 in spring and winter, respectively, which is higher than that in summer and autumn season. The seasonal characteristics of the MERRA-2 are not obvious, and the range of bias in summer and autumn is 29.55-45.62 W/m 2 , which close to that in spring and winter. However, the range of the RMSE in spring and winter is 35.59-53.09 W/m 2 , which is slightly higher than that in summer and autumn.
As shown in Figs. 2 and 8, the monthly mean R s of the four datasets peaks around July, reaching a valley value around January every year. The monthly mean R s of the datasets in annual cycle is shown in Fig. 8(a). The CMA and DAM datasets are basically fitted with a range from 80 to 230 W/m 2 . The R s ranges of the ERA-Interim and MERRA-2 are 97-249 W/m 2 and 105-280 W/m 2 , respectively. The MERRA-2 is higher than the ERA-Interim and surface measurements. The monthly mean anomalies R s of the four datasets in annual cycle is shown in Fig. 8(b). The CMA and DAM datasets are basically fitted with a range from −11 to 12 W/m 2 . The R s anomaly ranges of the ERA-Interim and MERRA-2 are −16-13 W/m 2 and −13-11 W/m 2 , respectively. It can be seen that larger values of the R s relative anomalies are more frequent and of larger magnitude in the MERRA-2/CMA-DAM than the www.nature.com/scientificreports www.nature.com/scientificreports/ ECMWF/CMA-DAM as shown in Fig. 2(c), implying that R s of MERRA-2 is more changeable comparing to the ECMWF. The frequency distribution diagrams between the reanalysis and surface measurements are shown in Fig. 2(a-d). From June to October, the distribution frequency of ERA-Interim is higher with a smaller RB (−5 to 10%), while from December to April, the distribution frequency is higher with a larger RB (15 to 35%). Although the seasonal difference is not obvious in the MERRA-2, it can also be seen that the RB is smaller (10 to 25%) and the frequency is higher in June to October, but larger (25 to 35%) from December to April. This results are consistent with studies of Zhang et al. 17 and Jia et al. 58 . Both ERA-Interim and MERRA-2 have lager frequency distribution with smaller RB in summer than winter and the difference between summer and winter is more prominent than other seasons (Fig. 2). The seasonal differences in the reanalysis products may be due to the   www.nature.com/scientificreports www.nature.com/scientificreports/ seasonal differences in atmospheric factors mentioned in Introduction Section, which will be further analyzed in section 3.3 and 3.4.
Bias and error assessments. The reanalysis monthly mean products are divided into the summer and winter seasons to assess the seasonal dependency of their accuracy. Figure 9 shows the average R s distribution in the different seasons of ERA-Interim and MERRA-2 from 2000 to 2009. The ranges of the radiation in summer are 100.72-309.06 W/m 2 and 186.10-359.12 W/m 2 for ERA-Interim and MERRA-2, respectively. The ranges in winter are 78.45-233.97 W/m 2 and 56.52-222.53 W/m 2 , respectively. The largest R s in summer is located in Qinghai, Tibet and Xinjiang. The smallest R s in winter is located in Sichuan and Guizhou. The R s of the MERRA-2 in summer is higher than that of the ERA-Interim in summer, while it is lower than that of the ERA-Interim in the north and Sichuan Basin in winter. The ERA-Interim and MERRA-2 basically show the same distribution characteristics of R s in China, but there are obvious differences in the R s values.
To investigate the error distribution of the reanalysis data at the different stations, the seasonal distribution of average RB between ERA-Interim and surface measurements from 2000 to 2009 is shown in Figs. 10 and 11, respectively. For the ERA-Interim, in summer, the stations with positive RB values are mainly distributed in southeast China, central east China and the Sichuan Basin. The maximum RB of the CMA stations reaches 38%, and the maximum RB of the DAM stations reaches 35%. The stations with negative RB values are mainly distributed in plateaus and in southwest and northeast China. The smallest RB of the CMA stations is −32%, and the smallest RB of the DAM stations is −48%. In contrast, the RB is basically positive in winter, indicating that the ERA-Interim R s data show overestimation. Specifically, the stations with large positive RB values are mainly distributed in the southeast China, central east China and the Sichuan Basin, which is the same as that in summer, and the maximum RB of the CMA stations is larger than 50%, and the maximum RB of the DAM station reaches 45%. In addition, the ERA-Interim shows seasonal differences; for example, the R s in the northeast and southwest regions is underestimated during the summer and overestimated during the winter. The R s in the southeast region, the central east regions and the Sichuan Basin are overestimated, but the overestimation is more pronounced in winter.
For the MERRA-2, in summer, the stations with a positive RB are mainly distributed in southwest China, central east China and the plateaus. The maximum RB of the CMA stations reaches 61%, and the maximum RB of DAM stations reaches 81%. The stations with a negative RB are mainly distributed in southeast China, especially in the coastal areas. The smallest RB of the CMA stations is −12%, and the smallest RB of the DAM stations is −17%. The errors may be affected by cloud coverage and water vapor content in the coastal areas in summer, which will be further analyzed in section 3.3 and 3.4. In winter, the RB is basically positive and more pronounced The data quality of reanalysis R s showed regional and seasonal characteristics. The underestimation of ERA-Interim mainly occurred in the southwestern and northeastern regions. The underestimation of MERRA-2 in summer is mainly in the southern coastal region and in the northeastern region in winter. In general, both reanalysis data overestimate the R s in most regions of China, and the overestimation by MERRA-2 is more obvious. The MERRA-2 and ERA-Interim both have large errors in the Sichuan Basin, possibly because of the reanalysis of atmospheric parameters, especially the influence of water vapor and cloud coverage. The further analysis will be shown in Section 3.3.

Analysis of influence factors. The ECMWF only has MACC (Monitoring Atmospheric Composition and
Climate) AOD reanalysis product, which is different from AOD used in ERA-Interim. The AOD used in the ERA-Interim radiation transfer model is Climatology AOD data, which don't have long-term variation. So, there is no AOD product provided in the ERA-Interim, we don't consider the impact of AOD to the R s of ERA-Interim. Table 3 shows the PD values for each environmental factor in hotspot areas. The table header gives the names of the environmental factors: AOD, cloud coverage and water vapor content. For the ERA-Interim, in summer, the R s error of the reanalysis product is mainly influenced by cloud coverage in hotspot 1 with the PD values of 0.4241. Although the influence of the water vapor content is small, the impact on the R s could not be ignored since the PD value is also greater than 0.1000. Hotspot 2 is mainly affected by cloud coverage, and the PD value is as high as 0.2410, and the influence of water vapor content can be ignored as PD value (0.0912) is less than 0.1000. In winter, the dominant factors in hotspots 3 and 4 are cloud coverage, and the PD value of these factors is 0.3057, 0.2125, respectively. The water vapor content in hotspots 3 and 4 also passed the significance test, but the PD value is small. For the MERRA-2, in summer, the influence factors of hotspot 5 are ordered as follows: AOD > cloud coverage > water vapor content, and the PD value is 0.2452, 0.1342 and 0.1244 respectively. The influence factors of hotspot 6 are ordered as follows: AOD > cloud coverage > water vapor content, and the PD value is 0.2124, 0.1623 and 0.1398, respectively. In winter, Hotspots 7 and 8 are affected by AOD, cloud coverage and water vapor content, but they are mainly influenced by AOD in hotspot 7 and cloud coverage in hotspot 8. In summary, cloud coverage and AOD are the main influencing factors for the two reanalysis R s products, and with the increase in cloud coverage, the influence of the water vapor content cannot be ignored.
Atmospheric parameter verification. Section 3.3 explained the PD value of the atmospheric influence factors in each hotspot. The section determines whether the influence factors are representative and accurate. As  (Table 3) and the deviation of the radiation (Fig. 3) in the hotspots are combined to verify whether the change in the atmospheric parameters of the reanalysis is consistent with the dominant atmospheric factors in the hotspots.
For the ERA-Interim, the R s is underestimated in the hotspot 1 (Fig. 3a), and the order of influence power of atmospheric factors is cloud coverage > water vapor ( Table 3). The underestimation of the water vapor only can compensate part of the underestimation of the R s due to overestimation of the cloud coverage. Because cloud coverage is the dominant factor of this region (Fig. 12a), whose power is much larger than that of water vapor ( Table 3). The dominant factor in hotspot 2 is the cloud coverage (Table 3), and the underestimation of cloud coverage (Fig. 12a) leads to the overestimation of the R s in the hotspot 2 (Fig. 3a). It is difficult to obtain an accurate R s in the southeast coastal areas (hotspot 3), because of the rapid dynamic change of the cloud cover. The results show that the dominant factors in hotspot 3 is cloud coverage (Table 3), which is underestimated (Fig. 13a), resulting in an overestimation of the ERA-Interim R s in this region (Fig. 3b). In hotspot 4, the influence of cloud coverage is larger than that of other factors ( Table 3). The cloud coverage is overestimated (Fig. 13a), which led to the underestimation of the R s (Fig. 3b). Boilley and Wald 30 also proved that ERA-Interim often mistakes cloudy conditions as clear skies. The opposite is also true though less pronounced: actual clear sky conditions are predicted as cloudy. www.nature.com/scientificreports www.nature.com/scientificreports/ For the MERRA-2, the R s is slightly underestimated in the southeast coastal areas (hotspot 5), and the order of influence power of atmospheric factors is AOD > cloud coverage > water vapor ( Table 3). The underestimation of the AOD cannot compensate the underestimation of R s due to the overestimation of cloud coverage and water vapor content (Fig. 14a-c), because the sum of power of cloud coverage and water vapor (PD = 0.2586) is greater than AOD (PD = 0.2452). The R s in the Sichuan Basin (hotspot 6) is affected by the topography of the basin, and the estimation of the R s is complicated. The influence power of AOD and cloud coverage is greater than water vapor content (Table 3). Therefore, the overestimation of water vapor content cannot compensate the overestimation of R s due to the underestimation of AOD and cloud coverage (Fig. 14a,c). Therefore, the R s in this region is overestimated (Fig. 3c). The dominant factors' order in hotspot 7 is as follows: AOD > cloud coverage > water vapor content (Fig. 15a-c), and all the influence factors are obviously underestimated in this area, which led to the overestimation of the R s in hotspot 7, as shown in Fig. 3d. The cloud coverage and AOD in hotspot 8 are overestimated (Fig. 15a,b), resulting in an underestimation of the R s in this area. The results consistent with Feng and Wang's 60 study that MERRA-2 have a high mean bias over China due to their incorrect estimation of cloud fraction, which is greater in southern China. The bias in trend tend to reduce due to its underestimation of aerosol assimilation. However, MERRA-2 show a positive bias in trend of R s likely due to aerosol-cloud interaction.
The study shows that the R s error of the MERRA-2 and ERA-Interim in the southeast coastal areas are mainly influenced by the cloud coverage, and when the PD value of cloud coverage is great, the influence of water vapor

conclusions
This study presents the validation and inter-comparison of the reanalysis R s estimation from 2000-2009 provided by the ERA-Interim and MERRA-2 using quality-controlled surface measurements at 96 stations from CMA and 716 R s data from the DAM. The reanalysis products are also compared with the satellite retrievals of CERES-EBAF from different perspectives, including accuracy, spatial distribution, periodic variation, seasonal variation, inter-annual variation and regional variation. In addition, considering the influence of atmospheric factors on the earth's R s and the spatially stratified heterogeneity of the atmospheric distribution, the geographical detector is used to find out the errors causes in different hotspots in China.
Overall, the bias between the ERA-Interim products and surface measurements at all stations range from 15.79 to 18.28 W/m 2 , and the RMSE ranges from 28.43 to 32.36 W/m 2 . The bias between the MERRA-2 products and surface measurements at all stations range from 35.68 to 43.84 W/m 2 , and the RMSE ranges from 44.96 to 45.76 W/m 2 . Both the ERA-Interim and MERRA-2 reanalysis radiation data overestimate R s in China, and the overestimation of MERRA-2 is more pronounced. The ERA-Interim has strong seasonal differences, and data in summer and autumn are better than those in spring and winter. In contrast, MERRA-2's seasonal differences are not obvious. The study shows that R s error of MERRA-2 and ERA-Interim in the southeast coastal areas are mainly influenced by cloud coverage. When the PD value of cloud coverage is large, the influence of water vapor content also becomes greater. The error in the Sichuan Basin is mainly affected by AOD and water vapor content, and the error in the northeast and middle east of China is mainly affected by cloud coverage and AOD.
Generally, further improvements and efforts are required for higher accuracy by using more surface measurements, and improvement of the accuracy in the atmospheric parameters would make the verification results more convincing. The results are useful for the proper application and accurate data correction about the two representative global reanalysis data. Further study will focus on establishing the empirical relationship between the factors' PD values and the radiative errors of reanalysis, as well as correct the radiative errors.