Calibration and validation of the Angstrom–Prescott model in solar radiation estimation using optimization algorithms

The Angstrom–Prescott (A–P) model is widely recommended for estimating solar radiation (Rs) in areas where measured data are unavailable or scarce. The aim of this research was to calibrate and validate the coefficients of the A–P model at six meteorological stations across the arid and semi-arid regions of Iran. The model was extended by adding air temperature and relative humidity terms. The coefficients of the A–P model and the improved models were calibrated using several optimization algorithms, including Harmony Search (HS) and Shuffled Complex Evolution (SCE). Performance indices, i.e., Root Mean Square Error (RMSE), Mean Bias Error (MBE), and the coefficient of determination (R²), were used to analyze the models' ability to estimate Rs. The results indicated that the A–P model was more precise and had lower error than the improved models at all stations. In addition, the best results were obtained for the A–P model with the SCE algorithm, for which RMSE varied between 0.82 and 2.67 MJ m−2 day−1 in the calibration phase. Compared with the HS algorithm, the SCE algorithm decreased RMSE by about 4% and 7% at the Mashhad and Kerman stations, respectively, in the calibration phase.


Material and methods
Study area. Iran is situated between latitudes 25°N and 40°N and longitudes 46°E and 65°E, with an area of 1,648,000 km². Most of Iran has an arid or semi-arid climate. Moreover, low irrigation efficiency in agricultural fields makes it necessary to calculate evapotranspiration (ET) and crop water requirements, which in turn require an accurate estimate of Rs. In this research, six meteorological stations located in the arid and semi-arid climates of Iran were selected to evaluate the performance of the calibrated A-P model in Rs estimation. The selected stations are classified as arid or semi-arid according to the De Martonne climate classification method 39,40 and have reliable long-term data from 1992 to 2017 (Fig. 1). The stations were selected based on climate type and the availability of measured Rs.

Data and quality control. Daily meteorological data from the six radiation stations were obtained from the Islamic Republic of Iran Meteorological Organization (IRIMO). The geographic and meteorological characteristics of the studied stations are presented in Table 1. In this research, the following meteorological variables were used as inputs of the A-P model and the three improved models: Tmax, Tmin, RHmean, Rs (MJ m−2 day−1), the maximum possible daily sunshine duration (N), and the daily sunshine duration (n). Because of the importance of radiation data, quality control of the observed daily global Rs was carried out as follows 41:
• If either the clearness index (Rs/Ra) or the relative sunshine duration (n/N) was greater than one, the data for that day were deleted from the dataset.
• If Rs was greater than 0.78 × Ra, the data for that day were deleted from the dataset.
• If Rs was lower than 0.03 × Ra, the data for that day were deleted from the dataset.
• If ten or more days of data were lost in the same month, the data for that month were omitted.
www.nature.com/scientificreports/
Models and optimization algorithms. Models. The A-P model is based on sunshine duration; to examine the effect of other meteorological variables, the models presented in Table 2 were also examined.
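The quality-control rules listed under "Data and quality control" can be expressed as a short filter. This is a sketch under assumptions: the record layout (`DailyRecord` and its field names) is hypothetical, the dataset is assumed to hold one record per calendar day with `None` for missing values, and a "lost" day is taken to be any day that is missing or fails one of the daily checks.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class DailyRecord:
    """One day of observations (hypothetical layout)."""
    month: tuple          # (year, month) the day belongs to
    rs: Optional[float]   # measured global radiation Rs, MJ m-2 day-1 (None = missing)
    ra: float             # extraterrestrial radiation Ra, MJ m-2 day-1
    n: Optional[float]    # measured sunshine duration, h (None = missing)
    N: float              # maximum possible sunshine duration, h


def day_is_valid(r: DailyRecord) -> bool:
    """Apply the day-level rules on Rs/Ra, n/N and the 0.03-0.78 Ra band."""
    if r.rs is None or r.n is None:
        return False
    if r.rs / r.ra > 1 or r.n / r.N > 1:    # clearness index or n/N above one
        return False
    if r.rs > 0.78 * r.ra:                  # implausibly high Rs
        return False
    if r.rs < 0.03 * r.ra:                  # implausibly low Rs
        return False
    return True


def quality_control(records, max_lost_days=10):
    """Keep valid days, dropping whole months with max_lost_days or more lost days."""
    lost = Counter(r.month for r in records if not day_is_valid(r))
    return [r for r in records if day_is_valid(r) and lost[r.month] < max_lost_days]


days = [DailyRecord((2000, 1), 15.0, 25.0, 7.0, 10.0),   # valid day
        DailyRecord((2000, 1), 20.0, 25.0, 7.0, 10.0)]   # Rs > 0.78 Ra, removed
print(len(quality_control(days)))  # 1
```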
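Optimization algorithms are applied to find the optimal solution of a given computational problem, i.e., the solution that minimizes or maximizes a specific objective function. In this research, the Shuffled Complex Evolution (SCE), Improved Harmony Search (IHS), Global Harmony Search (GHS), and Harmony Search (HS) optimization algorithms were used.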
Shuffled Complex Evolution (SCE) algorithm. The SCE algorithm was developed at the University of Arizona 42. Its strategy combines the strengths of controlled random search (CRS) algorithms with the concept of competitive evolution 43.
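The SCE strategy described above can be illustrated with a compact Python sketch that calibrates the A-P coefficients 'a' and 'b' by minimizing RMSE. It follows the general SCE-UA pattern (rank the population, split it into complexes, evolve each complex by simplex reflection and contraction on sub-complexes chosen with a triangular probability, then shuffle), but it is a simplified illustration rather than the implementation used in the study. The function names, population sizes (2n + 1 points per complex, n + 1 per sub-complex, four complexes), drawing random replacement points from the full bounds, the loop count, and the synthetic data are all assumptions.

```python
import numpy as np


def sce_minimize(f, lower, upper, n_complexes=4, n_loops=100, seed=0):
    """Simplified SCE-style minimiser (sketch, not the SCE-UA reference code)."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    m = 2 * dim + 1                    # points per complex
    q = dim + 1                        # points per sub-complex (a simplex)
    s = n_complexes * m                # population size
    pop = lower + rng.random((s, dim)) * (upper - lower)
    fit = np.array([f(x) for x in pop])
    # Triangular selection probabilities: better-ranked points are picked more often.
    w = 2.0 * (m + 1 - np.arange(1, m + 1)) / (m * (m + 1))

    def random_point():
        # Simplification: sample the full bounds, not the complex's bounding box.
        return lower + rng.random(dim) * (upper - lower)

    for _ in range(n_loops):
        order = np.argsort(fit)                    # rank the whole population
        pop, fit = pop[order], fit[order]
        for k in range(n_complexes):
            idx = np.arange(k, s, n_complexes)     # complex k: ranks k, k+p, k+2p, ...
            cx, cf = pop[idx].copy(), fit[idx].copy()
            for _ in range(m):                     # competitive evolution steps
                sub = np.sort(rng.choice(m, size=q, replace=False, p=w))
                worst = sub[-1]                    # cx is sorted, so the largest index is worst
                centroid = cx[sub[:-1]].mean(axis=0)
                trial = 2.0 * centroid - cx[worst]            # reflection
                if np.any(trial < lower) or np.any(trial > upper):
                    trial = random_point()                    # out of bounds: mutate
                ft = f(trial)
                if ft >= cf[worst]:
                    trial = 0.5 * (centroid + cx[worst])      # contraction
                    ft = f(trial)
                    if ft >= cf[worst]:
                        trial = random_point()                # still no gain: random point
                        ft = f(trial)
                cx[worst], cf[worst] = trial, ft
                o = np.argsort(cf)
                cx, cf = cx[o], cf[o]
            pop[idx], fit[idx] = cx, cf            # return the evolved complex (shuffling)
    best = int(np.argmin(fit))
    return pop[best], float(fit[best])


def ap_rmse(params, rs, ra, sun_ratio):
    """RMSE (MJ m-2 day-1) of the A-P estimate Rs = (a + b n/N) Ra."""
    a, b = params
    return float(np.sqrt(np.mean(((a + b * sun_ratio) * ra - rs) ** 2)))


# Hypothetical noise-free data generated with a = 0.25, b = 0.50.
data_rng = np.random.default_rng(1)
sun_ratio = data_rng.uniform(0.2, 1.0, 365)    # n/N
ra = data_rng.uniform(15.0, 42.0, 365)         # Ra, MJ m-2 day-1
rs = (0.25 + 0.50 * sun_ratio) * ra
best, err = sce_minimize(lambda p: ap_rmse(p, rs, ra, sun_ratio), [0.0, 0.0], [1.0, 1.0])
print(best, err)
```

On real station data the minimum RMSE is not zero, and the recovered (a, b) are the station-specific calibrated coefficients.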
$x'_i = x_i \pm r \times bw$, where $r \sim U(0, 1)$ and bw is an arbitrary distance bandwidth. … 45 employed to improve the performance of GHS. Second, GHS modifies the pitch adjustment step of HS to use the guidance information of the best harmony in the harmony memory (HM). In this modified step, GHS not only eliminates the bandwidth parameter (BW), which is difficult to set because it can take any value in the range [0, ∞), but also introduces into HS a social term based on the best harmony. These two methods (IHS and GHS) were developed to overcome the disadvantages of the original HS method.
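The HS pitch-adjustment rule above, together with the IHS and GHS modifications, can be sketched in Python. This is a minimal illustration, not the authors' code: the function name `harmony_search`, the default parameter values (HMS = 10, HMCR = 0.9, PAR, bw, iteration count), the IHS schedules as they are usually described (PAR rising linearly, bw decaying exponentially), the GHS rule (with probability PAR, copy a randomly chosen component of the best harmony, together with the IHS-style PAR schedule), and the synthetic A-P data are all assumptions.

```python
import numpy as np


def harmony_search(f, lower, upper, variant="hs", hms=10, hmcr=0.9, par=0.3,
                   bw=0.01, par_min=0.1, par_max=0.9, bw_min=1e-4, bw_max=0.1,
                   n_iter=5000, seed=0):
    """Minimise f with HS, IHS or GHS (illustrative sketch; defaults are assumptions)."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    hm = lower + rng.random((hms, dim)) * (upper - lower)    # harmony memory (HMS x n)
    cost = np.array([f(x) for x in hm])
    for t in range(n_iter):
        # IHS and GHS raise PAR linearly over the run; plain HS keeps it fixed.
        par_t = par_min + (par_max - par_min) * t / n_iter if variant in ("ihs", "ghs") else par
        # IHS shrinks the bandwidth exponentially from bw_max to bw_min.
        bw_t = bw_max * np.exp(np.log(bw_min / bw_max) * t / n_iter) if variant == "ihs" else bw
        best = hm[np.argmin(cost)]
        new = np.empty(dim)
        for i in range(dim):
            if rng.random() < hmcr:                          # memory consideration
                new[i] = hm[rng.integers(hms), i]
                if rng.random() < par_t:                     # pitch adjustment
                    if variant == "ghs":
                        new[i] = best[rng.integers(dim)]     # GHS: use the best harmony, no bw
                    else:
                        new[i] += rng.uniform(-1.0, 1.0) * bw_t   # x'_i = x_i +/- r * bw
            else:                                            # random selection
                new[i] = rng.uniform(lower[i], upper[i])
        new = np.clip(new, lower, upper)
        c = f(new)
        worst = int(np.argmax(cost))
        if c < cost[worst]:                                  # replace the worst harmony
            hm[worst], cost[worst] = new, c
    k = int(np.argmin(cost))
    return hm[k], float(cost[k])


def ap_rmse(params, rs, ra, sun_ratio):
    """RMSE (MJ m-2 day-1) of the A-P estimate Rs = (a + b n/N) Ra."""
    a, b = params
    return float(np.sqrt(np.mean(((a + b * sun_ratio) * ra - rs) ** 2)))


data_rng = np.random.default_rng(1)
sun_ratio = data_rng.uniform(0.2, 1.0, 365)     # hypothetical n/N values
ra = data_rng.uniform(15.0, 42.0, 365)          # hypothetical Ra, MJ m-2 day-1
rs = (0.25 + 0.50 * sun_ratio) * ra             # synthetic Rs with a = 0.25, b = 0.50

results = {}
for v in ("hs", "ihs", "ghs"):
    results[v] = harmony_search(lambda p: ap_rmse(p, rs, ra, sun_ratio),
                                [0.0, 0.0], [1.0, 1.0], variant=v)
    print(v, results[v])
```

Because GHS copies values between decision variables, it works best when the variables share a similar range, as 'a' and 'b' roughly do here.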

Methodology.
One of the most popular empirical sunshine-based models is the A-P model, which is used to estimate global solar radiation from measured sunshine hours. The model is as follows 46,47:

$$R_s = \left(a + b\,\frac{n}{N}\right) R_a$$

Here Rs and Ra are the daily global solar radiation and the daily extraterrestrial solar radiation (MJ m−2 day−1), respectively, n is the daily sunshine duration (h), N is the maximum possible daily sunshine duration (h), and 'a' and 'b' are empirical coefficients that must be calibrated against long-term measured Rs data. Ra for each day and location was computed from the solar constant, the solar declination, and the time of the year as follows 48:

$$R_a = \frac{24 \times 60}{\pi}\, G_{sc}\, d_r \left[\omega_s \sin\varphi \sin\delta + \cos\varphi \cos\delta \sin\omega_s\right]$$

Here Gsc is the solar constant (0.0820 MJ m−2 min−1), dr is the eccentricity correction factor of the Earth's orbit (Eq. 5), ωs is the sunset hour angle in radians (Eq. 6), φ is the latitude of the station, and δ is the solar declination angle in radians (Eq. 7). The maximum possible daily sunshine duration N can be calculated with the Duffie and Beckman (1991) model:

$$N = \frac{24}{\pi}\,\omega_s$$

Performance indicators. The performance indicators used in this research were the coefficient of determination (R²), the root mean square error (RMSE), and the mean bias error (MBE). These indicators were calculated as follows:

$$\mathrm{MBE} = \frac{1}{M}\sum_{i=1}^{M}\left(R_{estim,i} - R_{meas,i}\right)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{M}\sum_{i=1}^{M}\left(R_{estim,i} - R_{meas,i}\right)^2}$$

$$R^2 = \frac{\left[\sum_{i=1}^{M}\left(R_{meas,i}-\mu_{meas}\right)\left(R_{estim,i}-\mu_{estim}\right)\right]^2}{\sum_{i=1}^{M}\left(R_{meas,i}-\mu_{meas}\right)^2\,\sum_{i=1}^{M}\left(R_{estim,i}-\mu_{estim}\right)^2}$$

Here M is the total number of estimated values, Restim and Rmeas are the estimated and measured daily global solar radiation values, respectively, μestim is the average of the daily estimated values, and μmeas is the average of the daily measured values. R² represents the proportion of the variability in the measured data that is explained by the model. The MBE, RMSE, and R² statistics were used to evaluate the performance of the optimization methods and of the A-P and improved models in estimating Rs. MBE represents the mean difference between the estimated and measured data.
A positive MBE indicates that the model overestimates Rs, whereas a negative MBE indicates underestimation. The closer MBE is to zero, the more accurate the model and the closer the estimated values are to the measured data.
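The A-P model, the Ra and N calculation, and the three performance indicators above can be sketched in Python. The helper names are hypothetical, and the Ra/N function assumes the FAO-56 formulations of dr, δ and ωs with Gsc = 0.0820 MJ m−2 min−1; the study's own Eqs. 5–7 may differ in detail. The check uses 20°S on 3 September (day 246), for which these formulas give Ra ≈ 32.2 MJ m−2 day−1 and N ≈ 11.7 h.

```python
import numpy as np

G_SC = 0.0820  # solar constant, MJ m-2 min-1 (assumed FAO-56 value)


def extraterrestrial_radiation(lat_deg, doy):
    """Daily Ra (MJ m-2 day-1) and N (h), assuming FAO-56 forms of d_r, delta, omega_s."""
    phi = np.radians(lat_deg)
    dr = 1.0 + 0.033 * np.cos(2.0 * np.pi * doy / 365.0)             # eccentricity correction
    delta = 0.409 * np.sin(2.0 * np.pi * doy / 365.0 - 1.39)         # solar declination (rad)
    ws = np.arccos(np.clip(-np.tan(phi) * np.tan(delta), -1.0, 1.0))  # sunset hour angle (rad)
    ra = (24.0 * 60.0 / np.pi) * G_SC * dr * (
        ws * np.sin(phi) * np.sin(delta) + np.cos(phi) * np.cos(delta) * np.sin(ws))
    n_max = 24.0 / np.pi * ws                                        # N, Duffie-Beckman (h)
    return ra, n_max


def angstrom_prescott(a, b, n, n_max, ra):
    """A-P estimate Rs = (a + b n/N) Ra."""
    return (a + b * np.asarray(n, float) / n_max) * ra


def mbe(est, meas):
    """Mean bias error: positive means overestimation."""
    return float(np.mean(np.asarray(est, float) - np.asarray(meas, float)))


def rmse(est, meas):
    return float(np.sqrt(np.mean((np.asarray(est, float) - np.asarray(meas, float)) ** 2)))


def r2(est, meas):
    """Squared Pearson correlation between estimated and measured values."""
    de = np.asarray(est, float) - np.mean(est)
    dm = np.asarray(meas, float) - np.mean(meas)
    return float((de @ dm) ** 2 / ((de @ de) * (dm @ dm)))


ra, n_max = extraterrestrial_radiation(-20.0, 246)    # 20 deg S, 3 September
print(round(float(ra), 1), round(float(n_max), 1))     # 32.2 11.7
```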

Results and discussion
The calibrated empirical coefficients (a, b, c, d) of the A-P model and the three improved models obtained with the different optimization algorithms are shown in Table 3, and the corresponding RMSE, R², and MBE values are shown in Table 5. The calibrated A-P coefficients at the six meteorological stations (Table 3) showed that coefficient 'a' was low at Esfahan with the HS algorithm and high at Bandar Abbas with the IHS algorithm. The coefficients 'a' and 'b' were estimated for all four models with all four optimization algorithms. Adding Tmax, Tmin, and RHmean terms to the A-P model had little effect on improving the radiation estimates, as indicated by the zero or near-zero values of the Tmax, Tmin, and RHmean coefficients.
The results of the statistical analysis (skewness and kurtosis) of the data are shown in Table 4. Skewness measures the asymmetry of a distribution, while kurtosis describes the heaviness of its tails. In a positively skewed distribution, the mean of the data is typically greater than the median.
In a negatively skewed distribution, the longer tail lies on the left and the mean is typically less than the median; negative skewness says nothing about the sign of the mean, median, or mode themselves. Kurtosis measures whether the data are heavy-tailed or light-tailed relative to a normal distribution, whose kurtosis is 3. A distribution with kurtosis less than 3 has lighter tails than the normal distribution, i.e., fewer extreme values, and is flatter (less peaked).
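The two shape statistics can be computed from central moments, as in the sketch below. It uses the population (biased) moment forms and Pearson's non-excess kurtosis, so a normal distribution scores 3; whether Table 4 uses these or bias-corrected estimators is not stated, so this choice is an assumption. The same values are returned by `scipy.stats.skew(x)` and `scipy.stats.kurtosis(x, fisher=False)` with their default `bias=True`.

```python
import numpy as np


def skewness(x):
    """Moment coefficient of skewness m3 / m2**1.5 (0 for a symmetric sample)."""
    d = np.asarray(x, float) - np.mean(x)
    return float(np.mean(d ** 3) / np.mean(d ** 2) ** 1.5)


def kurtosis(x):
    """Pearson (non-excess) kurtosis m4 / m2**2 (3 for a normal distribution)."""
    d = np.asarray(x, float) - np.mean(x)
    return float(np.mean(d ** 4) / np.mean(d ** 2) ** 2)


right_skewed = [1, 1, 1, 1, 10]
print(skewness(right_skewed) > 0, np.mean(right_skewed) > np.median(right_skewed))  # True True
print(kurtosis([1, 2, 3, 4, 5]))  # 1.7, i.e. lighter tails than a normal distribution
```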
Evaluation of solar radiation (Rs) estimation models. The values of R², RMSE, and MBE of the calibrated models at the studied stations are shown in Table 5. Judged by R², the calibrated models performed best in Mashhad, followed by Esfahan, Shiraz, Yazd, Kerman, and Bandar Abbas. Because of recording inaccuracies and the large number of discarded records, the Bandar Abbas station gave poorer results than the other stations. Judged by RMSE, the calibrated models had the smallest error in Mashhad, followed by Esfahan, Bandar Abbas, Kerman, Shiraz, and Yazd. The mean RMSE values of the three improved models were lower than 1.3 MJ m−2 day−1, which also indicates acceptable accuracy. The mean R² value of the improved models was largest in Mashhad (0.977), followed by Esfahan, Shiraz, Yazd, Kerman, and Bandar Abbas. The performance of the improved models varied very little within the same climate. The RMSE statistic showed that all models were most accurate in Esfahan, with an average value of 0.89 MJ m−2 day−1, followed by Bandar Abbas, Mashhad, Shiraz, Kerman, and Yazd. According to these two statistical indicators, all validated improved models performed well and there was no significant difference between the models at each station; hence, these two indicators cannot be used alone to identify the best model at each station. Therefore, the MBE statistic was used to determine the difference between the estimated and measured data. Based on the RMSE and MBE indicators, calibration of the A-P model improved the accuracy of the estimated Rs at most of the studied stations. The closer R² is to one and RMSE is to zero, the more appropriate the model.
Comparison of results with other researchers. The A-P coefficients calibrated by various researchers are shown in Table 6. In this research, the coefficients 'a' and 'b' were calculated for the selected stations with the different optimization algorithms (Table 3). Coefficient 'a' varies from 0.13 to 0.39 and coefficient 'b' from 0.33 to 0.67 across the six stations.
Some differences were observed between the results of this research and those of previous works. For example, Sabziparvar et al. 49 … Table 3). The inconsistency in the results can be explained by the longer period of Rs data used in this research. According to Liu et al. 23, sample size and the length of the observation period can explain such differences among studies. In addition, the quality-control rules for the Rs dataset and stricter criteria for removing unreliable Rs data may partly account for such discrepancies (Table 6). Fig. 2 compares the measured global solar radiation with the values estimated by the A-P model from 1992 to 2017. To appraise the prediction accuracy of Rs, the statistics of the A-P model obtained with the different optimization algorithms (HS, IHS, GHS, and SCE) were compared at the Kerman station. The estimated versus measured values at this station lay very close to the 1:1 line, indicating good agreement between the estimated and measured Rs.
According to Table 5 and Fig. 2, the calibration and validation performance of the A-P model was better than that of the three improved models at all stations. As shown in Table 5, RMSE varies between 0.82 and 2.67 MJ m−2 day−1 for the A-P model with the SCE algorithm in the calibration phase. The other error indicators were also lower for the A-P model with the SCE algorithm. Based on the results in Tables 5 and 6, the rate of decrease in RMSE at the various stations differed among the four optimization algorithms. For example, with the SCE algorithm, RMSE decreased by about 4% and 7% at the Mashhad and Kerman stations, respectively, in the calibration phase compared with the HS algorithm; the highest decrease in RMSE occurred at the Kerman station. The lowest value of R² was observed at the Bandar Abbas station (R² = 0.81). Further, MBE decreased at all stations with the SCE algorithm compared with the other three algorithms (IHS, GHS, and HS), for both the A-P model and the three improved models.
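The percentage decreases quoted above are read here as relative changes with respect to the HS result, which is an interpretation rather than something the text states. The helper name and the RMSE values below are hypothetical and are not taken from Table 5.

```python
def rmse_reduction_percent(rmse_reference, rmse_new):
    """Relative RMSE decrease (%) of a new algorithm with respect to a reference one."""
    if rmse_reference <= 0:
        raise ValueError("reference RMSE must be positive")
    return 100.0 * (rmse_reference - rmse_new) / rmse_reference


# Hypothetical RMSE values (MJ m-2 day-1) for HS and SCE at one station.
print(round(rmse_reduction_percent(2.00, 1.86), 1))  # 7.0
```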
The values of R² and RMSE at the Mashhad and Kerman stations, obtained with the different optimization algorithms for the A-P model and the three improved models, are shown in Fig. 3.
The values of 'a' and 'b' obtained with harmony memory sizes (HMS) of 5, 10, 20, 30, and 40 at the six meteorological stations are shown in Fig. 4. The figure shows that as the initial population increases, the coefficient values converge and fall within a narrower range at each station. For example, at the Kerman station, as HMS increases, the minimum value of coefficient 'a' changes from 0.18 to 0.35 and the maximum from 0.39 to 0.36, so the minimum and maximum values of 'a' become close to each other; the same holds for coefficient 'b'.

Conclusion
In this article, the Harmony Search (HS), Global Harmony Search (GHS), Improved Harmony Search (IHS), and Shuffled Complex Evolution (SCE) optimization algorithms were used to calibrate the coefficients of the A-P model and its three improved models at six meteorological stations in Iran (1992 to 2017). For practical use, a calibrated form of the A-P model appears necessary under Iran's climatic conditions.
The coefficients of the models that include temperature and relative humidity terms were also calibrated with the optimization methods. The results showed that adding Tmax, Tmin, and RHmean did not improve the A-P model. In addition, the SCE algorithm produced better results than the other optimization methods. Table 7 presents the final models for the studied stations.
Since sunshine duration is an important factor for estimating Rs and Iran receives abundant sunshine, the Angstrom empirical model can estimate global radiation well. The coefficients 'a' and 'b' were calibrated in this research: coefficient 'a' varies from 0.1 to 0.47 and coefficient 'b' from 0.2 to 0.69 across the studied stations.
In this research, the Rs estimation models were appraised and calibrated. The results indicate that, compared with the measured Rs, the A-P model (R² = 0.981 at Mashhad station) provides better Rs estimates than the improved models in the arid and semi-arid climates.