Introduction

The solar radiation (Rs) received at the Earth’s surface is one of the most important factors affecting the thermal balance of the atmosphere–Earth system. Precise measurement or estimation of Rs is required for accurate design and management in irrigation and water resources planning, agriculture, meteorology, climatology, energy engineering, solar energy systems, and especially hydrology1,2. One significant part of the hydrological cycle is the evapotranspiration (ET) process, estimates of which are widely used in agriculture, irrigation management, and water resources planning3. Rs is the primary input variable in the calculation of ET4. Because of the cost, maintenance, and calibration requirements of Rs measuring instruments, and because of data gaps caused by instrument failure or related problems, Rs records are not available in many regions5. For this reason, several approaches have been presented to estimate Rs, such as satellite remote sensing6,7, machine learning8,9,10, and numerical and artificial intelligence methods11,12. For example, Guermoui et al.13 used two Support Vector Machine (SVM) models to estimate global solar radiation in Algeria. These methods bring several complexities and difficulties for Rs estimation: they require many input variables and large datasets, they can have coarse spatial resolution, and the final model may not apply to other areas. Besides, there is no satellite-based database that covers the study areas14.

Empirical models are another kind of method that has been developed and widely used for estimating Rs15. These models, which are based on meteorological variables, offer an alternative way to estimate Rs. Because they use easily accessible meteorological variables, such as sunshine duration, maximum and minimum air temperatures (Tmax, Tmin), cloudiness, relative humidity (RH), and precipitation, they are attractive for their simplicity, efficiency, and low data requirement16. Previous research has found that sunshine-based models consistently outperform other types of models17,18. These models do not require many input variables, but their coefficients should be calibrated for the region and the input data. The need for calibration reflects the fact that the coefficients change with location. Station-dependent coefficients limit the regional application of empirical models, which is a major challenge for spatial rasterization. To solve this problem, the model coefficients must be calibrated for regional use.

Many models have been developed for estimating Rs. One of the best-known empirical sunshine-based models is the Angstrom–Prescott (A–P) model, which estimates global solar radiation from measured sunshine hours. This model is widely used for its simplicity and good performance19,20. One of the original constraints of the A–P model is that it requires calibration against locally measured Rs data. For stations where no measured values of global solar radiation are available, Angstrom proposed values of 0.2 and 0.5, and Prescott values of 0.22 and 0.54, for the empirical coefficients ‘a’ and ‘b’, respectively21. Given the simplicity and strong performance of the model compared with other empirical models, the reference values for the radiation coefficients ‘a’ and ‘b’ given by the Food and Agriculture Organization (FAO) Irrigation and Drainage Paper No. 56 (FAO56: a = 0.25, b = 0.5) can be used where Rs data are not available16,22. FAO56 recommends the A–P model as a simple method to estimate daily global solar radiation. Previous research showed that applying the FAO pre-defined A–P coefficients across a variety of climatic and geographical conditions (regardless of climate effect) could challenge the validity of the FAO56-PM method23. Therefore, many researchers have performed temporal and spatial calibration of ‘a’ and ‘b’24. Other researchers have attempted to estimate Rs by using, in addition to sunshine, other variables such as air temperature, relative humidity, cloudiness, saturation vapor pressure, and even precipitation.

Recently, many kinds of meta-heuristic algorithms have been used to calibrate different types of empirical models in real-world problems. Only a few applications of meta-heuristic methods to solar energy problems have been reported; the Genetic Algorithm (GA) is one of these methods. Sen et al.25 used a GA to determine the A–P model coefficients.

Harmony Search (HS), developed by Geem et al.27, is one of the well-known and influential optimization algorithms26. It emulates musical improvisation, in which musicians adjust their instruments’ pitches in search of a perfect state of harmony. The HS algorithm has recently been applied to various engineering optimization problems, including optimized design of water distribution networks28, optimal performance of a multi-reservoir system for hydropower and irrigation29, simulation of irrigation systems30, an optimization model for groundwater management objectives31, and identification of unknown groundwater pollution sources32. To address the shortcomings of the HS algorithm, variants such as the Global Harmony Search (GHS) and Improved Harmony Search (IHS) algorithms have been developed. Another optimization algorithm used for effective global minimization and calibration of hydrologic models is the Shuffled Complex Evolution (SCE) algorithm33. This algorithm has also been widely used for the calibration of rainfall-runoff models34,35, the rehabilitation of water distribution networks36, and the optimization of urban water supply headworks systems37.

Little research has used optimization algorithms to compute Rs in Iran; only one study, conducted in Mashhad26, has examined the topic. This is the first research to calibrate the A–P model coefficients in Iran using these optimization algorithms. With these algorithms, the A–P model coefficients are calibrated faster and more accurately, and Rs, a fundamental input for calculating ET38, is estimated more accurately. Accurate estimation of Rs in turn supports accurate calculation of ET, which is necessary for many applications, such as improving water use, agricultural planning, and effective water resources management, especially in arid and semi-arid climates.

This research aims to calibrate and improve the A–P model for estimating Rs at six meteorological stations in arid and semi-arid climates of Iran using the HS, IHS, GHS, and SCE optimization algorithms. In addition, to investigate the effect of temperature (T) and RH on the performance of the A–P model, three improved A–P models were developed by adding terms for Tmax, Tmin, and mean relative humidity (RHmean), and were calibrated using the same optimization algorithms.

Material and methods

Study area

Iran is situated between latitudes 25°N and 40°N and longitudes 46°E and 65°E, with an area of 1,648,000 km2. Most parts of Iran have arid or semi-arid climates. Moreover, low irrigation efficiency in agricultural fields makes it necessary to calculate ET and crop water requirements, which in turn requires an accurate estimate of Rs. In this research, six meteorological stations situated in arid and semi-arid climates of Iran were selected to evaluate the performance of the calibrated A–P model in Rs estimation. The selected stations are classified as arid or semi-arid by the De Martonne climate classification method39,40 for the period 1992 to 2017 and have reliable long-term data (Fig. 1). The stations were selected on the basis of climate type and the availability of measured Rs.

Figure 1 Location of meteorological stations.

Data and quality control

Daily meteorological data from six radiation stations were obtained from the Islamic Republic of Iran Meteorological Organization (IRIMO). The geographic and meteorological characteristics of the studied stations are presented in Table 1. In this research, the following variables were used as inputs to the A–P model and the three improved models: Tmax, Tmin, RHmean, Rs (MJ m−2 day−1), the maximum possible daily duration of sunshine (N, h), and the daily sunshine duration (n, h). Given the importance of the radiation data, quality control of the observed daily global Rs was carried out as follows41:

  • If either the clearness index (Rs/Ra) or the relative sunshine duration (n/N) was greater than one, the data for that day were deleted from the dataset.

  • If Rs was greater than 0.78 × Ra, the data for that day were deleted from the dataset.

  • If Rs was lower than 0.03 × Ra, the data for that day were deleted.

  • If ten or more days of data were lost in the same month, the data for that month were omitted.
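The four rules above can be sketched as a small filter. This is only an illustration under stated assumptions: the record layout (dictionaries with the keys `year`, `month`, `rs`, `ra`, `n`, `N`) and the function name `quality_control` are hypothetical, and "lost" days are taken here to mean days rejected by the first three rules, since gaps in the raw record are not represented.

```python
from collections import defaultdict


def quality_control(records, max_lost_days=10):
    """Apply the daily and monthly screening rules to a list of daily records.

    Each record is a dict with keys: year, month, rs, ra, n, N
    (a hypothetical layout chosen for this sketch).
    """
    kept = []
    lost = defaultdict(int)  # (year, month) -> number of rejected days
    for rec in records:
        clearness = rec["rs"] / rec["ra"]   # Rs/Ra
        rel_sunshine = rec["n"] / rec["N"]  # n/N
        rejected = (
            clearness > 1.0
            or rel_sunshine > 1.0
            or rec["rs"] > 0.78 * rec["ra"]
            or rec["rs"] < 0.03 * rec["ra"]
        )
        if rejected:
            lost[(rec["year"], rec["month"])] += 1
        else:
            kept.append(rec)
    # Assumption: a month is omitted when it has max_lost_days or more rejected days.
    return [r for r in kept if lost[(r["year"], r["month"])] < max_lost_days]
```

Counting gaps in the raw record as lost days as well would only require incrementing the same counter for each missing date.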

Table 1 Geographical and meteorological characteristics for the studied stations.

Models and optimization algorithms

Models

The A–P model is based on sunshine duration. To examine the effect of other meteorological variables, the improved models presented in Table 2 were also examined.

Table 2 Improved A–P model based on terms of Tmax, Tmin, and RHmean.

Optimization algorithm

The optimization algorithms were coded in MATLAB R2018a (9.4.0.813654). Such algorithms are applied to find the solution of a given computational problem that minimizes or maximizes a specified objective function. In this research, the SCE, IHS, GHS, and HS optimization algorithms were used.

Shuffled Complex Evolution (SCE) algorithm

The SCE algorithm was developed at the University of Arizona42. Its strategy combines the strengths of controlled random search (CRS) algorithms with the concept of competitive evolution43 and the newly developed concept of complex shuffling. The most important steps of the SCE are displayed in Algorithm 1.


Harmony Search (HS) algorithm

When listening to a beautiful piece of classical music, who has ever wondered whether there is any connection between music and finding an optimal solution to a tough design problem, such as the design of water distribution networks or other engineering problems? Researchers found such a connection by developing a new algorithm called HS, first proposed by Geem et al. in 2001. In HS, candidate solutions are stored in a harmony memory (HM) matrix, in which each of the HMS rows (HMS being the harmony memory size) is one solution vector of n decision variables:

$${\text{HM}} = \left[ {\begin{array}{*{20}l} {{\text{x}}_{{{11}}} } \hfill & {{\text{x}}_{{{12}}} } \hfill & {{\text{x}}_{{{13}}} } \hfill & \cdots \hfill & {{\text{x}}_{{{\text{1n}}}} } \hfill \\ {{\text{x}}_{{{21}}} } \hfill & {{\text{x}}_{{{22}}} } \hfill & {{\text{x}}_{{{23}}} } \hfill & \cdots \hfill & {{\text{x}}_{{{\text{2n}}}} } \hfill \\ \vdots \hfill & \vdots \hfill & \vdots \hfill & \vdots \hfill & \vdots \hfill \\ {{\text{x}}_{{{\text{HMS1}}}} } \hfill & {{\text{x}}_{{{\text{HMS2}}}} } \hfill & {{\text{x}}_{{{\text{HMS3}}}} } \hfill & \cdots \hfill & {{\text{x}}_{{{\text{HMSn}}}} } \hfill \\ \end{array} } \right]$$
(1)

Harmony memory considering (HMC) rule:

  • For this rule, a new random number r1 is generated within the range [0, 1].

  • If r1 < HMCR, where HMCR is the harmony memory consideration rate, then the decision variable xijnew of the new vector is selected randomly from the values currently stored for that variable (column j) in the HM, as follows:

    $${\text{x}}_{{{\text{ij}}}}^{{{\text{new}}}} {\text{ = x}}_{{{\text{ij}}}} {,}\,\,\,{\text{x}}_{{{\text{ij}}}} \in \left\{ {\begin{array}{*{20}l} {{\text{x}}_{{{\text{1j}}}} {,}} \hfill & {{\text{x}}_{{{\text{2j}}}} {,}} \hfill & {{\text{x}}_{{{\text{3j}}}} {,}} \hfill & { \cdots ,} \hfill & {{\text{x}}_{{{\text{HMSj}}}} } \hfill \\ \end{array} } \right\}$$
    (2)

The most important steps of the HS are displayed in Algorithm 2.
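As an illustration of the harmony memory and the memory consideration rule, the following is a minimal sketch of a basic HS loop, not the authors’ MATLAB implementation. The parameter values (`hms`, `hmcr`, `par`, `bw`, `iterations`), the clamping of values to the bounds, and the use of RMSE as the objective are assumptions made for this example; the sunshine and radiation data in the usage part are synthetic.

```python
import random


def harmony_search(objective, bounds, hms=20, hmcr=0.9, par=0.3, bw=0.01,
                   iterations=5000, seed=0):
    """Minimize `objective` over box `bounds` with a basic Harmony Search."""
    rng = random.Random(seed)
    # Harmony memory: hms rows, one decision variable per column (Eq. 1).
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    costs = [objective(x) for x in memory]
    for _ in range(iterations):
        new = []
        for j, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: pick a stored value from column j (Eq. 2).
                value = memory[rng.randrange(hms)][j]
                if rng.random() < par:
                    # Pitch adjustment: small random shift within the bandwidth.
                    value += bw * rng.uniform(-1.0, 1.0)
            else:
                # Random selection from the whole allowed range.
                value = rng.uniform(lo, hi)
            new.append(min(max(value, lo), hi))
        cost = objective(new)
        worst = max(range(hms), key=costs.__getitem__)
        if cost < costs[worst]:
            # The new harmony replaces the worst one in memory.
            memory[worst], costs[worst] = new, cost
    best = min(range(hms), key=costs.__getitem__)
    return memory[best], costs[best]


# Usage on synthetic data generated with a = 0.25, b = 0.50 (hypothetical values).
sunshine = [i / 20 for i in range(1, 20)]          # n/N
ra = 30.0                                          # MJ m-2 day-1, held constant here
rs_obs = [ra * (0.25 + 0.50 * s) for s in sunshine]


def ap_rmse(params):
    a, b = params
    sq = [(ra * (a + b * s) - r) ** 2 for s, r in zip(sunshine, rs_obs)]
    return (sum(sq) / len(sq)) ** 0.5


(a_fit, b_fit), err = harmony_search(ap_rmse, [(0.0, 1.0), (0.0, 1.0)])
```

With real station data, `sunshine`, `ra`, and `rs_obs` would be replaced by the daily series, and `ra` would vary by day.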


Developed Harmony Search (HS) algorithm

The HS is good at identifying high-performance regions of the solution space in a reasonable amount of time, but it has difficulty performing a local search in numerical applications. To improve the fine-tuning ability of the HS algorithm, IHS and GHS use new methods that increase the precision and the convergence rate of HS. The IHS uses a new method to generate new solution vectors, which increases the precision and convergence rate of the HS. Omran and Mahdavi44 proposed a new variant of HS, called GHS. First, GHS employs the dynamic updating scheme for the pitch adjusting rate (PAR) used in IHS45 to improve its performance. Second, GHS modifies the pitch adjustment step of HS to use the guidance information of the best harmony in the harmony memory (HM). In the modified step, GHS not only eliminates the bandwidth parameter (BW), which is difficult to set because it can take any value in the range [0, \(\infty\)), but also introduces a social term based on the best harmony into HS. These two methods (IHS and GHS) were developed to overcome the disadvantages of the original method.
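The text does not give the update formulas, so the sketch below uses the forms commonly published for IHS (a linearly increasing PAR and an exponentially decreasing BW) and for the GHS pitch adjustment (copying a randomly chosen component of the best harmony). These formulas, the function names, and the default parameter values are assumptions for illustration, not taken from this article.

```python
import math
import random


def ihs_par(gn, ni, par_min=0.01, par_max=0.99):
    """PAR at generation gn of ni: linear increase from par_min to par_max."""
    return par_min + (par_max - par_min) * gn / ni


def ihs_bw(gn, ni, bw_min=1e-4, bw_max=1.0):
    """BW at generation gn of ni: exponential decrease from bw_max to bw_min."""
    return bw_max * math.exp(math.log(bw_min / bw_max) * gn / ni)


def ghs_pitch_adjust(best_harmony, rng):
    """GHS replaces the BW-based shift by a randomly chosen value of the best harmony."""
    return best_harmony[rng.randrange(len(best_harmony))]


# Usage: parameter values at the start, middle, and end of a 1000-generation run.
schedule = [(g, ihs_par(g, 1000), ihs_bw(g, 1000)) for g in (0, 500, 1000)]
picked = ghs_pitch_adjust([0.25, 0.50], random.Random(1))
```

A large BW early in the run favors exploration, and a small BW late in the run favors fine-tuning, which is the local-search weakness of basic HS described above.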

Methodology

One of the most popular empirical sunshine-based models is the A–P model, which estimates global solar radiation from measured sunshine hours. The model is as follows46,47:

$${\text{R}}_{{\text{s}}} = {\text{R}}_{{\text{a}}} \left[ {{\text{a}} + {\text{b}}\left( \frac{n}{N} \right)} \right]$$
(3)

Here Rs and Ra are the daily global solar radiation and the daily extraterrestrial solar radiation (MJ m−2 day−1), respectively; n is the daily sunshine duration (h); N is the maximum possible daily duration of sunshine (h); and ‘a’ and ‘b’ are empirical coefficients that must be calibrated against long-term measured Rs data. Ra for each day and location was computed from geographical and astronomical parameters, including the solar declination, the solar constant, and the day of the year, as follows48:

$${\text{R}}_{{\text{a}}} = 37.6\,{\text{d}}_{{\text{r}}} \left[ {\upomega _{{\text{s}}} \sin \phi \sin \updelta + \cos \phi \cos \updelta \sin \upomega _{{\text{s}}} } \right]$$
(4)

Here dr is the eccentricity correction factor of the Earth’s orbit (Eq. 5), ωs is the sunset hour angle in radians (Eq. 6), ϕ is the latitude of the station, δ is the solar declination angle in radians (Eq. 7), and Js is the day of the year:

$${\text{d}}_{{\text{r}}} = 1 + 0.033\cos \left( {{\text{J}}_{{\text{s}}} \frac{360}{{365}}} \right)$$
(5)
$$\upomega _{{\text{s}}} = \arccos ( - \tan \phi \tan \updelta )$$
(6)
$$\updelta = 0.409\sin \left( {\frac{2\pi }{{365}}{\text{J}}_{{\text{s}}} - 1.39} \right)$$
(7)

The maximum possible daily duration of sunshine, N, can be calculated with the Duffie–Beckman (1991) model, with ωs expressed in degrees:

$${\text{N}} = \frac{2}{15}\upomega _{{\text{s}}}$$
(8)
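Equations (3) to (8) can be chained into a short calculation. The sketch below is illustrative only: the function names are hypothetical, all angles are handled in radians (so the day-of-year term is written as 2π/365 and N is computed as (24/π)·ωs, which equals (2/15)·ωs when ωs is in degrees), and the default coefficients are the FAO56 reference values quoted in the Introduction rather than calibrated ones.

```python
import math


def extraterrestrial_radiation(lat_deg, day_of_year):
    """Return (Ra in MJ m-2 day-1, N in hours) for a latitude and day of year."""
    phi = math.radians(lat_deg)
    angle = 2.0 * math.pi * day_of_year / 365.0
    dr = 1.0 + 0.033 * math.cos(angle)                 # Eq. 5
    delta = 0.409 * math.sin(angle - 1.39)             # Eq. 7, radians
    # Eq. 6; valid outside polar day/night, where the argument stays in [-1, 1].
    ws = math.acos(-math.tan(phi) * math.tan(delta))
    ra = 37.6 * dr * (ws * math.sin(phi) * math.sin(delta)
                      + math.cos(phi) * math.cos(delta) * math.sin(ws))  # Eq. 4
    n_max = 24.0 / math.pi * ws                        # Eq. 8 with ws in radians
    return ra, n_max


def angstrom_prescott(ra, n, n_max, a=0.25, b=0.50):
    """Eq. 3: daily global solar radiation from relative sunshine duration."""
    return ra * (a + b * n / n_max)


# Usage: latitude 20 degrees S on day 246 (3 September), with 7.1 h of sunshine.
ra, n_max = extraterrestrial_radiation(-20.0, 246)
rs = angstrom_prescott(ra, 7.1, n_max)
```

For this latitude and date the sketch gives Ra of about 32.2 MJ m−2 day−1 and N of about 11.7 h.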

Performance indicators

The performance indicators used in this research were the coefficient of determination (R2), the Mean Bias Error (MBE, MJ m−2 day−1), and the Root Mean Square Error (RMSE, MJ m−2 day−1). These indicators were calculated as follows:

$${\text{R}}^{2} = \left[ {\frac{{\mathop \sum \nolimits_{{{\text{i}} = 1}}^{{\text{M}}} \left( {{\text{R}}_{{{\text{estim}}}} -\upmu _{{{\text{estim}}}} } \right)\left( {{\text{R}}_{{{\text{meas}}}} -\upmu _{{{\text{meas}}}} } \right)}}{{\left[ {\mathop \sum \nolimits_{{{\text{i}} = 1}}^{{\text{M}}} \left( {{\text{R}}_{{{\text{estim}}}} -\upmu _{{{\text{estim}}}} } \right)^{2} } \right]^{0.5} \left[ {\mathop \sum \nolimits_{{{\text{i}} = 1}}^{{\text{M}}} \left( {{\text{R}}_{{{\text{meas}}}} -\upmu _{{{\text{meas}}}} } \right)^{2} } \right]^{0.5} }}} \right]^{2}$$
(9)
$${\text{RMSE}} = \left[ {\frac{{1}}{{\text{M}}}\sum\nolimits_{{\text{i = 1}}}^{{\text{M}}} {({\text{R}}_{{{\text{estim}}}} - {\text{R}}_{{{\text{meas}}}} )^{2} } } \right]^{1/2}$$
(10)
$${\text{MBE}} = \frac{{1}}{{\text{M}}}\sum\nolimits_{{\text{i = 1}}}^{{\text{M}}} {({\text{R}}_{{{\text{estim}}}} - {\text{R}}_{{{\text{meas}}}} )}$$
(11)

Here M is the total number of estimated values; Restim and Rmeas are the estimated and measured daily global solar radiation values, respectively; μestim is the average of the daily estimated values; and μmeas is the average of the daily measured values. R2 is the proportion of variability in the data set that is explained by the model. The MBE, RMSE, and R2 statistics were used to evaluate the performance of the applied optimization methods and of the improved A–P models for Rs estimation. The MBE represents the mean difference between the estimated and measured data: a positive MBE means that the model overestimates Rs on average, and a negative MBE means that it underestimates Rs. The closer the MBE is to zero, the smaller the average bias of the estimates relative to the measured data.
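The three indicators in Eqs. (9) to (11) translate directly into code. The following is a minimal sketch with hypothetical function names and made-up sample values; it assumes two equally long sequences of estimated and measured Rs with non-zero variance.

```python
import math


def r_squared(estim, meas):
    """Eq. 9: squared correlation between estimated and measured values."""
    mu_e = sum(estim) / len(estim)
    mu_m = sum(meas) / len(meas)
    cov = sum((e - mu_e) * (m - mu_m) for e, m in zip(estim, meas))
    var_e = sum((e - mu_e) ** 2 for e in estim)
    var_m = sum((m - mu_m) ** 2 for m in meas)
    return (cov / (math.sqrt(var_e) * math.sqrt(var_m))) ** 2


def rmse(estim, meas):
    """Eq. 10: root mean square error, in the units of Rs."""
    return math.sqrt(sum((e - m) ** 2 for e, m in zip(estim, meas)) / len(estim))


def mbe(estim, meas):
    """Eq. 11: mean bias error; negative values indicate underestimation."""
    return sum(e - m for e, m in zip(estim, meas)) / len(estim)


# Usage with made-up values: every estimate is 0.5 MJ m-2 day-1 too low.
estimated = [10.0, 15.0, 20.0, 25.0]
measured = [10.5, 15.5, 20.5, 25.5]
scores = (r_squared(estimated, measured), rmse(estimated, measured), mbe(estimated, measured))
```

The example shows why R2 alone cannot rank models: a constant offset leaves R2 at one while the RMSE and MBE both reveal the bias.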

Results and discussion

The calibrated coefficients (a, b, c, d) of the A–P model and the three improved models obtained with the different optimization algorithms, and the corresponding RMSE, R2, and MBE values, are shown in Tables 3 and 5, respectively.

Table 3 Locally calibrated model coefficients for the selected stations using optimization algorithms.

The statistics of the calibrated A–P coefficients at the six meteorological stations (Table 3) showed that coefficient ‘a’ had low values in Esfahan with the HS algorithm and high values in Bandar Abbas with the IHS algorithm. The coefficients ‘a’ and ‘b’ were estimated for four models with four optimization algorithms. Adding Tmax, Tmin, and RHmean terms to the A–P model had little effect on the radiation estimates, as indicated by the zero or near-zero values of the Tmax, Tmin, and RHmean coefficients.

The statistical analysis (skewness and kurtosis) of the data is shown in Table 4. Skewness measures the asymmetry of the distribution, while kurtosis describes the heaviness of the distribution tails. In a positively skewed distribution, the mean of the data is typically greater than the median.

Table 4 Statistical analysis (Kurtosis, Skewness) on data.

In a negatively skewed distribution, the mean of the data is typically less than the median, and the longer tail lies on the left side of the distribution. Kurtosis indicates whether the data are heavy-tailed or light-tailed relative to a normal distribution, which has a kurtosis of 3. A distribution with a kurtosis of less than 3 has lighter tails than the normal distribution and is often described as flatter (less peaked), with fewer data points lying far from the mean.
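For reference, skewness and kurtosis can be computed from the central moments as sketched below. The moment-based (population) estimators used here are an assumption; the text does not state which estimator was used for Table 4, and statistical packages often apply small-sample corrections that give slightly different numbers.

```python
def central_moment(values, order):
    """Mean of (x - mean)**order over the sample."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardized moment: positive when the right tail is longer."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Fourth standardized moment: equals 3 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Usage with made-up samples: one symmetric, one with a long right tail.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_tailed = [1.0, 1.0, 1.0, 10.0]
summary = (skewness(symmetric), kurtosis(symmetric), skewness(right_tailed))
```

The symmetric sample has zero skewness and a kurtosis of 1.7, below the normal value of 3, while the right-tailed sample has positive skewness and a mean above its median.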

Evaluation of solar radiation (Rs) estimation models

The values of R2, RMSE, and MBE for the calibrated models at the studied stations are shown in Table 5. Based on R2, the calibrated models performed best in Mashhad, followed by Esfahan, Shiraz, Yazd, Kerman, and Bandar Abbas. Because of inaccurate recording and the large amount of discarded data at the Bandar Abbas station, its results were poorer than those of the other stations. Based on RMSE, the calibrated models had the smallest error in Mashhad, followed by Esfahan, Bandar Abbas, Kerman, Shiraz, and Yazd. The mean RMSE values for the three improved models were lower than 1.3 MJ m−2 day−1, which also indicated acceptable accuracy. The mean R2 value of the improved models was largest in Mashhad (0.977), followed by the values for Esfahan, Shiraz, Yazd, Kerman, and Bandar Abbas. The performance of the improved models within the same climate showed very small variation. The RMSE statistic showed that all models were most accurate in Esfahan, with an average value of 0.89 MJ m−2 day−1, followed by Bandar Abbas, Mashhad, Shiraz, Kerman, and Yazd. All improved models evaluated with these two statistical indicators performed well, and there was no significant difference between the models at each station; these two indicators alone therefore could not identify the best model at each station. For this reason, the MBE statistic was used to determine the difference between the estimated and measured data. Based on the RMSE and MBE performance indicators, calibration of the A–P model improved the accuracy of the estimated Rs at most of the studied stations. The closer R2 is to one and the RMSE is to zero, the better the model.

Table 5 Statistical comparison of calibration (Ca) and validation (Va) estimates of Rs (using the locally calibrated model coefficients).

Comparison of results with other researchers

The A–P model coefficients calibrated by various researchers are shown in Table 6. In this research, the coefficients ‘a’ and ‘b’ were calculated for the selected stations with different optimization algorithms (Table 3). Coefficient ‘a’ varies from 0.13 to 0.39 and coefficient ‘b’ varies from 0.33 to 0.67 across the six stations.

Table 6 Comparison of calibrated coefficients of the A–P model in the present study with the results of other researchers.

Some differences were observed between the results of this research and previous work. For example, Sabziparvar et al.49 and Khalili and Rezaei Sadr50 applied the A–P model to Shiraz and reported (‘a’, ‘b’) pairs of (0.247, 0.512) and (0.29, 0.42), respectively, while in the present research values of 0.25 and 0.53 were obtained with the SCE optimization algorithm for the same station, in good agreement with the coefficients of Sabziparvar et al. For Mashhad, the A–P coefficients ‘a’ and ‘b’ obtained with the SCE optimization algorithm are 0.22 and 0.62, whereas Khalili and Rezaei Sadr50 and Sabziparvar et al.49 reported (0.30, 0.37) and (0.274, 0.418), respectively. For the Esfahan station, Sabziparvar et al.49 and Khalili and Rezaei Sadr50 suggested the coefficient pairs (0.271, 0.482) and (0.30, 0.42), respectively, but this research suggests values of 0.15 and 0.58 for ‘a’ and ‘b’ with the SCE optimization algorithm (Table 3). The inconsistency of the results can be explained by the longer period of Rs data used in this research. According to Liu et al.23, sample size and the length of the observation period can explain such differences between studies. In addition, the rules for quality control of the Rs dataset and the stricter criteria for removing unreliable Rs data might partly cause such discrepancies (Table 6).

The measured values of global solar radiation and the values estimated by the A–P model from 1992 to 2017 are compared in Fig. 2. To appraise the prediction accuracy of Rs, the A–P model estimates obtained with the different optimization algorithms (HS, IHS, GHS, and SCE) were compared with the measured data at the Kerman station. The points of estimated versus measured data at this station lay very close to the 1:1 line, which means that the estimated and measured Rs were in good agreement.

Figure 2 Comparison of measured and estimated Rs in the A–P model.

According to Table 5 and Fig. 2, the calibration and validation performance of the A–P model was better than that of the three improved models at all stations. As shown in Table 5, the RMSE varies between 0.82 and 2.67 MJ m−2 day−1 for the A–P model with the SCE algorithm in the calibration phase. The other error indicators were also lower for the A–P model with the SCE algorithm. Based on the results in Tables 5 and 6, the reduction in RMSE differed among stations for the four optimization algorithms. For example, with the SCE algorithm, the RMSE in the calibration phase decreased by about 4% and 7% for the Mashhad and Kerman stations, respectively, compared with the HS algorithm. In other words, the largest decrease in RMSE occurred at the Kerman station. The lowest value of R2 was observed at the Bandar Abbas station (R2 = 0.81). Further, the MBE at all stations decreased with the SCE algorithm compared with the three other algorithms (IHS, GHS, and HS), for the A–P model and the three improved models.

The values of R2 and RMSE for the Mashhad and Kerman stations, obtained with the different optimization algorithms for the A–P model and the three improved models, are shown in Fig. 3.

Figure 3 Comparison of R2 and RMSE between the calibration and validation models with different optimization algorithms for Mashhad and Kerman stations.

The values of ‘a’ and ‘b’ for harmony memory sizes (HMS) of 5, 10, 20, 30, and 40 at the six meteorological stations are shown in Fig. 4. The figure shows that as the initial population increases, the values of the coefficients converge and a narrower range of coefficients is obtained at the different stations. For example, at the Kerman station, with increasing HMS, the minimum of coefficient ‘a’ changes from 0.18 to 0.35 and the maximum changes from 0.39 to 0.36. The maximum and minimum values of ‘a’ thus approach each other, and the same is true for coefficient ‘b’.

Figure 4 Minimum and maximum A–P model coefficients obtained with the HS method for different harmony memory sizes (HMS).

Conclusion

In this article, the Harmony Search (HS), Global Harmony Search (GHS), Improved Harmony Search (IHS), and Shuffled Complex Evolution (SCE) optimization algorithms were used to calibrate the coefficients of the A–P model for Rs and its three improved models at six meteorological stations in Iran from 1992 to 2017. For practical use, a calibrated form of the A–P model appears necessary for Iran’s climatic conditions.

The coefficients of the models that include T and RH were also calibrated with the optimization methods. The results showed that adding Tmax, Tmin, and RHmean did not improve the A–P model. In addition, the SCE optimization algorithm showed better results than the other optimization methods. Table 7 presents the final models for the studied stations.

Table 7 Original equations obtained in this research for the estimation of solar radiation by SCE algorithm.

Considering that sunshine is an important factor for estimating Rs and that Iran is a country with significant sunshine, the Angstrom empirical model can estimate total radiation well. The coefficients ‘a’ and ‘b’ were calibrated in this research. Coefficient ‘a’ varies from 0.1 to 0.47 and coefficient ‘b’ varies from 0.2 to 0.69 for the studied stations.

In this research, the three improved Rs estimation models were appraised and calibrated. The results indicate that, compared against the measured Rs, the A–P model (R2 = 0.981 at the Mashhad station) offers better Rs estimates in the semi-arid and arid climates than the improved models.