Introduction

Evaporation is a major component of the hydrological cycle, and predicting evaporation loss is vital for managing water resources, scheduling irrigation and agricultural modelling1,2,3,4. Evaporation is driven by the supply of heat energy and the vapour pressure gradient, both of which depend on meteorological variables such as air temperature, solar radiation, relative humidity, wind speed, and atmospheric pressure5,6,7. These variables are in turn closely tied to geographic location, time of day, season, and climate type, which makes the evaporation process highly non-linear and intricate. Broadly, there are two methodologies for computing and estimating evaporation: direct and indirect. Direct methods such as pan evaporation (Epan) are widely used, but it is infeasible to install pan evaporimeters everywhere, particularly in remote regions where precise instrumentation cannot be established or maintained8. Indirectly, evaporation is estimated from empirical equations using other meteorological variables such as maximum and minimum temperature, wind speed, sunshine hours, and relative humidity. Precise measurement of these variables is a tedious task that requires sophisticated instruments and a skilled workforce9. Instrument faults, improper operation and maintenance, and hostile weather conditions frequently make it difficult to record these data without error, yet error-free records are essential for estimating evaporation through empirical equations; any measurement error propagates directly into the evaporation estimate.

Hence, indirect estimation of evaporation from empirical equations is a data-sensitive procedure that also rests on several assumptions. Moreover, the full set of required meteorological data is scarce at many weather stations and is often unavailable or discontinuous in certain areas10. Because of the highly intricate physical and non-linear nature of the evaporation process, it is difficult to model evaporation with empirical techniques11. Furthermore, an empirical model formulated for one agro-climatic setting may not perform well elsewhere and requires recalibration of its coefficients before use12. Earlier attempts to model the evaporation process produced many empirical formulae, which are documented in the literature13. Selecting the optimal model inputs has always been a challenge for non-linear regression, and several studies have shown that evaporation is influenced by input weather variables such as air temperature, relative humidity, solar radiation, and wind speed14. Developing accurate empirical models that represent all these complex processes is therefore difficult15.

Literature review

Recently, artificial intelligence (AI) based soft computing methods such as support vector machines (SVM), adaptive neuro-fuzzy inference systems (ANFIS), M5 model trees (MT), artificial neural networks (ANN), gene expression programming (GEP) and extreme learning machines (ELM) have been deployed effectively across a wide range of environmental and water engineering problems14,16,17,18,19,20. AI approaches are simpler, more robust, and able to handle intricate non-linear processes with ease8,21,22. Several studies have reported the use of AI techniques for forecasting diverse hydrological processes23, noting that ANN models offer better estimates than traditional techniques. For instance, Castellano-Méndez et al.24 compared ANN and Box–Jenkins methodologies and concluded that the ANN improved on the Box–Jenkins model in simulating future runoff with high precision.

Regarding evaporation prediction, and in view of the drawbacks of both empirical and measurement methods discussed so far, various studies have applied machine learning (ML) methodologies with diverse optimisation algorithms to estimate pan evaporation25,26. These have offered alternative ML solutions using different input combinations of available climatic variables such as humidity, temperature, wind speed, solar radiation, sunshine, and vapour pressure27,28. Keskin and Terzi29 employed ANN and Penman models for modelling evaporation, using several meteorological variables as ANN inputs, and found the ANN superior to the Penman model at predicting evaporation. Kişi30 deployed evolutionary neural networks for predicting monthly pan evaporation; the recommended models were more accurate than the empirical techniques. Wang et al.31 studied the ability of the generalized regression neural network (GRNN), multilayer perceptron (MLP), least square support vector machine (LSSVM), fuzzy genetic (FG), multivariate adaptive regression spline (MARS) and adaptive neuro-fuzzy inference system with grid partition (ANFIS-GP) to predict evaporation, comparing the outcomes with regression methods across various climatic scenarios in China; the heuristic methods generally outperformed the empirical and regression methods.

In a study by Deo et al.32, monthly evaporative losses were predicted using three machine learning approaches, namely the extreme learning machine (ELM), relevance vector machine (RVM), and multivariate adaptive regression spline; meteorological variables served as predictors, and RVM proved the best of the three. Sudheer et al.21 deployed an ANN model for modelling daily evaporation and found that it could effectively predict the evaporation rate from climate data. Falamarzi et al.33 studied the use of ANN and wavelet-ANN for modelling daily evaporation from wind speed and temperature inputs; both models predicted evaporation accurately. Wang et al.31 predicted daily Ep using least square support vector regression (LSSVR), fuzzy genetic (FG), multivariate adaptive regression spline (MARS), the M5 model tree (M5Tree) and multiple linear regression (MLR) at eight stations around China's Dongting Lake basin; LSSVR and FG outperformed the other machine learning methods. Monthly Ep was predicted by Malik et al.34 in the Indian central Himalayas using MLPNN, the radial basis neural network (RBNN), the co-active neuro-fuzzy inference system (CANFIS), and the self-organizing map neural network (SOMNN), with the gamma test used to choose an apt input combination; CANFIS dominated the other methodologies. Tezel and Buyukyildiz35 examined RBFN, MLP, and ε-support vector regression (SVR) with diverse training algorithms; both SVR and ANNs trained with scaled conjugate gradient (SCG) learning outperformed the empirical approaches.
In Turkey, Kisi et al.36 studied decision tree-based machine learning methods, namely the Chi-square automatic interaction detector (CHAID) and classification and regression tree (CART), against a neural network model for daily Ep prediction; the neural network performed better across various circumstances.

However, most of these studies focused chiefly on the generalised abilities of ML models under various climates, since every climate has its own non-stationarity and stochasticity. ML models such as CART (classification and regression tree), SVM (support vector machine), CCNN (cascade correlation neural network), and GEP (gene expression programming) have achieved noteworthy advances in hydrologic modelling37,38,39,40,41, as they can proficiently simulate and explain the stochasticity of complex hydro-climatological processes. Recent evaporation forecasting studies have achieved noticeable success through better, more consistent generalised predictive models, and low prediction error has been the target when forming and implementing novel prediction techniques. The literature review confirms that ANNs with suitable learning algorithms can model the evaporation process in different locations and have achieved better results than more complex traditional frameworks42. Since the prediction task is non-linear in nature, the adaptive prediction model should have non-linear aspects. Nevertheless, selecting and building efficient, effective, and reliable techniques for accurately predicting evaporation remains difficult, as evaporation is a complex and highly non-linear process.

Recently, Chen and Guestrin43 presented a powerful new learning model called XGBoost, which is widely used by data scientists and has obtained state-of-the-art results in several areas owing to its scalability. For instance, Lei et al.44 used six machine learning approaches to create prediction models, and the XGBoost model achieved the most precise predictions. Meanwhile, one deep learning variant of the ANN, the Long Short-Term Memory neural network (LSTM-NN), has drawn attention for time series prediction45. Its estimates are shaped by the system's previous behaviour, and it can be employed for both classification and regression. Compared with other deep models, such as the deep Boltzmann machine, convolutional neural networks (CNN), and graph-structured recurrent neural networks, LSTM-NN deep learning models perform significantly better; through their specifically designed architecture, LSTM-NNs have shown strong modelling capabilities in diverse time series applications. More recently, Zhang et al.46 compared the performance of various neural network models for simulating and forecasting the water levels of an integrated sewer system in Norway, based on online data from water-level sensors and rain gauges, and showed that LSTM is better suited to multi-step-ahead estimation than traditional models without explicit cell memory. Zhang et al.46 also employed an LSTM model for estimating water tables in agricultural regions, compared it with a conventional neural network, and found that the LSTM performed better.

Objectives

This study's contribution is to examine the capability and establish the efficiency of LSTM (Long Short-Term Memory) neural networks in forecasting monthly evaporation (Ep) using data from two climatological stations in Malaysia: Kota Bharu and Alor Setar. The performance of LSTM was compared with Extreme Gradient Boosting (XGBoost), one of the most reliable ML models, and with ElasticNet Linear Regression (LR). The ML models were also compared with two empirical techniques, namely Stephens & Stewart and Thornthwaite. After predicting Ep, the efficiencies of the proposed models were examined and evaluated using several selected performance indicators to assess their effectiveness for evaporation forecasting. The application of the proposed methodology to other regions throughout Malaysia using different data sets will be investigated in future work.

Methodology

Study area and data collection

Malaysia is a tropical country with high rainfall. However, development has increased water demand, and global climate change lengthens the dry season and increases evaporation from impounding reservoirs. This research aims to develop a reliable generalised model to predict evaporation throughout Malaysia; the relevance of the models for evaporation prediction was investigated at the Kota Bharu and Alor Setar meteorological stations. Climate data from the two stations, Alor Setar (longitude 100° 24′ E, latitude 6° 12′ N, elevation 3.4 m) and Kota Bharu (longitude 102° 18′ E, latitude 6° 10′ N, elevation 4.4 m), managed by the Malaysian Meteorological Department (MMD), are used in this study. Figure 1 displays the location of these stations on the map of Malaysia; it was generated using Google Maps to identify the location of the study area.

Figure 1
figure 1

Location of case study [Imagery ©2021 TerraMetrics, Map data ©2021 Google].

Weather parameters measured include minimum, maximum and mean air temperature (Tmin, Tmax, Ta), wind speed (Sw), relative humidity (RH), open pan evaporation (Ep) and solar radiation (Rs). The data set comprises daily records spanning 2000 to 2019. Statistical parameters of the climatic data are shown in Table 1 and Fig. 2.

Table 1 Various meteorological variables and their descriptive statistics.
Figure 2
figure 2

Monthly variations of Ep and associated meteorological parameters used in this study.

As far as the climate is concerned, in the Alor Setar region the summers are hot and short, the winters are warm, long, and wet, and conditions are overcast and oppressive year-round. During the year, the temperature typically varies from 23 to 34 °C and is seldom below 22 °C or above 36 °C. The hot season lasts for 1.9 months, from February to April, with a mean daily high temperature above 33 °C. The winter lasts for 6 months, from June to December, with a mean daily high temperature below 31 °C. The average fraction of the sky covered by clouds shows noteworthy seasonal variation. In Alor Setar, the sunnier part of the year lasts for 3.6 months, from December to around March. The clearest day falls in mid-January, when the sky is clear, mostly sunny, or partly cloudy 35% of the time, and mostly cloudy or overcast 65% of the time.

The summer season in Kota Bharu is hot, while the winter season is warm but relatively short; the area has cloudy skies throughout the year. Annual temperatures range between 23 and 32 °C and are rarely above 34 °C or below 21 °C. The hot season averages 3.2 months, covering April to July. November to February is relatively cool, averaging 2.6 months, with an average high temperature below 29 °C. Cloud cover differs noticeably between seasons. Kota Bharu's skies are clearer for 3.9 months, from December to April, and are clearest in February, which has the clearest day of the year: the sky is clear, mostly clear, or partly cloudy 36% of the time and cloudier for the remaining 64%.

Selection of input combinations and data partitioning

Input variability can substantially affect the modelling procedure and prediction accuracy; therefore, different combinations of input data were explored in the present study to ascertain the optimal ML model predictability for evaporation. These combinations were chosen deliberately. First, for comparison purposes, the input parameters to the machine learning models (XGB, ElasticNet LR, and LSTM) were chosen to match the meteorological variables required by the two proposed empirical models (Stephens–Stewart and Thornthwaite). Second, the availability of consistent long-term weather data has always been a major constraint on the choice of input combination. Indeed, one or two meteorological parameters may be missing from the model inputs in some case studies, primarily in developing countries, so it is advantageous to investigate the prediction accuracy of ML models in the absence of particular input parameters. This also provides a better practical understanding of how each input variable affects the evaporation forecast in the region. Hence, one objective of this study is to explore the predictive capability of ML models under various input combinations of meteorological variables, in order to map the model input–output with high prediction accuracy. The present research also assesses the effect of antecedent Ep as an input variable. In this context, the input records were chosen according to how the antecedent records relate to the predicted output value. As shown in Fig. 3, autocorrelation analysis of the historical monthly pan evaporation series revealed that the correlation deteriorated significantly beyond the second lag. This indicates that the two preceding evaporation records influence the evaporation rate at any time.
Consequently, a maximum lag of two antecedent records was used as model input when developing the proposed models for the monthly time increment. Table 2 depicts the ten input combinations with varying Tmin, Tmax, Ta, Rs, Sw, RH and Ep.
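The lag analysis and input construction described above can be sketched in plain NumPy; the function names and the toy Ep series below are illustrative, not the study's code:

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of a series at a given lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

def make_lagged_inputs(ep, n_lags=2):
    """Build a feature matrix of the n_lags antecedent Ep records and
    the aligned target vector (Ep of the current month)."""
    ep = np.asarray(ep, dtype=float)
    # column k holds Ep at lag k+1: [Ep(t-1), Ep(t-2), ...]
    X = np.column_stack([ep[n_lags - 1 - k: len(ep) - 1 - k] for k in range(n_lags)])
    y = ep[n_lags:]
    return X, y

ep = [1.0, 2.0, 3.0, 4.0, 5.0]           # toy monthly Ep record
X, y = make_lagged_inputs(ep, n_lags=2)  # X[t] = [Ep(t-1), Ep(t-2)]
```

In practice, the lag with the last significant (partial) autocorrelation sets `n_lags`, which here is two, matching Fig. 3.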

Figure 3
figure 3

Partial Autocorrelation for Alor Setar and Kota Bharu stations (Monthly).

Table 2 Input combinations of meteorological variables used for machine learning models.

The 80/20 principle was used to split the meteorological data into the training and testing sets required for ML modelling: 80% of the data were employed for model training and the remaining 20% for testing. The training set was used for ML parameter training; the test set was then employed to assess the model. The dataset was split chronologically, taking the first years for training and the last for testing. The present study aims at a detailed evaluation of soft computing capability using practical frameworks for predicting monthly evaporation at the Kota Bharu and Alor Setar stations.
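A chronological 80/20 split of this kind can be sketched as follows (a minimal illustration; the function name is ours, not from the study):

```python
import numpy as np

def chronological_split(X, y, train_frac=0.8):
    """Split time-ordered samples without shuffling: the first
    train_frac of the record trains the model, the remainder tests it,
    so no future information leaks into training."""
    n_train = int(len(y) * train_frac)
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]

X = np.arange(20).reshape(10, 2)   # 10 time-ordered samples
y = np.arange(10)
X_tr, X_te, y_tr, y_te = chronological_split(X, y)
```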

Empirical models used for monthly pan evaporation prediction

Taking into consideration the number of required meteorological inputs and data availability, two empirical techniques, namely Stephens–Stewart and Thornthwaite, have been selected in this study as the most commonly used methods47.

Stephens and Stewart

This method is also called the "Fractional Evaporation-Equivalent of Solar Energy" by Stephens and Stewart48, who suggested that using measured radiation Qs, where such data are available, gives a better result and can be correlated with temperature, as shown in Eq. (1):

$$Ep= \left(0.0082Ta-0.19\right)\left(\frac{Qs}{1500}\right) \times 25.4,$$
(1)

where \(Ta\), \(Ep\), and \(Qs\) denote mean air temperature (°F), evaporation (mm), and solar radiation (cal cm−2 day−1), respectively. Stephens and Stewart also suggested continued studies in other regions to establish such relations under a range of climatic conditions.
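Equation (1) is straightforward to evaluate; a minimal sketch follows (the function name is ours):

```python
def stephens_stewart_ep(ta_f, qs):
    """Eq. (1): evaporation Ep (mm) from mean air temperature
    Ta (deg F) and solar radiation Qs (cal cm^-2 day^-1)."""
    return (0.0082 * ta_f - 0.19) * (qs / 1500.0) * 25.4
```

For example, Ta = 80 °F and Qs = 500 cal cm−2 day−1 give roughly 3.95 mm.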

Thornthwaite

Thornthwaite49 used empirical data to ascertain the correlation between potential evaporation (\(Ep\)) and mean monthly temperature (\(Ta\)), standardised to a 30-day month with 12 h of sunshine per day. The Thornthwaite technique computes potential evaporation (\(Ep\)); the following expression gives the monthly Thornthwaite heat index (\(i\)):

$$i = \left( {\frac{{Ta}}{5}} \right)^{{1.514}} ,$$
(2)

where \(Ta\) denotes the mean monthly temperature (°C).

The annual heat index \(\left(I\right)\) is the sum of the monthly heat indices \(\left({i}_{m}\right)\):

$$I=\sum_{m=1}^{12}{i}_{m}.$$
(3)

An estimate of potential evaporation \(Ep\) is obtained for each month by applying the following equation:

$$Ep=16\times {\left(\frac{10\,Ta}{I}\right)}^{a},$$
(4)

where \(a\) is:

$$a=\left(675 \times {10}^{-9} \times {I}^{3}\right)-\left(771 \times {10}^{-7}\times {I}^{2}\right)+\left(1792\times {10}^{-5}\times I\right)+0.49239.$$
(5)

\(Ep\) for a given month is, therefore, given by the expression:

$$Ep={Ep}_{Obtained} \cdot \frac{N}{12}\cdot \frac{d}{30} \; \left(\text{mm}\right).$$
(6)

\(N\) and \(d\) denote the theoretical monthly sunshine hours and the number of days in the month, respectively.
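The full Thornthwaite procedure, Eqs. (2)–(6), can be sketched as below. The cubic for the exponent \(a\) uses the standard Thornthwaite coefficients; the function name and toy values are ours:

```python
def thornthwaite_ep(ta_c, monthly_ta, n_hours=12.0, days=30):
    """Potential evaporation (mm) for one month via Eqs. (2)-(6).
    ta_c       -- mean temperature of the target month (deg C)
    monthly_ta -- mean temperatures of all 12 months (deg C)
    n_hours    -- theoretical daily sunshine hours for the month
    days       -- number of days in the month
    """
    I = sum((t / 5.0) ** 1.514 for t in monthly_ta)                      # Eqs. (2)-(3)
    a = (675e-9 * I ** 3) - (771e-7 * I ** 2) + (1792e-5 * I) + 0.49239  # Eq. (5)
    ep_unadjusted = 16.0 * (10.0 * ta_c / I) ** a                        # Eq. (4)
    return ep_unadjusted * (n_hours / 12.0) * (days / 30.0)              # Eq. (6)
```

For a constant 25 °C climate and a standard 12 h, 30-day month, the estimate is about 112 mm.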

Machine learning (ML) models used for monthly pan evaporation prediction

The present study used three ML frameworks for estimating evaporation. These models are Extreme Gradient Boosting (XGB)43, ElasticNet Linear Regression (ElasticNet LR)50, and Long Short-Term Memory (LSTM)45. The training and testing for the machine learning models were carried out by using the TensorFlow framework on an NVIDIA GeForce GTX 1080 Ti GPU.

ElasticNet linear regression

ElasticNet LR is a regularisation method for linear regression, typically employed to address overfitting50,51,52. It applies the elastic net penalty to the least-squares calculation, combining two widely used penalty terms (L1 and L2) that are added to the loss function during training50,51, which also addresses the shortcomings of the lasso method. The ElasticNet technique thus combines the advantages of Ridge Regression and the Lasso, creating a trade-off between the two.

$$\beta^{\prime} = \arg \min_{\beta } \left( {\left\| {y - X\beta } \right\|^{2} + \lambda_{2} \left\| \beta \right\|^{2} + \lambda_{1} \left\| \beta \right\|_{1} } \right),$$
(7)

β′ denotes the ideal weights minimising the loss function, which comprises the squared difference between the real and predicted values plus the two regularisation terms. The L1 and L2 penalty expressions are \(\lambda_{1} \left\| \beta \right\|_{1}\) and \(\lambda_{2} \left\| \beta \right\|^{2}\); the λ coefficients must be tuned.

Both alpha and l1_ratio were tuned to select the optimal values: alpha over [0.001, 0.01, 0.1, 1, 10, 100, 1000] and l1_ratio over [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]. The constant alpha multiplies the penalty term; l1_ratio = 0 yields a pure L2 penalty, l1_ratio = 1 a pure L1 penalty, and values between 0 and 1 a combined L1 and L2 penalty.
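This grid search over alpha and l1_ratio can be sketched with scikit-learn (a hedged illustration on synthetic stand-in data, not the study's script):

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))     # stand-in for the meteorological inputs
y = X @ np.array([1.5, -2.0, 0.5, 0.0]) + 0.05 * rng.normal(size=200)

param_grid = {
    "alpha": [0.001, 0.01, 0.1, 1, 10, 100, 1000],
    "l1_ratio": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
}
# Time-aware cross-validation keeps the chronological ordering of folds.
search = GridSearchCV(ElasticNet(max_iter=10000), param_grid,
                      cv=TimeSeriesSplit(n_splits=5),
                      scoring="neg_mean_absolute_error")
search.fit(X, y)
best_params = search.best_params_
```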

Extreme gradient boosting (XGB)

XGB is a scalable end-to-end tree-based learning framework, more than ten times faster than existing single-machine offerings, and it scales to memory-restricted applications43,50. Its scalability is the result of numerous optimisations addressing computational bottlenecks. The framework employs gradient descent for loss minimisation, while regularisation is used to control overfitting.

In our experiments, we tried different sets of hyperparameter values to select the optimal one, given as follows:

  1. Loss function to be optimised: least squares regression.

  2. Learning rate = 0.5.

  3. Number of estimators = 100; gradient boosting is quite robust to over-fitting, so a large number typically yields superior performance.

  4. Number of features to consider when looking for the best split: max_features = "sqrt", i.e., max_features = sqrt(n_features).

  5. Max_depth = 3; the maximum depth limits the number of nodes in the tree.
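The parameter names above (n_estimators, max_features) match scikit-learn's gradient boosting API, so a hedged sketch with sklearn's GradientBoostingRegressor on synthetic stand-in data is given below; it illustrates the listed settings rather than reproducing the study's code:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(300, 6))     # stand-in for the six weather inputs
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + 0.05 * rng.normal(size=300)

model = GradientBoostingRegressor(  # the default loss is least squares
    learning_rate=0.5,
    n_estimators=100,
    max_features="sqrt",            # sqrt(n_features) candidates per split
    max_depth=3,
)
model.fit(X[:240], y[:240])         # chronological 80/20 split
r2 = model.score(X[240:], y[240:])  # R^2 on the held-out 20%
```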

Long short-term memory (LSTM)

LSTM belongs to the Recurrent Neural Network (RNN) family and is employed for long-range sequence modelling. Figure 4 depicts an LSTM memory cell, which stores state information and is regulated by gates. This structure mitigates gradient vanishing, allowing an LSTM to record temporal associations.
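The gating mechanism of Fig. 4 can be written out explicitly; the following NumPy sketch of a single LSTM cell step (variable names ours) shows how the gates update the cell state:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W (4n x m), U (4n x n) and b (4n,) stack the
    parameters of the input (i), forget (f), candidate (g) and output
    (o) gates. The cell state c carries long-range information; the
    gates decide what to write, keep and expose, which is what limits
    gradient vanishing."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:n])             # input gate: how much new info to write
    f = sigmoid(z[n:2 * n])         # forget gate: how much old state to keep
    g = np.tanh(z[2 * n:3 * n])     # candidate cell update
    o = sigmoid(z[3 * n:4 * n])     # output gate: how much state to expose
    c = f * c_prev + i * g          # new cell state
    h = o * np.tanh(c)              # new hidden state
    return h, c

rng = np.random.default_rng(2)
n, m = 3, 2                         # hidden size, input size
W = rng.normal(size=(4 * n, m))
U = rng.normal(size=(4 * n, n))
b = np.zeros(4 * n)
h, c = lstm_cell_step(rng.normal(size=m), np.zeros(n), np.zeros(n), W, U, b)
```

Because the additive cell-state update bypasses repeated squashing, gradients can flow across many time steps.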

Figure 4
figure 4

LSTM neural network cell.

The present study feeds several inputs, such as humidity, temperature, wind, evaporation, and solar radiation, to the LSTM. LSTM parameters were tuned iteratively to fit the data, and several hyperparameters were evaluated to identify the architecture providing the best assessment metrics.

The hyperparameters include:

  (1) number of LSTM layers;

  (2) number of nodes in each layer;

  (3) number of fully connected layers;

  (4) type of activation function;

  (5) number of dropout layers and dropout rate;

  (6) learning rate;

  (7) loss function;

  (8) optimizer;

  (9) batch size;

  (10) number of epochs.

The ideal LSTM structure for the study comprises these layers:

  1. LSTM layer with 512 nodes.

  2. Fully connected layer with 64 nodes and ReLU activation function.

  3. Dropout layer with a rate of 0.4.

  4. Fully connected layer with 1 node and linear activation function.

The final hyperparameters are:

  1. learning rate of 0.001;

  2. loss function of mean absolute error (MAE);

  3. optimizer: ADAM;

  4. 500 epochs;

  5. batch size of 8.

In general, AI models are chosen based on data availability and their ability to solve the targeted problem. Each AI model has its own observed benefits and drawbacks; Table 3 summarises the advantages and disadvantages of each model. However, since each model behaves differently depending on the problem it is applied to, these benefits and drawbacks may not necessarily carry over to every task.

Table 3 Advantages and disadvantages of the proposed machine learning models.

Performance evaluation

Model predictive performance was assessed using several statistical indicators: the determination coefficient (R2), mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), relative absolute error (RAE) and relative squared error (RSE). These indicators are described below:

  (1) The determination coefficient (R2) specifies the correlation between the real and predicted output; it lies between zero and one (both inclusive). A value of zero indicates a random model, while one indicates a perfect fit.

    $${R}^{2}= {\left[\frac{\sum_{i=1}^{n}\left(y- \overline{y }\right) (\widehat{y}- \overline{\widehat{y} })}{\sqrt{\sum_{i=1}^{n}{(y- \overline{y })}^{2} \sum_{i=1}^{n}{(\widehat{y}- \overline{\widehat{y} })}^{2}} }\right]}^{2}.$$
    (8)

  (2) Mean absolute error (MAE) is the mean absolute difference between the real and predicted output.

    $$MAE= \frac{1}{n}\sum_{i=1}^{n}\left|y- \widehat{y}\right|.$$
    (9)

  (3) Mean square error (MSE) is the average squared error between the predicted and real output.

    $$MSE= \frac{1}{n} {\sum_{i=1}^{n}(y- \widehat{y})^{2}}.$$
    (10)

  (4) Root mean square error (RMSE) is the square root of the average squared error between the predicted and real output.

    $$RMSE=\sqrt{\frac{{\sum_{i=1}^{n}(y- \widehat{y})}^{2}}{n}}.$$
    (11)

  (5) Relative absolute error (RAE) is the sum of absolute errors between the real and predicted output, normalised by the sum of absolute deviations of the real output from its mean.

    $$RAE= \frac{\sum_{i=1}^{n}\left|y- \widehat{y}\right|}{\sum_{i=1}^{n}\left|y- \overline{y }\right|}.$$
    (12)

  (6) Relative squared error (RSE) denotes the sum of squared errors between the predicted and real output, normalised by the total squared deviation of the real output from its mean.

    $$RSE= \frac{\sum_{i=1}^{n}{(y- \widehat{y})}^{2}}{\sum_{i=1}^{n}{(y- \overline{y })}^{2}},$$
    (13)

where n is the number of samples, y is the actual output, \(\widehat{y}\) is the forecast output, and \(\overline{y }\) is the mean of the actual output. To summarise the training and performance evaluation processes, a flow chart has been designed and is shown in Fig. 5; it demonstrates the step-by-step process used in this methodology.
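The six indicators of Eqs. (8)–(13) can be computed directly; a minimal NumPy sketch follows (the function name is ours):

```python
import numpy as np

def evaluation_metrics(y, y_hat):
    """Eqs. (8)-(13): R2, MAE, MSE, RMSE, RAE and RSE for an
    actual series y and a forecast series y_hat."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    err = y - y_hat
    r = np.corrcoef(y, y_hat)[0, 1]            # Pearson correlation
    return {
        "R2": r ** 2,                          # Eq. (8)
        "MAE": np.mean(np.abs(err)),           # Eq. (9)
        "MSE": np.mean(err ** 2),              # Eq. (10)
        "RMSE": np.sqrt(np.mean(err ** 2)),    # Eq. (11)
        "RAE": np.sum(np.abs(err)) / np.sum(np.abs(y - y.mean())),    # Eq. (12)
        "RSE": np.sum(err ** 2) / np.sum((y - y.mean()) ** 2),        # Eq. (13)
    }
```

Note that RAE and RSE benchmark the model against the trivial predictor that always outputs the mean of the observations: values below one indicate the model beats that baseline.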

Figure 5
figure 5

Flow chart of the proposed methodology to forecast evaporation using machine learning models.

Results and discussion

Estimation of monthly pan evaporation using two empirical models

As stated earlier, monthly Ep was predicted by employing two empirical models, one temperature-based and one radiation-based. The R2, MSE, MAE, RSE, RAE and RMSE values for the two empirical models used to predict Ep at the Kota Bharu and Alor Setar stations are listed in Table 4. As the statistical values suggest, the radiation-based model (Stephens and Stewart) achieved higher prediction accuracy than the other empirical model. In particular, the Stephens and Stewart model achieved the highest R2 values (0.522 and 0.599) and the lowest RMSE values (0.677 and 0.436), while for the Thornthwaite model RMSE increased by almost 14% and the corresponding R2 decreased by around 22%. In addition, the lowest MAE (0.535 and 0.33), MSE (0.458 and 0.19), RAE (0.681 and 0.603) and RSE (0.477 and 0.4) values were obtained with the Stephens and Stewart model, clearly indicating that it outperformed the Thornthwaite model. This is likely due to the inclusion of solar radiation, which normally brings an improvement over a strictly temperature-based estimate. In Figs. 6 and 7, the monthly Ep values predicted by the two empirical models are plotted against the measured values for the Alor Setar and Kota Bharu stations, respectively.

Table 4 Statistical results of the Stephens and Stewart and Thornthwaite empirical models for predicting Ep at the Alor Setar and Kota Bharu stations.
Figure 6
figure 6

Scatter plot of measured Ep versus predicted Ep for the proposed empirical models for Alor Setar station.

Figure 7
figure 7

Scatter plot of measured Ep versus predicted Ep for the proposed empirical models for Kota Bharu station.

Estimation of monthly pan evaporation using three machine learning models

Table 5 shows the statistical results of the three machine learning models for predicting monthly Ep under the 10 input combinations of meteorological variables for the Kota Bharu and Alor Setar stations. For each machine learning model, the best statistical indicators are shown in bold. As Table 5 shows, the prediction accuracy of monthly Ep differed considerably with both the input combination and the model type. Across the various input combinations, the LSTM model (R2 = 0.970, MAE = 0.135, MSE = 0.027, RMSE = 0.166, RAE = 0.173, RSE = 0.029 at the Alor Setar station; R2 = 0.986, MAE = 0.058, MSE = 0.005, RMSE = 0.074, RAE = 0.120, RSE = 0.013 at the Kota Bharu station) performed much better than ElasticNet LR (R2 = 0.926, MAE = 0.216, MSE = 0.074, RMSE = 0.273, RAE = 0.265, RSE = 0.073 at Alor Setar; R2 = 0.938, MAE = 0.103, MSE = 0.022, RMSE = 0.150, RAE = 0.218, RSE = 0.061 at Kota Bharu). The LSTM model also outperformed the XGB model (R2 = 0.913, MAE = 0.224, MSE = 0.086, RMSE = 0.294, RAE = 0.276, RSE = 0.089 at Alor Setar; R2 = 0.937, MAE = 0.112, MSE = 0.023, RMSE = 0.152, RAE = 0.237, RSE = 0.062 at Kota Bharu). The monthly Ep values predicted by the three machine learning models are plotted against the measured values for both stations in Figs. 8 and 9, respectively. The lower scatter and the better fit of the predicted data to the 1:1 line are clear indicators of the superiority of the LSTM model over the other two.

Table 5 Statistical results (testing period) of the three machine learning models for predicting monthly Ep under 10 input combinations of meteorological variables for Alor Setar and Kota Bharu.
Figure 8
figure 8

Scatter plot of measured Ep versus predicted Ep for the proposed machine learning models for Alor Setar station.

Figure 9
figure 9

Scatter plot of measured Ep versus predicted Ep for the proposed machine learning models for Kota Bharu station.

As observed in Table 5, the best prediction accuracy was achieved by the models that used the complete meteorological dataset (Tmin, Tmax, Sw, Rs, Ep, RH) at both stations, compared with the incomplete input combinations. This confirms that, in general, model accuracy improved with more input parameters, in line with earlier studies3,31. Moreover, four input parameters lacking only Rs or Sw were sufficient to achieve acceptable accuracy for monthly Ep estimation, which indicates that, for monthly Ep estimation in the studied region, relative humidity is more significant than solar radiation and wind speed. When average temperature was the only available input, the prediction accuracy was insufficient even for the LSTM model (R2 = 0.741, MAE = 0.359, MSE = 0.232, RMSE = 0.482, RAE = 0.474, RSE = 0.258 at the Alor Setar station; R2 = 0.586, MAE = 0.335, MSE = 0.173, RMSE = 0.416, RAE = 0.677, RSE = 0.413 at the Kota Bharu station). This suggests that the powerful capabilities of artificial intelligence cannot, by themselves, compensate for limited meteorological inputs. Better accuracy was achieved when minimum and maximum temperature data were available than when only the average temperature was used as input. In addition, incorporating RH or Rs further improved the prediction of monthly Ep relative to the models that used only minimum and maximum temperatures as inputs.
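The input-combination screening described above can be sketched as an exhaustive comparison of feature subsets. The snippet uses synthetic monthly data and a plain least-squares fit in place of the study's real records and ML models (both are assumptions for illustration), but the 80/20 chronological split and the ranking by test-period RSE mirror the procedure:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the 2000-2019 monthly weather record used in the study.
n = 240
data = {
    "Tmax": rng.normal(33, 1.5, n),
    "Tmin": rng.normal(24, 1.0, n),
    "RH":   rng.normal(80, 5.0, n),
    "Rs":   rng.normal(18, 3.0, n),
    "Sw":   rng.normal(2, 0.5, n),
}
# Hypothetical target: Ep driven mainly by temperature and humidity, plus noise.
ep = (0.2 * data["Tmax"] - 0.1 * data["Tmin"] - 0.05 * data["RH"]
      + 0.05 * data["Rs"] + rng.normal(0, 0.3, n))

split = int(0.8 * n)          # chronological 80/20 train/test split, as in the study

def rse_of_subset(names):
    """Fit ordinary least squares on the training part, return test-period RSE."""
    X = np.column_stack([data[k] for k in names] + [np.ones(n)])
    coef, *_ = np.linalg.lstsq(X[:split], ep[:split], rcond=None)
    pred = X[split:] @ coef
    dev = ep[split:] - ep[split:].mean()
    return np.sum((ep[split:] - pred) ** 2) / np.sum(dev ** 2)

# Rank every non-empty input combination by test RSE (lower is better).
combos = [c for r in range(1, 6) for c in itertools.combinations(data, r)]
ranked = sorted(combos, key=rse_of_subset)
```

With such a screening, a richer input set generally yields a lower test RSE, while a single weak predictor leaves most of the variance unexplained, which is the pattern the study reports.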

In addition, in all ML models, employing Ep as an input was seen to slightly enhance the prediction accuracy. However, with the input combination of Tmax, Tmin, Rs, Sw and RH, the statistical values of the machine learning models were close to those obtained with the complete meteorological inputs (i.e., including Ep as an input). This clearly indicates that the monthly Ep values estimated by the machine learning models were generally in good agreement with the measured monthly Ep values. The LSTM model was also seen to outperform the other ML models, XGB and ElasticNet LR, which may be attributed to the architecture of the LSTM neural network, which can identify and account for the nonlinear behaviour associated with evaporation. The superiority of the LSTM model over the other ML models strongly suggests that it could be employed to predict monthly pan evaporation with high accuracy. Furthermore, as stated earlier, applying the proposed methodology to various regions throughout Malaysia using different data sets will also be investigated, with the aim of developing a reliable generalised model for predicting evaporation.
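The gating mechanism that lets the LSTM capture this nonlinear behaviour can be illustrated with a single forward step. This is a minimal numpy sketch of the standard LSTM cell equations, not the authors' trained network (the paper does not name the framework or hyperparameters used); the weights here are random placeholders:

```python
import numpy as np

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM forward step.

    W (4H x n_in), U (4H x H) and b (4H,) hold the stacked parameters of the
    four gates: input i, forget f, cell candidate g, output o."""
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b          # stacked pre-activations, shape (4H,)
    i = sigmoid(z[0:H])                    # input gate: how much new info to admit
    f = sigmoid(z[H:2*H])                  # forget gate: how much memory to keep
    g = np.tanh(z[2*H:3*H])                # candidate cell state
    o = sigmoid(z[3*H:4*H])                # output gate
    c = f * c_prev + i * g                 # new cell state (long-term memory)
    h = o * np.tanh(c)                     # new hidden state (short-term output)
    return h, c

# Tiny example: 6 meteorological inputs, hidden size 8, one month per step.
rng = np.random.default_rng(1)
n_in, H = 6, 8
W = rng.normal(0, 0.1, (4 * H, n_in))
U = rng.normal(0, 0.1, (4 * H, H))
b = np.zeros(4 * H)

h = np.zeros(H)
c = np.zeros(H)
for month in rng.normal(0, 1, (12, n_in)):   # a year of standardised inputs
    h, c = lstm_step(month, h, c, W, U, b)
```

The multiplicative gates let the cell state carry information across many months, which is the property usually credited for the LSTM's edge over memoryless regressors such as ElasticNet LR.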

Comparison of empirical and machine learning models under the same input combinations

Table 6 compares the performance of the two empirical models for predicting monthly Ep with their corresponding machine learning models under the same input combinations at the Kota Bharu and Alor Setar weather stations. Under the input combination of Ta and Rs, the radiation-based model (Stephens and Stewart) gave the lowest prediction accuracy at both stations compared with all ML models under the same inputs (R2 = 0.522, MAE = 0.535, MSE = 0.458, RMSE = 0.677, RAE = 0.681, RSE = 0.477 at the Alor Setar station; R2 = 0.599, MAE = 0.330, MSE = 0.190, RMSE = 0.436, RAE = 0.603, RSE = 0.400 at the Kota Bharu station). In contrast, the LSTM-6 model outperformed the other empirical and ML models and obtained the highest prediction accuracy (R2 = 0.807, MAE = 0.341, MSE = 0.175, RMSE = 0.418, RAE = 0.446, RSE = 0.192 at the Alor Setar station; R2 = 0.823, MAE = 0.213, MSE = 0.074, RMSE = 0.272, RAE = 0.431, RSE = 0.176 at the Kota Bharu station). Although XGB-6 slightly outperformed LSTM-6 at the Kota Bharu station, this appears to be an isolated case: the LSTM model was more consistent and achieved higher accuracy than the empirical and other ML approaches under all the different input combinations at both stations.
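For reference, a commonly cited form of the Stephens and Stewart radiation-based equation is sketched below. The paper does not reproduce the exact variant or the locally recalibrated coefficients it used, so the 1963 coefficients and units here (Ta in degrees Fahrenheit, Rs in cal cm-2 day-1, output in inches per day) are an assumption; in practice the coefficients are refitted for each site:

```python
def stephens_stewart_ep(ta_celsius, rs_cal_cm2_day):
    """Radiation-based Stephens-Stewart estimate of daily evaporation (mm/day).

    Commonly cited 1963 form (coefficients normally recalibrated per site):
        E [in/day] = (0.0082 * Ta[degF] - 0.19) * Rs[cal cm-2 day-1] / 1500
    """
    ta_f = ta_celsius * 9.0 / 5.0 + 32.0            # Celsius -> Fahrenheit
    e_inches = (0.0082 * ta_f - 0.19) * rs_cal_cm2_day / 1500.0
    return e_inches * 25.4                           # inches -> mm
```

The estimate is linear in both Ta and Rs, which is one reason such models struggle to match the nonlinear ML approaches under the Ta and Rs input combination.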

Table 6 Statistical results of the empirical and machine learning models under the same input combination for Alor Setar and Kota Bharu weather stations.

Under the input of Ta only, the temperature-based model (Thornthwaite) likewise had much lower accuracy than the corresponding machine learning models (R2 = 0.303, MAE = 0.635, MSE = 0.670, RMSE = 0.819, RAE = 0.811, RSE = 0.696 at the Alor Setar station; R2 = 0.401, MAE = 0.449, MSE = 0.330, RMSE = 0.574, RAE = 0.820, RSE = 0.693 at the Kota Bharu station). In contrast, the LSTM-1 model performed best among the empirical and machine learning models at both stations (R2 = 0.741, MAE = 0.359, MSE = 0.232, RMSE = 0.482, RAE = 0.474, RSE = 0.258 at the Alor Setar station; R2 = 0.568, MAE = 0.335, MSE = 0.173, RMSE = 0.416, RAE = 0.677, RSE = 0.413 at the Kota Bharu station). It is evident from the statistical results in Table 6 that the ML models outperformed the empirical models and can significantly improve the accuracy of monthly Ep prediction even with the same input parameters, owing to their superior capability to model complex relationships.
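The temperature-based benchmark can be sketched from the standard textbook form of Thornthwaite's method, which needs only mean monthly temperatures. The day-length and month-length corrections and any local calibration applied in the study are simplified away here, so this is an illustrative approximation rather than the exact model compared in Table 6:

```python
def thornthwaite_pet(t_monthly_celsius, month_index, day_length_h=12.0, days=30):
    """Thornthwaite monthly potential evapotranspiration (mm/month).

    t_monthly_celsius: the 12 mean monthly air temperatures for the site.
    month_index: which month (0-11) to evaluate.
    """
    # Annual heat index from the 12 monthly means (negative months contribute 0).
    I = sum((max(t, 0.0) / 5.0) ** 1.514 for t in t_monthly_celsius)
    # Empirical exponent as a cubic in the heat index.
    a = 6.75e-7 * I**3 - 7.71e-5 * I**2 + 1.792e-2 * I + 0.49239
    t = max(t_monthly_celsius[month_index], 0.0)
    pet = 16.0 * (10.0 * t / I) ** a            # mm/month for 12 h days, 30-day month
    return pet * (day_length_h / 12.0) * (days / 30.0)
```

Because the formula depends on temperature alone, it cannot react to humidity, radiation or wind anomalies, which is consistent with its weak performance relative to the ML models at both stations.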

Conclusion

The present study aimed to evaluate the ability of machine learning (ML) techniques to predict monthly Ep at two stations in Malaysia. To predict monthly evaporation, we developed three machine learning models, Extreme Gradient Boosting, ElasticNet Linear Regression and Long Short-Term Memory, with 10 input combinations of meteorological variables. The prediction models were trained and tested on the available monthly Ep data from 2000 to 2019, with 80% of the data used for training and 20% for testing, and their accuracies were compared using standard statistical measures. The prediction accuracy of monthly Ep differed considerably depending on both the input combination and the model type. The best accuracy was achieved by the models using the complete meteorological dataset (Tmin, Tmax, Rs, RH, Sw, Ep) at both stations, compared with the incomplete input combinations. Four input parameters lacking only Sw or Rs were found to be sufficient for estimating monthly Ep with acceptable accuracy, which suggests that relative humidity is more significant than solar radiation and wind speed for monthly Ep estimation in the studied region. Prediction accuracy was better when minimum and maximum temperature data were available than when only the average temperature was used as input. Moreover, using Ep as an input slightly improved the prediction accuracy of all ML models. The developed ML models were also compared with two empirical models, the radiation-based model of Stephens and Stewart and the temperature-based Thornthwaite model, and the three developed ML models outperformed the empirical models under the same input combinations.
The performance evaluation revealed that the Long Short-Term Memory model provided the most accurate monthly Ep estimates among the empirical and other machine learning models for both the Alor Setar and Kota Bharu stations. The statistical performance measures of the LSTM-10 model were R2 = 0.970, MAE = 0.135, MSE = 0.027, RMSE = 0.166, RAE = 0.173, RSE = 0.029 for Alor Setar and R2 = 0.986, MAE = 0.058, MSE = 0.005, RMSE = 0.074, RAE = 0.120, RSE = 0.013 for Kota Bharu. The superiority of the LSTM model over the other ML models strongly implies that it could be employed to predict monthly pan evaporation with high accuracy. Furthermore, applying the proposed methodology to various regions throughout Malaysia using different data sets will be investigated with the aim of developing a reliable generalised model for predicting evaporation.