Introduction

Evaporation is among the vital aspects that have a pivotal role in regulating the hydrological cycle; forecasting evaporation loss is critically important for water management, irrigation planning, and agricultural models1,2,3,4. Increased evaporation rate is a significant global warming indicator5. Therefore, recording evaporation patterns is critical for monitoring and handling water resources6. Evaporation causes significant water loss, impacting water levels in lakes and reservoirs, affecting the water budget. Therefore, before designing irrigation systems and implementing water resource strategies, evaporation losses must be estimated7. Reliable evaporation forecasting is critical for hydrological and water resources, enhancement of water use, and water balance. Vapour pressure difference and heat availability affect evaporation rates; these determining factors are affected by meteorological aspects, such as solar radiation, humidity levels, wind speed, air temperature, and air pressure8,9,10. Such factors are also deeply associated with other characteristics like geographical location, seasonal influence, climate type, and time of day. Hence, evaporation is a complex phenomenon with extremely non-linear characteristics.

Evaporation estimation is conducted using indirect and direct techniques, including energy balance, water balance, mass transfer, Penman method, and evaporation pan11. The evaporation pan is an extensively used apparatus because it is inexpensive and easy to use12. Nevertheless, this is an energy-intensive process affected by numerous meteorological aspects like wind speed and vapour pressure. Moreover, pan evaporimeters cannot be deployed at every required location, specifically those where instruments cannot be installed or managed13. Indirect techniques comprise evaporation determination using meteorological information and physical concepts like volume and energy conservation that require precise adjustment based on climate. Accurately determining such meteorological variables is challenging and requires advanced instruments and skilled labour14. However, it is known that such techniques cannot offer reliable evaporation data because of intrinsic complications and the non-linear nature of the evaporation process. Considering the inadequate performance levels, such techniques have prompted scientists to develop alternative methods for determining evaporation levels15.

Literature review

Recently, AI techniques like ANN, M5 model tree (MT), support vector machines (SVM), adaptive neuro-fuzzy inference system (ANFIS), extreme learning machine (ELM), and gene expression programming (GEP) have been used to handle different water engineering and environmental issues16,17,18,19,20,21. Such AI techniques are simpler, more robust and can model complex non-linear processes without significant problems13,22,23. Extensive research has been conducted about using AI to forecast different hydrological parameters24. Researchers assert that ANN frameworks provide better forecasts than conventional methods. For example, Castellano-Méndez et al.25 contrasted the Box & Jenkins approach with ANN; the latter provides better runoff simulation performance in terms of precision.

Concerning evaporation forecasting and considering the challenges of practical and conceptual measurement techniques discussed above, several works have been performed using ML approaches with several optimisation works for forecasting pan evaporation26,27. They offered specific distinct machine learning approaches for the problem using different input sets concerning existing climatic attributes like wind speed, temperature, humidity, vapour pressure, solar radiation, and sunshine28,29. Keskin and Terzi30 used ANN and Penman models to develop evaporation models. They employed several meteorological aspects as ANN inputs. These researchers indicated that ANNs were superior to the Penman approach for evaporation forecasting. Kişi 31 formulated evolutionary neural networks to estimate pan evaporation for monthly timescales. The results indicated that the formulated models provided better accuracy than empirical methods. Deo et al.32 researched monthly water loss due to evaporation; they used three machine learning techniques: Relevance Vector Machine (RVM), Extreme Learning Machine (ELM), and Multivariate Adaptive Regression Spline (MARS). Meteorological aspects were employed as independent variables, and RVM was found to be the most effective approach among these. Sudheer et al. 22 formulated an ANN approach for modelling daily evaporation and mentioned that ANN frameworks could be effectively employed to forecast evaporation using climate data. Falamarzi et al. 33 evaluated ANN and wavelet ANN use to forecast daily evaporation. They employed temperature and wind speed data as model inputs. The results indicated that the two frameworks estimated evaporation precisely. Wang et al. 34 estimated daily evaporation using multivariate adaptive regression spline (MARS), least-square support vector regression (LSSVR), fuzzy genetic (FG), multiple linear regression (MLR), and M5 model tree (M5Tree) for eight locations near the Dongting Lake basin in China. The outcomes indicated that FG and LSSVR offer better performance and estimate evaporation with high accuracy. Malik et al. 35 estimated monthly Ep in the central Himalayan region in India using radial basis neural network (RBNN), multilayers perceptron neural network (MLPNN), self-organising map neural network (SOMNN), and co-active neuro-fuzzy inference system (CANFIS). The appropriate input set was selected using the Gamma test. The researchers found that the AI-powered technique could be employed for precise evaporation prediction. Tezel and Buyukyildiz 36 studied the applicability of MLP, RBFN, and e-support vector regression (SVR) using numerous training methods. When scaled conjugate gradient (SCG) learning was used for ANN and SVR approaches, performance was higher than empirical approaches.

Tree-based Machine learning approaches like RF have been used extensively for water and other environmental modelling during the past decade to estimate aspects like groundwater levels, streamflow, solar radiation, soil moisture, evaporation (e.g., pan and potential evapotranspiration), and suspended sediment. Such methods are relatively straightforward but potent approaches for pattern or trend detection 37,38; moreover, they offer more computationally efficient for relatively large datasets than other machine learning techniques39. Francke et al. 40 employed quantile regression forests (QRF) to estimate suspended sediment concentration for four sub-catchment areas in Spain. QRF results were contrasted against RF, generalised linear model, and the traditional sediment rating curve. The researchers found that QRF and RF are extremely flexible techniques that successfully modelled sediment dynamics. Feng et al. 41 employed the RF framework to predict daily evaporation in southwest China and contrasted it against the GRNN approach. The outcomes indicated that GRNN and RF approach provided acceptable results concerning daily evaporation; RF was marginally superior to GRNN. Recently, DL methods have been used in the machine learning domain and demonstrated success for data evaluation for natural processes, attracting attention concerning time series forecasts 42. Deep learning is a recent development in the ML paradigm that evolved from near-human to super-human performance for several engineering scenarios. In this class, forecasts are impacted by prior system characteristics; therefore, they may be used for regression and classification problems. Evaporation is intrinsically complex, dynamic, and non-linear; thus, the adaptive evaporation estimation framework must process nonlinear properties. Of the many ANN models specified in the literature, DL can process higher-order non-linear characteristics with better performance concerning time series data and its intrinsic properties for extended durations to enhance forecasting performance43. Convolutional Neural Network (CNN) has garnered extensive attention in the deep learning technique domain due to its use in several domains like object recognition44, time series categorisation45, audio signal classification46, and robotic visual and haptic data classification47, and weather forecasting48. In addition, in the noisy time series context, convolutional networks also reduce data noise and identify useful patterns by building hierarchical structures49. It must be noted that several academicians have used CNN for numerous time series prediction fields like solar energy forecast, electrical load estimation, and other scenarios.

The literature review confirmed that using ANN with appropriate learning methods can suitably model evaporation for numerous locations with superior results than relatively complex conventional approaches50. However, identifying and devising efficacious, reliable, and generalised evaporation estimation techniques is still challenging for researchers because of the intricate and non-linear nature of the evaporation process. Among the diverse ANN methods used in the recent past, the cutting-edge DL approach offers immense potential for prediction problems and has outperformed more complex methods. Because prediction is a nonlinear task, the adaptive framework for prediction ought to be nonlinear as well. With the success of DL, the CNN has become extremely advantageous for extracting characteristics from time-series data signals and thus for classification and prediction. The most important aspect of this approach is that it identifies implied recurrent sequences from the series. Moreover, such networks automatically use data to identify features without additional training or prior information. The CNN is powerful in capturing high nonlinear features among the various DL structures reported in the literature. Hence, in the current study, CNN was selected for the monthly pan evaporation forecast.

Objectives

This study is intended to assess the predictability and applicability of the CNN model in accurately estimating monthly Ep rates in four Malaysian regions using weather data for the period 2000–2019. The performance of the CNN model is compared with that of the RF as a powerful tree-based technique and with the DNN model. The models’ prediction accuracy is explored under various input combination scenarios. The proposed ML frameworks are contrasted against two widely used empirical methods, namely Thornthwaite and Stephens & Stewart, under identical input combinations. The model’s efficiency values are assessed and analysed using standard statistical performance metrics to determine their use in predicting evaporation levels. Furthermore, sufficient analysis would be performed in this study to demonstrate the reliability of the CNN model, with the goal of developing a dependable model for predicting evaporation, which is essential, specifically in water resource management and agricultural planning.

Study area and data

Study area

Malaysia is in the tropical region and receives ample rainfall. Nevertheless, development has spiked water requirements. Additionally, climate change has extended the dry season and increased the evaporation rate from reservoirs. Many consider drought a very intricate but poorly understood natural calamity, impacting people more than other hazards51; hence, predicting evaporation is vital. Therefore, this research, which aims to develop accurate models for predicting Ep, is extremely important, particularly in water resource management and agriculture. The climate monthly data from four meteorological stations situated in Bayan Lepas (longitude 100° 16′ E, latitude 5° 18′ N, elevation 2.5 m), Ipoh (longitude 101° 06′ E, latitude 4° 34′ N, elevation 40.1 m), KLIA Sepang (longitude 101° 42′ E, latitude 2° 44′ N, elevation 16.1 m), and Kuantan (longitude 103° 13′ E, latitude 3° 46′ N, elevation 15.2 m), managed by the MMD (Malaysian Meteorological Department), are utilised to calibrate and corroborate the recommended predictive models. Figure 1 depicts Malaysia’s map where the four stations are situated; Google Maps were used to create this map depicting the studied region.

Figure 1
figure 1

Location of case study [Imagery ©2021 TerraMetrics, Map data ©2021 Google].

Data description

The propositioned predictive models were built using seven meteorological indicators that include Tmax, Tmin, Ta, RH, Sw, Rs, and Ep. The data set consisted 19 years of day-to-day reports from 2000 to 2019. The statistical parameters recorded every month pertaining to the quantified meteorological data for the four above-mentioned stations are listed in Table 1. Moreover, Fig. 2 illustrates the monthly variation of every weather parameter for the duration 2000 to 2019.

Table 1 Various meteorological variables and their descriptive statistics.
Figure 2
figure 2

Monthly variations of Ep and related meteorological parameters used in this study.

In the table, the Xmin, Xmax, Xmean, Sx, Cx, and Cv represent the minimum, maximum, mean, standard deviation, skewness, and coefficient of variation of the weather parameters, respectively. It is apparent from this table data that the Ep minimum value was measured at Kuantan station, whereas the maximum value was recorded in the Bayan Lepas station. This might be due to the rate of relative humidity, which is inversely associated to evaporation; Kuantan station has the highest relative humidity rate, and Bayan Lepas station represents the lowest rate. Conversely, the coefficient of variation and maximum skewness of Ep were also measured in the Bayan Lepas station, while the minimum value was recorded in Ipoh. A positive value of skewness implies that the information is not symmetric and does not adhere to the normal distribution.

Partitioning of data and input selection

Selecting the suitable predictors is one of the most crucial steps in developing a robust predictive model52; different input combinations of meteorological parameters were examined in this study to successfully plot input–output model and improve the predictive ability of ML models. This will enable a better practical comprehension of how every input parameter affects the evaporation estimate in that region53. There are certain conscious choices for choosing these combinations. First, for the purpose of comparison, input variables to the models of machine learning (RF, DNN, and CNN) were chosen according to the required meteorological aspects in the two proposed empirical models (Thornthwaite and Stephens & Stewart). Second, the input variables (predictors) were chosen with reference to the PCC 54. The Pearson correlation method is the test statistics that quantifies the statistical correlation, or association, among two continuous parameters. It is identified as the best technique of measuring the correlation between parameters of interest since it is based on the covariance method 55. It gives data about the association or correlation magnitude and the direction of the correlation. The two parameters can be negatively or positively associated and there is no relationship among the two parameters if the PCC is 0. To show the applicable features of the environmental parameters to estimate monthly evaporation, the PCC interpretations and ranges are used as displayed in Table 2. The PCC were employed to find the meteorological parameters showing the greatest effect on the estimates of evaporation, and the results are shown in Table 3.

Table 2 Ranges and analysis of the Pearson correlation coefficient (PCC).
Table 3 Pearson correlation coefficient values between the meteorological variables measured at Bayan Lepas, Ipoh, KLIA Sepang, and Kuantan stations.

The outcomes are listed in Table 3, indicating that the Tmax, Tmin, RH, Sw, Rs were all related to a certain extent with Ep and therefore can play a crucial role in predicting the evaporation parameter for the data gathered at all stations. Particularly, at all stations, the Tmax and RH parameters have the strongest relationship with Ep. Thus, the Tmax and RH will be employed in all input combinations in order to increase the Ep estimation accuracy. Earlier studies also suggested that Tmax, Tmin, RH, Sw, and Rs are some of the most significant predictors of evaporation 56,57.

The current research has also evaluated the effects of the input parameter Ep in improving the prediction accuracy for evaporation. In this regard, the records of input data were chosen with reference to how the previous records were associated with the estimated output value. As illustrated in Fig. 3, at each of the stations, the autocorrelation examination for the recorded time series on monthly basis for the Ep rate showed that the correlation declined significantly once it went beyond the previous second lag-time record. This shows that the previous record of second evaporation rate affected the evaporation rate at any time. Therefore, based on the past pan evaporation rate records with the advantage of the correlation analysis, the highest lag times of two previous records were employed as the model input when building the proposed models on monthly basis.

Figure 3
figure 3

Partial autocorrelation for Bayan Lepas, Ipoh, KLIA Sepang and Kuantan stations (Monthly).

Accordingly, in the current study, nine different input scenarios were considered for the models (Table 4). Each climatic data set was divided into two sets, in which 80% was employed for model calibration (training) while 20% was used for validation (testing). Thus, the dataset was partitioned by taking the initial years for training and the remaining years for testing. However, the evaluation of ML approaches is extremely sensitive to the adopted data partitioning scheme. Therefore, the k-fold CV technique would be used. Despite the high computational cost associated in the CV method, it is regarded as one of the reliable prevention methods against overfitting58. The current study intends to perform a comprehensive assessment for testing AI ability and using practical models for predicting Ep levels on a monthly basis in the Bayan Lepas, Ipoh, KLIA Sepang, and Kuantan regions.

Table 4 Input combinations of meteorological variables used for ML models.

Methodology

Empirical models used for monthly Ep prediction

In this research, Stephens & Stewart and Thornthwaite were selected for comparison as the two empirical techniques, as they are regarded to be widely employed methods59, taking into account the number of meteorological inputs required as well as the availability of the data.

Stephens & Stewart

This technique is also commonly referred as the ‘Fractional Evaporation-Equivalent of Solar Energy’ approach by Stephens & Stewart60. As presented in Eq. (1), Stephens & Stewart suggested that by employing measured radiation Qs, better results were achieved when there is availability of data and it also allows correlating with temperature:

$$Ep= \left(0.0082Ta-0.19\right)\left(\frac{Qs}{1500}\right) \times 25.4,$$
(1)

where \(Ta\), \(Ep\), and \(Qs\) represent mean air temperature (Fahrenheit), evaporation (mm), and solar radiation (cal cm−2 day−1). Stephens & Stewart also recommended carrying out additional research in other regions to set such relationships under different weather conditions.

Thornthwaite

Thornthwaite61 employed practical data to identify the relationship that exists between mean monthly temperature (Ta) and probable evaporation (Ep), and then set standardisation to a 30-day month with 12 h of sunlight each day. The potential evaporation (Ep) is calculated by employing Thornthwaite technique; the following expression is employed to calculate the Monthly Thornthwaite Heat Index (\(I\)):

$$i={\left(\frac{Ta}{5}\right)}^{1.514},$$
(2)

where \(Ta\) represent mean monthly temperature (°C).

The Annual heat index \(\left(I\right)\) is calculated as the sum of the Monthly Heat Indices \(\left(i\right)\):

$$I=\sum_{i=1}^{12} i.$$
(3)

The potential evaporation \(Ep\) for each month is calculated using the following equation:

$$Ep=16 \cdot {\left(\frac{10 \cdot Ta}{I}\right)}^{a},$$
(4)

where \(a\) is:

$$a=\left(675 \times {10}^{-9} \times {I}^{3}\right)-\left(771 \times {10}^{-7}\times {I}^{3}\right)-\left(1792\times {10}^{-5}\times I\right)+0.49239.$$
(5)

\(Ep\) for a given month is given by the expression:

$$Ep={Ep}_{Obtained} \cdot \frac{N}{12} \cdot \frac{d}{30}\,\, \left(\mathrm{mm}\right).$$
(6)

N and d denote the number of theoretical monthly sunshine hours and days in the month, respectively.

ML models used for monthly Ep prediction

Three ML frameworks were included in the current study to estimate evaporation, i.e., RF, DNN, and CNN. The TensorFlow framework geared with an NVIDIA GeForce GTX 1080 Ti GPU was employed to conduct training and testing of the machine learning models.

Random Forest (RF)

The Random Forest algorithm is an effective tree-based ensemble learning algorithm, which is known for its excellent performance. It has a broad range of applications, including regression, classification as well as unsupervised learning62. The RFs model was put forward by Breiman63, which employed Breiman’s ‘bagging’ idea to ensemble a set of decision trees that possess controlled variation. The data set excluded in the development of the model signified as out-of-bag (OOB) samples is used to assess the general problems (Fig. 4). This also offers a quantitative measurement pertaining to contribution of each input auxiliary data towards the prediction step, referred as RF variable importance64. The functioning of Random Forest algorithm in general follows these steps: (i) collect and then re-sample the original training data several times; (ii) select a random set of features for every re-sampling step; (iii) estimate a decision tree based on a re-sample and a random set of features; (iv) to obtain a single decision tree, a set of estimated decision trees is gathered. It can be noted that RF is rather insensitive towards noise as well as overtraining, It has been broadly employed to solve complicated as well as non-linear hydrological engineering issues 65,66. Additional details about the random forest model theories can be noted in 63.

Figure 4
figure 4

General architecture of the RFs model.

In this study, different hyperparameters were employed in RF in order to determine the best ones that can achieve the highest accuracy with regards to prediction, such as:

  1. 1.

    The total number of trees needed to generate the forest (Ntree) This parameter is regarded to be a determinant factor when it comes to conducting predictions with RF.

  2. 2.

    The tree’s maximum depth With regards to Random Forest, the maximum depth of a tree refers to the longest path between the leaf node and the root node.

  3. 3.

    To identify the best split, the following features need to be kept in mind:

    • max_features {“auto”, “sqrt”, “log2”}.

    • If “auto”, then max_features = n_features.

    • If “sqrt”, then max_features = sqrt (n_features).

    • If “log2”, then max_features = log2 (n_features).

Deep neural network (DNN)

In the deep learning field, DNN are regarded to be a key technique42. The fundamental framework has been built by considering the brain’s functioning and biological structure to enable machines to achieve intelligence that is more human-like. The basic version pertaining to DNN represents a hierarchical collection of neurons that transmit messages to other neurons as per the input, thus resulting in the development of a complex network learning based on the feedback mechanism. Figure 5 shows the typical structure pertaining to DNN, which includes one input layer, one output layer and numerous hidden layers. As shown in Fig. 5, the balls denote the neurons, wherein each link that exists between neurons is represented by a cause-effect chain that can be trained and learned. The layers remain fully connected, in which any particular neuron in one-layer stays connected to each of the neuron in the next layer. The entire DNN model is made up of a linear function outlined in Eq. (7) as well as an activation function as shown below:

$$a= \sum {w}_{i}{x}_{i}+ {b}_{i},$$
(7)

where \({x}_{i}\) represents the input value pertaining to each neuron; \({w}_{i}\) denotes the coefficient pertaining to linear relationship and \({b}_{i}\) defines the bias. Presuming there are L hidden layers with regards to the DNN, the output value calculation can be represented as follows:

$$f\left(x\right)=f\left[{a}^{L+1} ({h}^{L}\left({a}^{L} \left( \ldots \left({h}^{2}\left({a}^{2}\left({h}^{1}\left({a}^{1} \left(x\right) \right) \right) \right) \right) \right) \right)\right] {a}^{L}\left(x\right)= {W}^{L}+b,$$
(8)

where \(L\) denotes the \(Lth\) layer; \(x\) signifies the matrix of input variables; \(b\) and \(W\) indicate high dimensional matrix and \(f(x)\) indicates the introduced activation function to boost the nonlinearity pertaining to the neural network in order to approximate any nonlinear function with regards to numerous nonlinear models. Amongst all of these activation functions, the rectified linear unit (ReLU) activation function, i.e. ReLU(\(x\)) = max(\(x\), 0), has now become the most popular activation functions employed in the deep learning literature as well as applications 67.

Figure 5
figure 5

The basic structure of DNN.

Determination of the values of \(W\) and \(b\) is determined automatically by taking into account the minimum value pertaining to the loss function in the training process. The difference that exists between the actual and predicted values is determined by employing the loss function. The model’s robustness gets better when there is a smaller value of loss function. Finally, the output layer is regarded to be the final layer of the network. In this research work, testing of different hyperparameters is done to choose the best architecture that can offer the highest evaluation metrics that will help determine the DNN's optimal structure. The hyperparameters include: (1) The total number of fully connected layers, (2) kinds of activation functions that exist amongst layers, (3) percentage of dropout as well as number of dropout layers, (4) loss function, (5) batch size, (6) optimiser, (7) number of epochs and (8) Learning rate. The put forward DNN model’s best architecture with regards to prediction of evaporation includes the following layers:

  1. (1)

    Fully connected layers with 64 nodes and ReLU activation function.

  2. (2)

    Dropout with 0.1%.

  3. (3)

    Fully connected layers with 128 nodes and ReLU activation function.

  4. (4)

    Dropout with 0.1%.

  5. (5)

    Fully connected layers with 1 node and Linear activation function.

The final hyperparameters are:

  1. 1.

    The learning rate: 0.001.

  2. 2.

    Loss Function: Mean Square Error (MSE).

  3. 3.

    Optimizer: ADAM.

  4. 4.

    Epochs: 500.

  5. 5.

    Batch size: 8.

Convolutional neural network (CNN)

CNN is a renowned and extensively utilised deep learning structure. First recommended by LeCun et al. 68, CNNs are still a broadly deployed model for image processing and examination due to their capability to mine and decompose features and secure spatial correlations between data in one or two dimensions69. Convolutional neural network usually pertains to a 2-dimensional CNN, which is typically utilised for image classification. There are other kinds of CNNs like 1-dimensional (1D-CNN) and 3-dimensional (3D-CNN) which are also utilised in real-life engineering applications. Notably, all CNNs possess the same attributes and follow the same methodology. However, the key dissimilarity is the input data dimensionality and the way the filter (feature detector) moves over the data. In this work, we utilised 1D-CNN for pan evaporation estimation because of its advanced performance and minimal computational intricacy. CNNs comprises two key parts70: the first comprises convolutional filtering for mining attributes hierarchically and the second is a fully-connected layer for computing the output value from manifold input values comprising fully-connected neuron layers. The fully connected layers are quite similar to the multilayer perceptron (MLP) layers. The MLP is a feed forward neural network which utilises stochastic gradient descent backpropagation algorithmic for network optimisation. In fact, ordinary artificial neural networks (ANNs) solely comprise the second part; thus, the feature extraction stage is the key difference between CNNs and normal ANNs.

The CNN design generally encompasses an input layer, an output layer and few random numbers of hidden layers among them. A typical CNN setup is depicted in Fig. 6. The input layer is responsible for receiving the signal (input data) as well as transmitting it to the hidden layer(s). Hidden layers can be defined as the computational engine pertaining to the model. These could include one or more dropout layer, convolutional 1D layer, max-pooling layer as well as a flatten layer based on the problem. The CNN’s chief building block is the convolutional layer that includes one-dimensional filters/kernels that enable extracting the features via the input signal, an activation function for establishing neurons' threshold limit and kernel size to denote the filter length. There are many commonly utilised activations functions like the ReLU, tanH, Softmax, and Sigmoid. Each of these have a particular use. The hidden information in the input data can be identified and excerpted via convolutional filters. Towards the end of the convolution layers, the learning features are generally flattened to a single long vector array and tend to pass via fully connected layers prior to employing the output layer for prediction. The flatten layer transforms the convolutional/pooling/dropout layers’ output to one dimension and then transmits the data to the output layer. To the neurons in the network, the dropout layer (should it be employed) randomly assigns zero weights, making it less sensitive to minor variation, thereby enhancing the model’s accuracy regarding unseen data. The 1D-CNN’s last layer would be the output layer that contains one neuron for yielding the desired output. To summarise, there exist three kinds of layers which constitute the CNN: the convolutional layers, fully connected (FC) layers, and pooling layers. Once these layers are arranged, a CNN architecture would be created.

Figure 6
figure 6

General architecture of the CNN model.

In this work, many meteorological variables, such as Tmax, Tmin, RH, Sw, and Rs were applied to CNN to estimate the pan evaporation rate. Iterative parameter tuning helped CNN fit the dataset. To determine the precise CNN structure, several hyperparameters were evaluated to determine the optimal structure to offer the most precise assessment metrics. These hyperparameters comprise convolutional layer count, layer-specific feature map count, filter size, pooling layer category, activation function categories between layers, dropout percentage and numbers, fully-connected layer count, loss function, learning rate, epoch count, batch size, and optimiser. Typically, CNN is built using dense and convolutional layers. Pooling layers might be included in such networks; the layers are inserted between convolutional layers to decrease problem dimensions and identify critical features. Nevertheless, this study does not consider pooling layers because excess parameter count is tolerable for time series forecasts, and recent studies are critical about the need for pooling layers71. Moreover, researchers assert that adequately sized convolutional layers suffice for networking function without adding additional layers71. The sequential model is typically used for Python programming, and it was used for this step too. It provides a straightforward technique to create a CNN structure using Keras since it facilitates building the structure based on layers. During CNN training, the objective is to optimise the loss function representing the objective in the neural network structure. The function is based on MSE. This study also employed the dropout technique to reduce overfitting. Dropout is a widely-used regularisation technique (creative a more representative CNN weight range by creating a new scale), and the values were 0.1 and 0.2. A batch size of 16 and 500 epochs were chosen for training the model based on the above architectural configuration and several trials. Adam algorithm72 was employed to adjust network weights to reduce loss function and determine network performance with a learning rate of 0.001 and momentum rate of 0.7.

The one-dimensional CNN structure proposed in this study comprises the following layers for optimal performance concerning evaporation prediction:

  1. (1)

    CNN with one convolutional layer and 32 filter with kernel_size = 2 and activation = ‘relu’.

  2. (2)

    Dropout with 0.2%.

  3. (3)

    Flatten layer (used as a connection between Convolution and the Dense layers).

  4. (4)

    Fully connected layers with 128 nodes and ReLU activation function.

  5. (5)

    Dropout with 0.1%.

  6. (6)

    Fully connected layers with 256 nodes and ReLU activation function.

  7. (7)

    Dropout with 0.1%.

  8. (8)

    Fully connected layers with 1 node and Linear activation function.

The final hyperparameters are:

  1. 1.

    Learning rate: 0.001.

  2. 2.

    Loss function: MSE.

  3. 3.

    Optimizer: ADAM.

  4. 4.

    Epochs: 500.

  5. 5.

    Batch size: 16.

Performance evaluation

Choosing the appropriate performance indicators is crucial since every indicator has its own properties. In addition, knowing the strengths of each statistical measure can provide a better understanding of how the model perform. Therefore, in this study, model predictive performance was evaluated by utilising numerous well-known statistical indicators. These indicators are defined below:

  1. (1)

    R2 the coefficient of determination informs the correlation between the real and estimated outputs; it has a value range of 0–1 (both limits included). Zero indicates a random framework, while one represents optimal fit. R2 is very popular and makes comparing models easier and more consistent. It attempts to measure how well a regression model is fit a dataset, providing evaluators with an instant understanding of the model’s performance.

    $${R}^{2}= \frac{\sum_{i=1}^{n}\left(y- \overline{y }\right) (\widehat{y}- \overline{\widehat{y} })}{\sqrt{\sum_{i=1}^{n}{(y- \overline{y })}^{2 } } \sum_{i=1}^{n}{(\widehat{y}- \overline{\widehat{y} })}^{2} }.$$
    (9)
  2. (2)

    MAE the absolute difference between the actual and predicted output. High errors caused by outliers are not penalised by MAE. Furthermore, it provides a consistent indicator of how precise the model performs.

    $$MAE= \frac{1}{n}\sum_{i=1}^{n}\left|y- \widehat{y}\right|.$$
    (10)
  3. (3)

    MSE the average squared difference between predicted and actual output. By squaring the errors, the MSE penalises the model for having large errors. Furthermore, for minor errors, it efficiently converges to the minima.

    $$MSE= \frac{1}{n} {\sum_{i=1}^{n}{(y- \widehat{y})}^{2}}.$$
    (11)
  4. (4)

    RMSE it is the square root of the average value of error squares concerning the real and estimated values. In assessing the performance of a regression model, RMSE is more commonly used than MSE. In addition, RMSE is straightforward and easily distinguishable. RMSE has the added benefit of penalising large errors, making it more acceptable.

    $$RMSE=\sqrt{\frac{{\sum_{i=1}^{n}(y- \widehat{y})}^{2}}{n}}.$$
    (12)
  5. (5)

    RAE the difference between real and forecasted values are gathered and normalised. RAE is reliable in some cases because it protects against outliers.

    $$RAE= \frac{\sum_{i=1}^{n}\left|y- \widehat{y}\right|}{\sum_{i=1}^{n}\left|y- \overline{y }\right|}.$$
    (13)
  6. (6)

    NSE it represents a normalised metric determining the relative intensively of residual variance (noise) when determined against the calculated variance (information). The NSE is still widely used in hydrologic modelling, in part since it normalises performance of the model into an understandable scale.

    $$NSE=1- \frac{\sum_{i=1}^{n}{\left(y- \widehat{y}\right)}^{2}}{\sum_{i=1}^{n}{\left(y- \overline{y }\right)}^{2}},$$
    (14)

    where n is sample count, y denotes the true output, \(\widehat{y}\) denotes the predicted values, and \(\overline{y }\) is the true output average.

  7. (7)

    Taylor diagram (TD) Besides the above-mentioned statistical factors, Taylor diagram73 was also used to calculate the accuracy of the modelling methods taken into consideration and their extent of similarity. The diagram is normally used in climate-based studies74. These diagrams can underline the accuracy of models’ estimates by comparing the predicted and measured values by visualising a series of elements on a polar plot. The diagram’s azimuth angle illustrates the correlation coefficient between the predicted and measured values, whereas the standard deviation value of the modelled data from observations is shown by the radial distance from the origin.

As a conclusion to the performance and training evaluation procedures for the ML models that are proposed, a flow chart is devised which is displayed in Fig. 7. The detailed procedure employed in this approach has been illustrated in the flow chart.

Figure 7
figure 7

The process of developing a prediction model.

Results and discussion

Estimation of monthly Ep using empirical models

As previously mentioned, monthly Ep was estimated using two empirical models, which include radiation-based and temperature-based models. The values relating to R2, MSE, MAE, NSE, RAE and RMSE are recorded in Table 5, with respect to the two models used to estimate Ep in Bayan Lepas, Ipoh, KLIA Sepang and Kuantan stations. As indicated by the statistical values shown in Table 5, greater prediction accuracy was noticed with the model based on radiation (Stephens & Stewart) in comparison with the temperature-based model. Above all, the highest R2 values (0.620, 0.649, 0.580, and 0.696) and the minimum RMSE values (0.409, 0.292, 0.314, and 0.292) were observed in Stephens & Stewart model for all stations. However, in the Thornthwaite model, values of RMSE increased by approximately average 16%, and the corresponding R2 reduced by approximately average 33%. The performance values listed in Table 5 clearly suggest that the Stephens & Stewart model surpassed the Thornthwaite model. It could be due to the inclusion of solar radiation, which generally includes an improvement over only the temperature-based estimation53. In Figs. 8, 9, 10 and 11, projected values related to monthly Ep with respect to both the empirical models are plotted against the values measured at stations Bayan Lepas, Ipoh, KLIA Sepang and Kuantan, respectively.

Table 5 Statistical results of Stephens & Stewart and Thornthwaite empirical models for prediction Ep at Bayan Lepas, Ipoh, KLIA Sepang and Kuantan stations.
Figure 8
figure 8

Scatter plot of measured Ep versus predicted Ep for the proposed empirical modles for Bayan Lepas station.

Figure 9
figure 9

Scatter plot of measured Ep versus predicted Ep for the proposed empirical modles for Ipoh station.

Figure 10
figure 10

Scatter plot of measured Ep versus predicted Ep for the proposed empirical modles for KLIA Sepang station.

Figure 11
figure 11

Scatter plot of measured Ep versus predicted Ep for the proposed empirical modles for Kuantan station.

Estimation of monthly Ep using ML models

Table 6 displays the statistical outcomes related to three ML models with the aim to estimate monthly Ep using nine input combinations with respect to meteorological parameters for Bayan Lepas, Ipoh, KLIA Sepang and Kuantan stations. For every ML model, the optimum statistical parameters have been shown in bold. As can be seen in Table 6, there is a noteworthy difference between the estimation accuracy of monthly Ep based on model type and input combination. According to the statistical values, for different input combinations, with respect to the three machine learning models, the CNN-9 model (R2 = 0.970, MAE = 0.071, MSE = 0.008, RMSE = 0.092, RAE = 0.138, NSE = 0.980) at the Bayan Lepas station, (R2 = 0.980, MAE = 0.053, MSE = 0.004, RMSE = 0.069, RAE = 0.132, NSE = 0.981) at the Ipoh station, (R2 = 0.965, MAE = 0.079, MSE = 0.008, RMSE = 0.091, RAE = 0.214, NSE = 0.966) at the KLIA Sepang station, and (R2 = 0.962, MAE = 0.084, MSE = 0.010, RMSE = 0.103, RAE = 0.198, NSE = 0.962) at the Kuantan station offered better performance than the DNN and RF models. In addition, as previously stated, the k-fold CV technique has been used. Cross-validation is a reliable method for preventing overfitting. The primary configuration variable for k-fold CV is k, which defines how many folds the dataset will be split into. Hence, as shown in Table 7, different folds (3, 5, and 10) were used in this study. When these k-fold testing values are compared, it is possible to conclude that the CNN model provides the most accurate results with k = 5 for all stations. With the three ML models, estimated values relating to monthly Ep have been plotted against the measured values for each station as shown in Figs. 12, 13, 14 and 15. The lower-level pertaining to scatter plot and an improved fit with respect to the estimated data with that of the values observed in the 1:1 line are the clear indicators suggesting the superiority with respect to the CNN model compared to other models. Even though Figs. 12, 13, 14 and 15 as well as Table 6 display the observed and estimated values for all the models, and also the evaluation criteria, the Taylor diagram (TD) was employed to compare the methods presented in this research. The primary concept of the TD is to represent the closest prediction model with actual corresponding observation in the 2-D scaling (correlation coefficient on polar axis and standard deviation on radial axis). Standard deviation is with respect to how much, on average, measurements vary from each other. Thus, the relative value of SDP from SDA indicates the level of accuracy. The value of SDP from S.D.A. pertains to lower accuracy. Greater difference refers to lower precision. Therefore, in Fig. 16, it can be noticed that the CNN-9 was better compared to other methodologies, which had SD of 0.65 closer to the actual SD of 0.66 in Bayan Lepas, SD of 0.47 to the actual SD of 0.49 in Ipoh, SD of 0.47 to the actual SD of 0.48 in KLIA Sepang, and SD of 0.52 to the actual SD of 0.53 in Kuantan. The comparison of predicted and actual Ep monthly values generated by the most exact models is displayed in Fig. 16, which demonstrated that the ML models are superior to other models generally, while the CNN-9 is superior to the ML models in particular.

Table 6 Statistical results (testing period) of the three machine learning models for predicting monthly Ep under nine input combinations of meteorological variables for Bayan Lepas, Ipoh, KLIA Sepang and Kuantan stations.
Table 7 Time series cross-validation.
Figure 12
figure 12

Scatter plot of measured Ep versus predicted Ep for the proposed machine learning models for Bayan Lepas station.

Figure 13
figure 13

Scatter plot of measured Ep versus predicted Ep for the proposed machine learning models for Ipoh station.

Figure 14
figure 14

Scatter plot of measured Ep versus predicted Ep for the proposed machine learning models for KLIA Sepang station.

Figure 15
figure 15

Scatter plot of measured Ep versus predicted Ep for the proposed machine learning models for Kuantan station.

Figure 16
figure 16

Taylor diagram of predicted monthly pan evaporation during the validation stage.

As per Table 6, realisation of the best prediction accuracy was possible through the models employing the complete meteorological dataset (Tmax, Tmin, Rs, Sw, RH and Ep) with regards to all stations, when compared with combinations pertaining to other incomplete data input. This showed that the model prediction’s accuracy improved in general with additional input parameters, which was similar to the results seen in the earlier studies3,34. Four input parameters that have not included Rs or Sw were adequate to achieve acceptable accuracy with regards to estimation of monthly Ep. When only mean temperature data were available, ML models, including the CNN model, were found to be insufficient for all stations. This implied that employing the powerful capabilities, such as AI may not improve the ML model prediction accuracy, particularly when meteorological inputs are restricted. Besides, with regards to all ML models, the prediction accuracy improved slightly by using Ep as an input. However, the statistical values with regards to machine learning models were close to complete meteorological inputs (i.e., using Ep as an input) by employing the input combination pertaining to Tmin, Tmax, Sw, Rs and RH. This suggested that the estimated monthly Ep values through machine learning models were in general in line with those of the measured monthly Ep values.

Apart from the robustness and convenience associated with DL’s automated feature extraction, it was seen that the proposed deep learning models consistently outdid the RF model when it comes to prediction of Ep. Thus, these research results were in line with the previous studies53,75, which mentioned deep learning to be a powerful modelling technique that allows learning the complex and non-linear behaviours pertaining to evaporation. Particularly, it was seen that the CNN model was better than other DLs models, such as DNN, which indicates the CNN model’s high potential when it comes to modelling and mapping evaporation when it is difficult for most of the ML models. The effectiveness pertaining to CNN in capturing and analysing the non-linearity and complexity behaviours of evaporation with greater efficacy could be due to the convolutional characteristic of 1D-CNN, i.e., a large number of convolutional kernels are applied by CNN to the inputs for extracting information extensively, which is helpful for time series forecast. However, DLs versus RFs need to be compared carefully, since there is a chance of underestimating the capacity of RFs when special consideration is not given. Thus, the time needed to run and tune the models also needs to be considered when objectively comparing between DL and RF models. Although training time can be influenced by several factors (e.g., model complexity, number of inputs employed), in general, RF has been found to be faster in tuning and training versus DL. The application of DL includes training time as one of the challenges. In addition to this, it is challenging to optimise DL since no formula has been identified that can guarantee converging of DL to a good solution. Moreover, when compared with the RF, larger data sets are required for DL to learn the evaporation properties. Due to this, even though deep learning is regarded to be very powerful when it comes to capturing complex and non-linear behaviours, there exist certain challenges that need to be taken into account when constructing deep learning prediction models.

With regards to the above statement, the CNN model was seen to be able to model pan evaporation with high prediction accuracy. However, for validating the developed predictive model's predictability, a comparison was performed for the results pertaining to the current study versus other AI models exposed to same climatic conditions. Mustafa et al.76 reported an R2 value of 0.97 with regards to their best-performing SVM model during validation period by employing the Support Vector Machine (SVM) method in the Ipoh region based on the same data as used in the current research, versus an R2 value of 0.98 that was identified in the current study. It was also seen that the CNN model was better compared to other AI methods, including K-Nearest Neighbours (KNN), which was recently used in the Ipoh region based on the same data as employed in the present study (M.A, M.A.I, A.N.A, and Y.F.H). Satisfactory performance was reported by applying the KNN, which gave an R2 value of 0.94. Based on this, the study concluded that DL in general, and CNN in particular, can be used as optimistic predictive models in hydrological applications such as evaporation due to the excellent features described earlier. Moreover, investigation will be carried out with regards to the application of the proposed methodology for different regions throughout Malaysia by employing different data sets in order to construct a reliable generalised model for evaporation prediction.

Comparison of empirical and ML models

Table 8 demonstrates the performances for two empirical models to perform prediction of monthly Ep, which are then compared to their respective ML models using same input combinations for Bayan Lepas, Ipoh, KLIA Sepang and Kuantan weather stations. As an initial observation, with regards to input combination of Rs and Ta for all stations, the radiation-based model (Stewart and Stephens) offered the lowest prediction accuracy (R2 values: 0.620, 0.649, 0.580, and 0.696) in comparison with all ML models. On the other hand, the machine learning models (i.e., RF-1, CNN-1 and DNN-1) were seen to perform excellently to achieve high prediction accuracy versus the temperature-based model (Thornthwaite) based on the input combination of just Ta. Based on the statistical results presented in Table 7, the higher performance of ML models was evident versus empirical models, and could also considerably enhance the prediction accuracy of monthly Ep even when employing the same input parameters, depending on their superior capabilities to carry out non-linear and complex tasks. Furthermore, it has been seen that higher accuracy was achieved with the deep learning models (i.e., DNN and CNN) in terms of forecasting evaporation versus the tree-based model (i.e., RF). This can be attributed to the deep learning feature catching concealed properties, which signifies that deep learning can be regarded as more powerful approach for predicting evaporation. In this regard, although the RF was seen to marginally outperform the DL models for few cases, it is evident that this is a single case since the DL models are regarded to be more consistent and could also offer higher accuracy versus empirical and tree-based methods based on all the different input sets at all stations.

Table 8 Statistical results of the empirical and machine learning models under the same input combination for Bayan Lepas, Ipoh, KLIA Sepang and Kuantan weather stations.

Conclusion

This study is conducted to determine the monthly Ep losses by employing RF, DNN, and CNN techniques. Monthly data from four weather stations in Malaysia were employed to assess the capabilities of the three AI approaches in predicting the Ep rates. Time series data pertaining to monthly Ep, such as Tmax, Tmin, Ta, RH, Sw, Rs, and Ep, between the years 2000–2019 were used to set up the evaluated models. The data was divided into two parts: 20% for testing (validation) and 80% for training (calibration). The PCC values were used to select the input parameters (predictors) in order to identify the most effective input combinations for ML models. The developed ML models were compared to two empirical models, one is temperature-based model (Thornthwaite) while the other is radiation-based model (Stephens & Stewart). Standard statistical measures were employed to assess the performance of each model as well as their effectiveness pertaining to evaporation forecasting. Furthermore, the accuracy of the studied models was evaluated using the Taylor diagram. The investigation yielded the following results:

  • The three developed ML models were found to outperform the empirical methods and to significantly improve the precision of monthly Ep estimates even when using the same combinations of inputs.

  • Both RF and DL methods can accurately predict the monthly Ep. In particular, when it comes to predicting Ep, the DL approach (i.e., CNN and DNN) was found to slightly outperform the RF model.

  • The best ML prediction accuracy could be achieved with models that employed complete meteorological datasets (Tmax, Tmin, Rs, Sw, RH and Ep) with regards to all stations, when compared with other combinations of incomplete data input.

  • As seen in the results, the monthly evaporation losses can be successfully modelled based on the CNN structure along with enhanced accuracy versus other models that were accounted in this study. Moreover, estimation results based on the CNN model were seen to outdo versus other AI approaches that were studied in the same regions by employing the same data.

  • In the future, the applicability of the proposed methodology to different regions in Malaysia can be assessed using different data sets with the aim of building a dependable generalised model for predicting evaporation.