Forecasting standardized precipitation index using data intelligence models: regional investigation of Bangladesh

A noticeable increase in drought frequency and severity has been observed across the globe due to climate change, which has attracted scientists to develop drought prediction models for the mitigation of drought impacts. Droughts are usually monitored using drought indices (DIs), most of which are probabilistic and therefore highly stochastic and non-linear. The current research investigated the capability of different versions of relatively well-explored machine learning (ML) models, including random forest (RF), minimax probability machine regression (MPMR), M5 Tree (M5tree), extreme learning machine (ELM) and online sequential-ELM (OSELM), in predicting the most widely used DI, the standardized precipitation index (SPI), at multiple month horizons (i.e., 1, 3, 6 and 12). Models were developed using monthly rainfall data for the period 1949–2013 at four meteorological stations, namely Barisal, Bogra, Faridpur and Mymensingh, each representing a geographical region of Bangladesh that frequently experiences droughts. The model inputs were decided based on correlation statistics, and the prediction capability was evaluated using several statistical metrics including mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE), correlation coefficient (R), Willmott’s Index of agreement (WI), Nash–Sutcliffe efficiency (NSE), and Legates and McCabe Index (LM). The results revealed that the proposed models are reliable and robust in predicting droughts in the region. Comparison of the models revealed ELM as the best model in forecasting droughts, with minimal RMSE in the ranges of 0.07–0.85, 0.08–0.76, 0.062–0.80 and 0.042–0.605 for Barisal, Bogra, Faridpur and Mymensingh, respectively, for all the SPI scales except the one-month SPI, for which RF showed the best performance with minimal RMSE of 0.57, 0.45, 0.59 and 0.42, respectively.

www.nature.com/scientificreports/

There is very little variability in annual mean temperature among the different geographical regions. Rainfall in Bangladesh ranges from about 1600 mm in the northwest to 4400 mm in the northeast. Seasonal and annual variability of rainfall is very high: about 75% of annual rainfall occurs in the monsoon months of May to September, and only 3% occurs during December to February. The coefficient of annual variability of monsoon rainfall is more than 30% over a major portion of the country. This high variability of rainfall often causes droughts; the country experienced major droughts in 1963, 1966, 1968, 1973, 1977, 1979, 1982, 1989, 1992 and 1994-1995.

Theoretical overview and SPI calculation. Reviews of the ML models used in this study (RF, MPMR, M5tree, ELM and OSELM) are provided in this section.
Random Forest (RF). The RF is built on the concepts of ensemble learning and the bagging approach 55. It uses the decision tree methodology, performing the bagging procedure to solve regression problems 55,56. Each node in RF is split randomly by selecting from the most important input predictors, which enhances the learning process, improves prediction accuracy and maintains robustness against overfitting 57. The following steps are followed to construct an RF model:
i. Select k random data points from the training data.
ii. Construct a decision tree associated with the data in (i).
iii. Choose the number of decision trees (n_trees) to build.
iv. Repeat steps (i) and (ii) for each tree.
v. Aggregate the predictions of the n_trees decision trees to forecast the multi-scale SPI.
The capability of RF has been demonstrated in modeling different phenomena in atmospheric, hydrological and geoscience engineering 58, environmental management 59, drought forecasting 60, rainfall forecasting 61, solar index estimation 62 and, most recently, soil moisture forecasting 63.
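The steps above can be sketched with a standard random-forest regressor. The data here are synthetic stand-ins; in the study, the inputs were lagged SPI values selected by correlation analysis.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
spi = rng.normal(0.0, 1.0, 240)  # stand-in for a monthly SPI series

# Build lagged predictors SPI(t-3), SPI(t-2), SPI(t-1) -> target SPI(t)
lags = 3
X = np.column_stack([spi[i:len(spi) - lags + i] for i in range(lags)])
y = spi[lags:]

split = int(0.7 * len(y))  # 70-30 train/test split, as used in this study
model = RandomForestRegressor(n_estimators=1000, random_state=0)
model.fit(X[:split], y[:split])
pred = model.predict(X[split:])
print(pred.shape)  # (72,) -- one forecast per test month
```

Each of the 1000 trees is fitted on a bootstrap sample of the training data (steps i-iv), and `predict` averages the trees' outputs (step v).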
For more comprehensive studies on the RF model, readers are referred to 57,[64][65][66]. The flowchart of the random forest model is provided in Fig. 2a.
Extreme learning machine (ELM). The ELM was developed as a fast learning algorithm for single-hidden-layer feedforward networks (SLFNs) 67; it is more efficient than many existing data-driven models 68. Mathematically, the ELM output can be formulated as 69:

f_M(x) = Σ_{i=1}^{M} ρ_i h_i(x) = h(x)ρ

where ρ = [ρ_1, ρ_2, ..., ρ_M]^T is the output weight vector between the hidden layer of M nodes and the m ≥ 1 output nodes, h(x) = [h_1(x), ..., h_M(x)] is the ELM nonlinear feature mapping (the output row vector of the hidden layer with respect to the input x), h_i(x) is the output of the ith hidden node, and f_M(x) is the final output/prediction. The output functions of the hidden nodes need not be unique, and different output functions may be used in different hidden neurons. In real-life problems h_i(x) can be written as:

h_i(x) = G(a_i, b_i, x), with a_i ∈ R^d and b_i ∈ R

where G(a, b, x) (with hidden node parameters (a, b)) is a nonlinear piecewise continuous function satisfying the ELM universal approximation capability theorems, R is the set of real numbers, R^d is the d-dimensional set of real numbers and x is the input data. Commonly used mapping/activation functions in ELM are the sigmoid, hyperbolic tangent, Gaussian, hard limit, cosine and Fourier basis functions. ELM trains an SLFN in two phases: random feature mapping and linear parameter solving. First, ELM randomly initializes the hidden layer to map the input predictors into a feature space with the help of nonlinear functions; this random feature phase differentiates ELM from SVM and deep neural networks. The hidden node parameters (a, b) in ELM are randomly created and are independent of the training data.
In the second phase of ELM learning, the weights connecting the hidden layer and the output layer, represented by ρ, are solved by minimizing the approximation error in the squared error sense:

min_ρ ‖Hρ − T‖²

where H is the hidden layer output matrix whose ith row is h(x_i) 69, T = [t_1, ..., t_N]^T is the training data matrix, and ‖·‖ denotes the Frobenius norm. The optimum solution to this minimization problem is given by:

ρ* = H⁺T

where H⁺ is the Moore-Penrose generalized inverse of the matrix H. The principle that differentiates ELM from conventional neural network models is that the parameters of the feed-forward network (input weights and hidden layer biases) need not be fine-tuned: SLFNs with randomly selected input weights can effectively learn different training patterns with minimum error. After the input weights and hidden layer biases are randomly selected, the SLFN can be treated as a linear system, and the output weights connecting the hidden layer to the output layer can be solved systematically by a generalized inverse operation on the hidden layer output matrix. This makes the ELM model many times faster than conventional feedforward learning algorithms. The flowchart of the ELM model is shown in Fig. 2b.
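A minimal numpy sketch of the two ELM phases described above, assuming a sigmoid activation and a toy regression target (all names and data here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, T, n_hidden=30):
    d = X.shape[1]
    a = rng.normal(size=(d, n_hidden))       # random input weights (never tuned)
    b = rng.normal(size=n_hidden)            # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))   # sigmoid feature mapping h(x)
    rho = np.linalg.pinv(H) @ T              # output weights: rho = H+ T
    return a, b, rho

def elm_predict(X, a, b, rho):
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))
    return H @ rho

# Toy usage: learn y = sin(x) on [-3, 3]
X = np.linspace(-3, 3, 200).reshape(-1, 1)
T = np.sin(X).ravel()
a, b, rho = elm_train(X, T)
err = np.max(np.abs(elm_predict(X, a, b, rho) - T))
print(err)  # small training error
```

The only learned parameters are the output weights `rho`, obtained in a single pseudoinverse solve; this is what makes ELM training fast.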
The online sequential extreme learning machine (OSELM). A standalone ELM uses all N data samples for training. However, in real-world problems data may arrive chunk-by-chunk, and retraining the ELM from scratch each time new data arrive is very time consuming 70. The OSELM, a variant of the standalone ELM model, performs learning in two stages: an initialization stage and a sequential learning stage. In the initialization stage, for a given training dataset ℵ_{k−1} = {(x_j, t_j)}, where x_j is the jth input data point and t_j is the corresponding target, the initial output weight is given by:

ρ_{k−1} = θ_{k−1} H_{k−1}^T T_{k−1}

where ρ_{k−1} is the initial output weight vector, θ_{k−1} = (H_{k−1}^T H_{k−1})^{−1} (so that θ_{k−1} H_{k−1}^T is the Moore-Penrose generalized inverse of H_{k−1}), H_{k−1} = [h_1^T, ..., h_{k−1}^T]^T is the hidden layer output matrix, and T_{k−1} = [t_1, ..., t_{k−1}]^T is the training data matrix.
In the initialization stage, random weights and biases are assigned and the hidden layer output matrix is computed on an initial small chunk of the SPI training data. The sequential learning phase is then initiated, in which the recursive least squares (RLS) algorithm is employed to update the output weights recursively 70. The output weights in OSELM are updated from the intermediate results of the last iteration and the newly arrived data, which are discarded as soon as they have been learnt; the calculation overhead and memory requirement of the algorithm are therefore significantly reduced. Readers can consult the literature for further details on OSELM [71][72][73].
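The recursive update can be sketched as follows. Variable names (theta, rho) follow the text, the data are synthetic, and the final check confirms that the sequential solution matches the batch least-squares solution on all data seen so far:

```python
import numpy as np

def oselm_update(theta, rho, H_chunk, T_chunk):
    # RLS update: theta_k = theta - theta H^T (I + H theta H^T)^(-1) H theta
    K = np.linalg.inv(np.eye(len(H_chunk)) + H_chunk @ theta @ H_chunk.T)
    theta = theta - theta @ H_chunk.T @ K @ H_chunk @ theta
    # rho_k = rho + theta_k H^T (T - H rho)
    rho = rho + theta @ H_chunk.T @ (T_chunk - H_chunk @ rho)
    return theta, rho

rng = np.random.default_rng(1)
M = 10                                # hidden neurons
H0 = rng.normal(size=(50, M))         # hidden-layer outputs, initialization chunk
T0 = rng.normal(size=50)
theta = np.linalg.inv(H0.T @ H0)      # theta_{k-1} = (H^T H)^(-1)
rho = theta @ H0.T @ T0               # initial output weights

# A newly arrived chunk is learnt, then may be discarded
H1, T1 = rng.normal(size=(20, M)), rng.normal(size=20)
theta, rho = oselm_update(theta, rho, H1, T1)

# Sequential result equals batch least squares on all data seen so far
H_all, T_all = np.vstack([H0, H1]), np.concatenate([T0, T1])
batch = np.linalg.pinv(H_all) @ T_all
print(np.max(np.abs(rho - batch)))    # ~0
```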
The M5 tree (M5tree) model. The M5tree model, an ordered and hierarchical model, works on a binary decision tree structure 74. The connection between inputs and output is established at the terminal nodes using linear regression 75. Tree-based models are built according to a divide-and-conquer method for establishing a relationship between the inputs and the output 76. Two steps are involved in building the M5tree model. In the first step, the data are partitioned into subsets to create the model tree, using the standard deviation of the values that reach a node as a measure of error and computing the expected decrease in this error from testing each attribute 77. The method is recursive: the data points are divided into subsets according to a test that depends on the standard deviation, with the error reduction ΔR given as 77,78:

ΔR = R(Ŵ) − Σ_j (|Ŵ_j| / |Ŵ|) × R(Ŵ_j)

where Ŵ is the set of examples that reach the node, Ŵ_j is the subset of examples that has the jth outcome of the potential test, and R denotes the standard deviation. Owing to the branching procedure, data in child nodes have a lower R than in parent nodes. The structure with the maximum expected error reduction is selected by analyzing all possible structures. This divide-and-conquer rule frequently produces a large tree-like structure that tends to overfit; to prevent overfitting, the overgrown tree is pruned in the second step, and the pruned subtrees are substituted with linear regression functions. The general form of the model is 77:

y = a_0 + a_1 x_1 + a_2 x_2 + ...

where a_0, a_1, a_2, ... are the linear regression constants. Figure 2c represents the basic structure of the M5tree model.
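The standard-deviation-based split criterion above can be illustrated with a few lines of code; the data are synthetic and R is computed as the population standard deviation:

```python
import numpy as np

def error_reduction(parent, children):
    """Expected error reduction from splitting parent examples into child subsets."""
    n = len(parent)
    return np.std(parent) - sum(len(c) / n * np.std(c) for c in children)

# Targets forming two clusters: a split that separates them reduces R sharply
y = np.array([1.0, 1.1, 0.9, 5.0, 5.2, 4.8])
split = [y[:3], y[3:]]               # a candidate binary split
print(error_reduction(y, split))     # large reduction for this split
```

During tree construction, the attribute test giving the largest such reduction is chosen at each node.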
The minimax probability machine regression (MPMR) model. The MPMR is a probabilistic, nonlinear regression model which maximizes the minimum probability that predictions fall within the correct regression interval of the objective function. MPMR uses convex optimization and linear discriminants 79, which makes it a good and improved variant of the Support Vector Machine 79. All of the regression data are shifted by +δ and −δ along the axis of the dependent variable, and the boundary between the two shifted classes is the regression surface; the upper and lower probability bounds for misclassifying a point are identified without making any distributional assumption 80. The learning (D-dimensional) inputs are generated from an unknown regression function as follows:

Y = f(a) + ρ

where a ∈ R^D is an input vector drawn from a bounded distribution, Y ∈ R is the output, and the noise ρ has zero mean and variance σ² ∈ R. MPMR fits an approximation function f̂ such that, for inputs generated from the same distribution, the model determines a bound on the minimum probability ω that f̂(a) is within ε of Y 79:

ω = inf Pr{ |f̂(a) − Y| ≤ ε }

Through this minimax probability, the prediction power of the true regression is quantified by a bound on the minimum probability of being within ε of the true function 80. The MPMR model is built on a kernel expansion of the form:

f̂(a) = Σ_i χ_i K(a_i, a) + ϕ

where K(a_i, a_j) is a kernel function satisfying Mercer's condition, a_i are taken from the learning data, and χ_i and ϕ are the output parameters. The schematic view of the MPMR model is shown in Fig. 2d.
Multi-scale standardized precipitation index (SPI). The SPI quantifies wet and dry scenarios based on statistical probability theory. Before developing the forecasting models, the multi-scale SPI was calculated from the rainfall (RnF) time series 39 using a gamma probability density function g(RnF):

g(RnF) = (1 / (β^α Γ(α))) RnF^(α−1) e^(−RnF/β),  RnF > 0

where α and β are the shape and scale parameters determined by the maximum likelihood estimator, and Γ(α) is the mathematical gamma function. The cumulative probability G(RnF) is defined as:

G(RnF) = (1 / (β^α Γ(α))) ∫_0^RnF RnF^(α−1) e^(−RnF/β) dRnF

By substituting t = RnF/β, G(RnF) becomes:

G(RnF) = (1 / Γ(α)) ∫_0^t t^(α−1) e^(−t) dt

Since the gamma distribution is undefined for RnF = 0, the cumulative probability takes the following form when zero rainfall can occur:

H(RnF) = p + (1 − p) G(RnF)

where p represents the probability of zero rainfall. The SPI is then determined by transforming H(RnF) into the standard normal variable using a rational approximation, where ε_0, ε_1, ε_2, ε_3, ω_1, ω_2 and ω_3 are the approximation constants.
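A hedged sketch of this SPI computation, using scipy's gamma fit and inverse normal CDF in place of the closed-form rational approximation with the constants ε_i and ω_i (the rainfall series here is synthetic):

```python
import numpy as np
from scipy import stats

def spi(rnf):
    rnf = np.asarray(rnf, dtype=float)
    nonzero = rnf[rnf > 0]
    p = 1.0 - len(nonzero) / len(rnf)              # probability of zero rainfall
    alpha, _, beta = stats.gamma.fit(nonzero, floc=0)  # MLE with location fixed at 0
    G = stats.gamma.cdf(rnf, alpha, loc=0, scale=beta)
    H = p + (1 - p) * G                            # mixed cumulative probability
    return stats.norm.ppf(H)                       # equiprobability transform to N(0,1)

rng = np.random.default_rng(3)
rain = rng.gamma(2.0, 150.0, size=240)             # synthetic monthly rainfall (mm)
index = spi(rain)
print(index.mean(), index.std())                   # roughly 0 and 1 by construction
```

For multi-scale SPI (3-, 6-, 12-month), the same transform is applied to a moving sum of the rainfall series over the corresponding window.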
Step 2: Normalization. The normalization process is essential for data scaling, to address the problem of high variation in the data. In this study, data were scaled to the range [0, 1] using Eq. (17):

SPI_norm = (SPI − SPI_min) / (SPI_max − SPI_min)

In Eq. (17), SPI represents the input/output value, SPI_min is the minimum SPI value, SPI_max is the maximum SPI value and SPI_norm is the corresponding normalized value.
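Eq. (17) and its inverse, which is needed to map model outputs back to SPI units, can be written as:

```python
import numpy as np

def normalize(spi):
    lo, hi = spi.min(), spi.max()
    return (spi - lo) / (hi - lo), lo, hi   # Eq. (17), plus the bounds

def denormalize(spi_norm, lo, hi):
    return spi_norm * (hi - lo) + lo        # inverse transform

x = np.array([-2.0, -0.5, 0.0, 1.5, 2.0])
xn, lo, hi = normalize(x)
print(xn.min(), xn.max())                   # 0.0 1.0
```

Note that lo and hi must be computed on the training data only and reused on the test data, so that no test-period information leaks into training.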
Step 3: Applying the data intelligence methods.
The data were divided into 70% for training and 30% for testing the models. The number of trees (1000) was defined before developing the RF model. Before developing the ELM and OSELM models, different activation functions (hard limit, radial basis, sine and sigmoid) were evaluated for numbers of hidden neurons ranging from 1 to 50 to determine the best configuration. The block size was set to 100 for OSELM. The significant lags at (t−1) were used in the M5tree and MPMR models to forecast SPI-1, SPI-3, SPI-6 and SPI-12.
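The activation-function and hidden-neuron search described above can be sketched as a simple grid search; the ELM stand-in, data and target here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
acts = {
    "sigmoid": lambda z: 1 / (1 + np.exp(-z)),
    "sine": np.sin,
    "hardlim": lambda z: (z >= 0).astype(float),
    "radial": lambda z: np.exp(-z ** 2),
}

def fit_rmse(X_tr, y_tr, X_te, y_te, act, n_hidden):
    """Train a minimal ELM with the given activation and report test RMSE."""
    a = rng.normal(size=(X_tr.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    rho = np.linalg.pinv(act(X_tr @ a + b)) @ y_tr
    pred = act(X_te @ a + b) @ rho
    return np.sqrt(np.mean((pred - y_te) ** 2))

X = rng.normal(size=(200, 3))
y = np.tanh(X @ np.array([1.0, -0.5, 0.3]))   # synthetic target
tr, te = slice(0, 140), slice(140, 200)       # 70-30 split

# Grid over activations x hidden sizes 1..50; keep the lowest test RMSE
best = min(((fit_rmse(X[tr], y[tr], X[te], y[te], f, n), name, n)
            for name, f in acts.items() for n in range(1, 51)),
           key=lambda t: t[0])
print(best)  # (lowest RMSE, activation name, hidden size)
```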

Results and discussion
Drought prediction for Bangladesh was conducted in this study using the SPI at different time scales (1, 3, 6 and 12 months) at four meteorological stations distributed over the country (i.e., Barisal, Bogra, Faridpur and Mymensingh). The prediction was conducted using relatively new ML models, namely RF, MPMR, M5tree, ELM and OSELM. To build the predictive models, the correlated antecedent SPI values were used as inputs.
The employed predictive models were trained using monthly data for 1949-1997, while testing was conducted using 16 years of data (1998-2013) at all four stations. To enhance the accuracy of SPI prediction, all predictive variables were normalized to the range 0 to 1. The adequacy of each predictive model was quantified using performance indices, namely MSE, RMSE, MAE, R, WI, NSE and LM (Eqs. (18-24)). The predictive models were compared based on their performance during the testing phase.
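The listed metrics can be written out explicitly; the WI and LM expressions below use one common form of each index, and the data are illustrative (o = observed, p = predicted):

```python
import numpy as np

def metrics(o, p):
    e = p - o
    rmse = np.sqrt(np.mean(e ** 2))
    mae = np.mean(np.abs(e))
    # Nash-Sutcliffe efficiency: 1 minus error variance over observed variance
    nse = 1 - np.sum(e ** 2) / np.sum((o - o.mean()) ** 2)
    # Willmott's index of agreement
    wi = 1 - np.sum(e ** 2) / np.sum((np.abs(p - o.mean()) + np.abs(o - o.mean())) ** 2)
    # Legates-McCabe index (absolute-error analogue of NSE)
    lm = 1 - np.sum(np.abs(e)) / np.sum(np.abs(o - o.mean()))
    return {"RMSE": rmse, "MAE": mae, "NSE": nse, "WI": wi, "LM": lm}

o = np.array([0.1, -0.4, 1.2, 0.8, -1.0])
perfect = metrics(o, o)
print(perfect)  # perfect prediction: RMSE = MAE = 0, NSE = WI = LM = 1
```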
The performance of the models in terms of statistical metrics is shown in Tables 1 and 2. ELM achieved the highest accuracy for most SPI scales, whereas M5tree showed the lowest performance in predicting SPI for all the scales. Hence, it can be concluded that ELM is the best performing model while M5tree has the lowest accuracy in prediction of SPI. All predictive models showed better accuracy in predicting SPI at higher scales. This evidences the potential of the non-tuned extreme learning machine model in forecasting droughts in Bangladesh. Recently, the feasibility of the ELM model has been successfully demonstrated for drought index simulation in several other studies 44,86,87.
The graphical evaluation and assessment of the predictive models in terms of standardized performance indices are depicted in the form of a heatmap diagram in Fig. 5. Dark blue in the figure represents the best statistical performance while dark red represents the worst. It can be seen from the figure that ELM and OSELM showed the best performance in terms of all metrics for all SPI scales except SPI-1. Furthermore, ELM showed the highest performance compared to the other models at all stations, while the largest number of dark red cells (worst predictive model) belonged to the M5tree model.
The Taylor diagram is another graphical presentation that was used to compare the employed predictive models (Fig. 6). The Taylor diagrams indicated good consistency with the obtained performance indices. Figure 6 shows that the highest agreement exists between the RF prediction (blue rectangle) and the observed SPI-1 at all the stations. For the other SPI scales, ELM and OSELM showed relatively similar results, which indicates their superiority over the other predictive models. These models provided the lowest normalized RMSE (less than 0.4), the highest correlations (more than 0.95) and the lowest variation (within 0.6-0.9). However, ELM provided better results than OSELM for SPI-3 at Faridpur station, while at the other stations and SPI scales their results were found to be similar.
Scatter plots show the linear correlation between observed and predicted SPI values at all stations (Fig. 7). The results revealed that the predictions of all the models except M5tree and MPMR have a high correlation with the observations. Most of the predicted points are aligned with the perfect (45°) line, which indicates strong performance of the prediction models. Based on the obtained correlation coefficients, OSELM (R = 0.81-0.96) has the better correlation for SPI-1 at all stations except Faridpur (R = 0.74). Hence, the best correlation for SPI-1 was attained using OSELM and the worst by M5tree (R = −0.006-0.011). For the other SPI scales, ELM clearly provided higher accuracy than the other models, although there was no significant difference between the results obtained using ELM (R = 0.98-0.999) and OSELM (R = 0.69-0.999) at all the considered stations. Therefore, both the ELM and OSELM models showed a higher correlation between the observed and predicted SPI at different scales in comparison with the RF, MPMR and M5tree models, while M5tree (R = −0.006-0.011) provided the lowest correlation coefficient. Overall, it can be remarked that both the ELM and OSELM models have adequate capability for SPI prediction.
To assess the uncertainty in SPI prediction, the 25%, 50% and 75% quantiles of the observed and predicted SPI are presented using boxplots in Fig. 8. The figure shows that the variability in SPI-1 values could not be simulated adequately by any of the models: many predicted SPI-1 values fluctuate near zero (a narrow range) while the observed values span a wide range [−2 to 2]. However, the RF model simulated the variability and quartiles of SPI-1 better than the others. All predictive models showed better results in simulating the SPI quantiles at the other SPI scales, especially the higher ones. Overall, ELM and OSELM provided the highest accuracy in simulating the variability of SPI values while M5tree showed the worst. Hence, it can be remarked that M5tree is not suitable for prediction of SPI in any region of Bangladesh.

Conclusions
The current research investigated the feasibility of newly developed ML models to forecast the SPI drought index at multiple scales over Bangladesh. The predictive models were developed using monthly rainfall data for the period 1949-2013 at four different meteorological stations. The predictors of the forecasting models were selected using statistical auto-correlation analysis. The forecasting results demonstrate the consistency of ELM for the 3-, 6- and 12-month SPI. The results indicate the potential of the models to be employed for drought forecasting in Bangladesh for the mitigation of drought impacts. In future work, other ML models can be employed to evaluate their performance in forecasting droughts in Bangladesh, and different optimization methods can be used to tune the ML model parameters to improve their prediction capability.

Data availability
The dataset used in this research is available upon request from the corresponding author.