Suspended sediment load prediction using long short-term memory neural network

Rivers carry suspended sediments along with their flow. These sediments deposit at different places depending on the discharge and course of the river, and their deposition affects environmental health, agricultural activities, and potable water sources. Deposition of suspended sediments reduces the flow area, thus affecting the movement of aquatic life and ultimately leading to a change of river course. Data on suspended sediments and their variation are therefore crucial for various authorities, which require forecasts of suspended sediments in the river to operate hydraulic structures properly. The prediction of suspended sediment concentration (SSC) is usually challenging due to various factors, including site-related data, site-related modelling, lack of multiple observed factors used for prediction, and pattern complexity. To address these problems, this study proposes a Long Short-Term Memory (LSTM) model to predict suspended sediments in Malaysia's Johor River using only one observed factor, namely discharge data. The data were collected for the period 1988–1998. Four models were tested for the prediction of suspended sediments: ElasticNet Linear Regression (LR), Multi-Layer Perceptron (MLP) neural network, Extreme Gradient Boosting, and Long Short-Term Memory. Predictions were analysed for four scenarios: daily, weekly, 10-daily, and monthly. Performance evaluation showed that Long Short-Term Memory outperformed the other models, with coefficient of determination (R²) values of 92.01%, 96.56%, 96.71%, and 99.45% in the daily, weekly, 10-daily, and monthly scenarios, respectively.

www.nature.com/scientificreports/ performance indicators, it was concluded that the DENFIS model showed improved accuracy compared to the MARS and ANFIS-FCM models in predicting SSL. Zounemat 21 performed a study on the San Joaquin River, United States, regarding the forecasting of suspended sediment concentration using two ANN models hybridized with the Levenberg-Marquardt (L.M.) algorithm and PSO, which are called ANN-LM and ANN-PSO, respectively. A standalone ANFIS model was also developed. The study finds that ANN-PSO and ANFIS were superior in predicting daily suspended sediment concentration values. Hybrid ML models also have limitations that need to be considered in solving problems such as SSL prediction. Qian et al. 17 stated that the training time of hybrid ML models is high, especially when dealing with complex problems. Hybrid ML models require many more input parameters to be considered during training compared to standalone ML models. This often restricts the development and usage of hybrid ML models 17 . In addition, complicated architecture and an unknown optimal number of clusters have also been reported as disadvantages of utilizing hybrid ML models 15 .
One type of ML algorithm, which has not been explored much in the context of SSL prediction, is the convolutional neural network (CNN). This neural network, which is a kind of deep learning algorithm, has shown plenty of promise in other fields based on previous literature reviews. Kabir et al. 22 developed a CNN to predict flood depths in Carlisle, United Kingdom. The CNN model developed in this study was trained with outputs provided by a two-dimensional (2D) hydraulic model. The CNN model's performance was compared to that of a support vector regression (SVR) model. This study determined that the proposed CNN model was far superior to the SVR in predicting flood depths, as indicated by several adopted performance measures.
Haurum et al. 23 investigated the usage of CNN in estimating the water levels in sewer pipes in Denmark. Models based on the decision tree algorithm were also trained and tested for performance comparison with the CNN model. The estimation problem in this study is treated as a classification and regression problem. This study demonstrates that the CNN models outperform the decision tree models, in the context of estimating water levels.
Huang et al. 24 utilized a CNN trained using a robust loss function to forecast the river flow in four rivers in the United Kingdom. The performance of the CNN model trained using a robust loss function is compared with benchmark models based on several algorithms, namely autoregression (A.R.), radial basis function neural network (RBFNN), MLP, kernel ridge regression (KRR), and a generic CNN. This study shows that the CNN trained using a robust loss function produces the best forecasting performance.
Ni and Ma 25 researched the applicability of implementing a model based on CNN to predict the generation of power from a marine wave energy converter (WEC) system through the utilization of a double buoy oscillating device (OBD). A multi-input approach was used to train and test the CNN. The study concludes that the proposed CNN model performs better than the ANN and regression models in the prediction of marine wave power generation.
Zhu et al. 26 studied the utilization of CNN in developing a model to predict the generation of wind power. Wind power historical data obtained from a wind farm is fed to the CNN model as input to predict wind power generation 4 h ahead. This study, which is the first to use CNN to predict wind power generation, proves that CNN is indeed feasible for application in regression prediction in order to predict wind power generation.
There are many advantages in utilizing CNNs to solve problems in engineering-related and non-engineering-related fields. Among the primary benefits of CNNs is that they learn end-to-end, with input images mapped directly to the target bounding box coordinates or goal labels 27 . This direct mapping means that the design of handcrafted features, also known as feature engineering (F.E.), which is a time-consuming process and may yield noisy image representations with suboptimal discriminative power, is no longer needed 27,28 . CNNs are also robust to challenging situations such as image distortion, which is commonly caused by shape change due to the camera lens, varying lighting conditions, partial occlusions, varying poses, and horizontal and vertical shifts 28,29 . In addition, because the same coefficients are used at different locations within the convolutional layer, the memory requirement of CNNs is significantly reduced 29 . The training time of CNNs is also reduced, as the number of parameters is substantially smaller, making training more manageable 29 and processing faster 30 .

Problem statement
The usage of sub-optimal models or methods in measuring, calculating, and predicting SSL is costly in terms of time, funding, energy, and workforce 2 . The sediment rating curve (SRC), which utilizes a regression analysis to establish a relationship between sediments and river discharges, is a conventional and standard means of predicting SSL 12,14 . However, it has been found to be incapable of providing sufficiently accurate predictions, as the procedure of relating sediment loads to stream discharge has been shown to be inaccurate 14 . Because of this, researchers have turned to artificial intelligence (A.I.) and its subset, machine learning (ML). ML is able to identify trends and patterns with ease, operate automatically, improve continuously, and handle multi-dimensional and multi-variate data, which makes it especially useful for exploiting large amounts of data to predict SSL.
The traditional ML methods mentioned in the literature review depend on feature engineering to select features manually before the prediction stage. If the features are not selected carefully, the prediction performance degrades. Moreover, the selection of hyperparameters is critical and has an enormous impact on prediction performance. Additionally, the performance of conventional ML methods was found to degrade when the patterns in the data are complex. More advanced automatic learning methods, such as deep learning models, were demonstrated to learn this type of complex pattern.
Deep learning models such as CNNs, which were demonstrated in SSL applications, focus only on spatial features related to the current input and ignore features available in previous time steps. Therefore, the recurrent neural network is a key solution for automatically extracting temporal features.

Objectives
In this research work, we utilized LSTM, which was found to solve the vanishing gradient problem 31 and to improve performance by considering a large number of sediment and discharge values collected from previous days, weeks, 10-day periods, and months. The purpose of this study is to explore the capability and demonstrate the effectiveness of a model based on long short-term memory (LSTM) neural networks in predicting suspended sediment load (SSL) in the Johor River basin, given a time series of historical data relating to suspended sediment and river streamflow. The observed and predicted SSL values are inspected comprehensively through statistical analyses. After predicting SSL, the performance of the LSTM model is evaluated using several selected performance indicators to determine the efficacy of LSTM in the field of SSL prediction.

Methodology
Study area. Located in Southeast Asia, Malaysia is primarily made up of two land regions: Peninsular Malaysia and the Borneo Islands, which consist of the states of Sabah and Sarawak. The air in Malaysia is generally moist and cloud-covered, as the sea surrounds the country. The country is also situated near the equator; hence it receives higher concentrations of sunlight, as the Sun's rays strike almost directly throughout the year. The case study area, Johor, is situated in the southern part of Peninsular Malaysia. Johor is officially divided into eight districts. The state capital, Johor Bahru, is highly urbanized, as it serves as a port of entry connecting Malaysia and Singapore while also acting as an international business hub. The district of Kota Tinggi, which has an area of 3644 km² and lies approximately 42 km north-east of Johor Bahru, has developed rapidly as part of the growth corridor of Johor due to its close proximity to the Johor capital. Kota Tinggi, located in East Johor with 10 sub-districts, has the sea encompassing 65% of its border [32][33][34] .
This study uses the Johor River basin as the case study area, as illustrated in Fig. 1. The Johor River basin has a total catchment area of approximately 2286 km² and a total length of about 122.7 km. The Johor River's headwater originates from the slopes of east Kluang and Gunung Belumut; the river then flows south and discharges into the Straits of Johor. The Johor River has two major tributaries, the Sayong River and the Linggiu River. The streamflow station at Rantau Panjang (No. 1737551, CA = 1130), as can be seen in Fig. 1, is among the main hydrometric stations and is situated downriver of a significant township. There are two gauges along the Rantau Panjang stream. One gives sediment measurements (No. 1737551), and the other measures river flow discharge (No. 1737451). Measurements of these two parameters were obtained on a daily basis from 1988 to 1998, and both have been utilized in this study. The data are illustrated in the sediment vs. time and discharge vs. time scatter plots in Figs. 2 and 3, respectively, while descriptive analyses of the sediment and discharge data can be seen in Table 1. Because river water quality is a concern, the proposed research is essential. Urbanization and land-use practices have complicated the situation for the investigated study area 36,37 . Therefore, this study investigates a reliable tool to predict sedimentation with high precision for better surface water management.
Input sensitivity. One of the main tasks in machine learning is to choose input variables that have an impact on the output. A good understanding of the underlying process and statistical analysis of inputs and outputs are required to find a suitable model that links the inputs with the outputs. Usually, sediment is affected by the discharge and its history and by the history of observed sediment values. There are three scenarios:
(1) a history of the discharge as input for forecasting future sediment, Eq. (1);
(2) a history of sediment as input for forecasting future sediment, Eq. (2);
(3) a history of sediment and discharge as inputs for forecasting future sediment, Eq. (3);
where SED t is the sediment at time t, DIS t is the discharge at time t, and n is the forecast horizon, ranging from one (day, week, 10-days, month) ahead to seven (day, week, 10-days, month) ahead.
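The three input scenarios can be illustrated with a minimal sketch that turns two aligned series into input rows and targets. This is not the authors' code: the function name `build_samples` and the `history` parameter (the number of past steps used as input) are hypothetical, and `ahead` plays the role of n in the text.

```python
def build_samples(sed, dis, history, ahead, scenario):
    """Return (inputs, targets) for one of the three input scenarios.

    scenario 1: discharge history only
    scenario 2: sediment history only
    scenario 3: sediment and discharge history
    `history` is a hypothetical window length; `ahead` is n (1 to 7).
    """
    if len(sed) != len(dis):
        raise ValueError("sed and dis must have the same length")
    inputs, targets = [], []
    # t is the index of the most recent observation used as input
    for t in range(history - 1, len(sed) - ahead):
        dis_hist = list(dis[t - history + 1 : t + 1])
        sed_hist = list(sed[t - history + 1 : t + 1])
        if scenario == 1:
            row = dis_hist
        elif scenario == 2:
            row = sed_hist
        elif scenario == 3:
            row = sed_hist + dis_hist
        else:
            raise ValueError("scenario must be 1, 2, or 3")
        inputs.append(row)
        targets.append(sed[t + ahead])  # sediment n steps ahead
    return inputs, targets


# Toy values, not observations from the Johor River
sed = [10.0, 12.0, 11.0, 15.0, 14.0, 13.0]
dis = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X, y = build_samples(sed, dis, history=2, ahead=1, scenario=3)
```

Each row of `X` holds the past sediment values followed by the past discharge values, and `y` holds the sediment value `ahead` steps after the last input.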
Data partitioning. This section describes the experimental protocol and data partitioning process. Our dataset was divided into three sets: training, validation, and testing. The training set was employed to train the models, learn the patterns from the input, and tune the weights. The validation set was used during the training stage to detect and limit overfitting. The testing set was used to evaluate the models and calculate the performance metrics. The dataset was first divided into training and testing sets with an 80/20 rule: 80% of the data, covering the first years of the dataset, was assigned to the training set, while 20% of the data, covering the last years, was assigned to the testing set. The training set was then divided again with the same 80/20 rule to obtain the final training set and the validation set. This split between training and validation was done five times randomly using five-fold cross-validation, and the model that produced the best evaluation metrics on the testing data was selected from the five.
Models used for forecasting. In this paper, four models were used for forecasting: ElasticNet Linear Regression (ElasticNet LR) [38][39][40] , Multilayer Perceptron Neural Network (MLP NN) 41 , Extreme Gradient Boosting (XGB) 42 , and Long Short-Term Memory (LSTM) 43 . Many methods were used to predict SSL in the literature review. These methods were divided into conventional ML and deep learning methods. In this study, we selected several baseline ML methods to compare with our proposed LSTM. The selection covered various model structures and learning mechanisms to obtain a fair comparison.
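The partitioning described above can be sketched as follows, assuming that samples are stored in time order. The helper names `chronological_split` and `five_fold_splits` and the fixed random seed are illustrative assumptions, not details given in the study.

```python
import random


def chronological_split(samples, train_fraction=0.8):
    """Keep the earliest samples for training and the latest for testing."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


def five_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; each fold holds out about 1/k of the data."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random assignment of samples to folds
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, val_idx


samples = list(range(100))  # stand-in for 100 time-ordered samples
train_val, test = chronological_split(samples)
splits = list(five_fold_splits(len(train_val)))
```

With five folds, each validation fold contains one fifth of the training data, which reproduces the second 80/20 split five times.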
The training and testing for the LSTM model were carried out by using the TensorFlow framework on an NVIDIA GeForce GTX 1080 Ti GPU.
ElasticNet linear regression. ElasticNet LR is a regularized linear regression technique that is usually used to reduce overfitting in linear models 39 . Linear regression is a well-known regression method; this version improves on it by adding regularization terms to the loss function to improve the model's predictions 45,46 . ElasticNet LR penalizes the least-squares loss using an elastic net penalty, which adds two popular penalty functions, namely L1 and L2, to the loss function during training 39 . It was found to overcome the limitations of the lasso technique. ElasticNet is a hybrid of the Lasso and Ridge regression techniques and has the advantage of trading off between the two.
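$$\hat{\beta} = \arg\min_{\beta} \left( \lVert y - X\beta \rVert^2 + \lambda_2 \lVert \beta \rVert^2 + \lambda_1 \lVert \beta \rVert_1 \right)$$
where $\hat{\beta}$ are the optimal weights that minimize the loss function, which is represented by the squared difference between the actual output $y$ and the forecasted output $X\beta$ with two regularization terms added. These terms are the L2 penalty $\lambda_2 \lVert \beta \rVert^2$ and the L1 penalty $\lambda_1 \lVert \beta \rVert_1$, with two parameters $\lambda_1$ and $\lambda_2$ to be tuned. The values of $\lambda_1$ and $\lambda_2$ should be selected carefully to improve the prediction performance. Various values of $\lambda_1$ and $\lambda_2$ were evaluated to find optimal values, as shown in the "Results and discussion" section.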
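The penalized least-squares loss described above can be evaluated directly for a given weight vector. The following is a minimal sketch of the standard elastic net objective; the function name `elastic_net_loss` and the arguments `lam1` and `lam2` (the L1 and L2 penalty parameters) are illustrative, and the scaling used in the study's software may differ.

```python
def elastic_net_loss(beta, X, y, lam1, lam2):
    """Squared error plus an L2 penalty (lam2) and an L1 penalty (lam1)."""
    residual_sq = 0.0
    for row, target in zip(X, y):
        prediction = sum(b * x for b, x in zip(beta, row))
        residual_sq += (target - prediction) ** 2
    l1_norm = sum(abs(b) for b in beta)     # ||beta||_1
    l2_norm_sq = sum(b * b for b in beta)   # ||beta||^2
    return residual_sq + lam2 * l2_norm_sq + lam1 * l1_norm


# Toy inputs, not data from the study
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 2.0]
loss = elastic_net_loss([1.0, 1.0], X, y, lam1=0.5, lam2=0.25)
```

Setting `lam1` to zero gives a ridge-type loss and setting `lam2` to zero gives a lasso-type loss, which is the trade-off the text refers to.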
MLP neural network. The MLP neural network is a network with several layers and nonlinear activation functions 38,47 . The parameters of this network are tuned iteratively (800 iterations) to find optimal ones. Several hyperparameters were tested to find the best ones:
1. activation: logistic, tanh, or relu.
2. solver: lbfgs, adam, or sgd.
3. learning_rate 48 : constant, invscaling, or adaptive.
The neural network architecture is defined by the number of hidden layers, the number of nodes in each hidden layer, and the type of activation function 49,50 . In this study, different MLP NN architectures were evaluated by changing the number of hidden layers and the number of nodes in each layer. The architecture that gave the best metrics in terms of R², MAE, and RMSE was as follows:
• an input layer with a number of nodes equal to the number of historical values of sediment and discharge
• an output layer with one node for sediment forecasting
• one hidden layer with 100 nodes.
Extreme gradient boosting (XGB). XGB is an end-to-end tree learning system. It runs more than ten times faster than existing solutions on a single machine and scales to a large number of examples in memory-limited resources 38,51 . Various algorithmic optimizations are behind the scalability of XGB. It uses a gradient descent algorithm to minimize the loss and a regularization technique to control over-fitting 42 .
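The hyperparameter options listed above define a small search grid. As a minimal sketch, the candidates can be enumerated as below; the option names are taken from the text (they coincide with argument names of scikit-learn's `MLPRegressor`, although the source does not name the library), and the variable names are illustrative.

```python
from itertools import product

# Options as listed in the text
search_space = {
    "activation": ["logistic", "tanh", "relu"],
    "solver": ["lbfgs", "adam", "sgd"],
    "learning_rate": ["constant", "invscaling", "adaptive"],
}

names = list(search_space)
# One dictionary per combination of the three hyperparameters
candidates = [dict(zip(names, combo)) for combo in product(*search_space.values())]
print(len(candidates))  # 27
```

Each candidate would then be trained and scored on the validation data, and the combination with the best metrics retained.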
Long short-term memory. LSTM is a special type of Recurrent Neural Network (RNN) that is used for long-range sequence modeling 39,52 . LSTM has a memory cell, as shown in Fig. 4, which acts as an accumulator of state information and is supported by control gates. The advantage of this structure is that it mitigates the vanishing gradient problem. The LSTM network was found to capture temporal correlations 53 .
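The role of the memory cell and its control gates can be illustrated with one time step of a single-unit LSTM. The sketch below follows the standard LSTM formulation (forget, input, and output gates plus a candidate state); whether it matches the cell in Fig. 4 in every detail is an assumption, and the parameter layout in `params` is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a single-unit LSTM with scalar input x.

    params maps each gate name to (input weight, recurrent weight, bias).
    """
    def gate(name, squash):
        w_x, w_h, b = params[name]
        return squash(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)       # how much of the old cell state to keep
    i = gate("input", sigmoid)        # how much of the candidate to write
    o = gate("output", sigmoid)       # how much of the cell state to expose
    g = gate("candidate", math.tanh)  # proposed new content
    c = f * c_prev + i * g            # memory cell accumulates state additively
    h = o * math.tanh(c)              # hidden state passed to the next step
    return h, c


zero = {name: (0.0, 0.0, 0.0) for name in ("forget", "input", "output", "candidate")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=zero)
```

The additive update of `c` is what lets gradients flow across many time steps more easily than in a plain RNN.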
In this study, a series of discharge and sediment observations were applied to LSTM. The parameters of LSTM were tuned iteratively to fit the data.
To validate the optimal structure of LSTM, various architectures, varying the number of LSTM layers, the number of nodes in each layer, the number of fully connected layers, the type of activation function, and the number of dropout layers, were tested and evaluated to select the architecture that gives the best evaluation metrics. The best architecture of the proposed LSTM model consists of the following layers:
Various hyperparameters, including the learning rate, loss function, optimizer, dropout percentage, batch size, and number of epochs, were tested and evaluated to select the optimal hyperparameters that give the best evaluation metrics. The final hyperparameters were as follows:
1. The learning rate used to train the LSTM model was set to 0.002 to balance the speed of learning (done in 400 epochs) and the avoidance of undesirable divergence.
2. The batch size was set to 8 to balance the speed of convergence and good performance.
3. The number of epochs was set to 400.
4. MAE and MSE loss functions were evaluated. It was found that the MAE loss function can be minimized better using the Adam optimizer.
In summary, models described earlier were utilized for forecasting the sediment using traditional machine learning methods as baseline models and LSTM as the proposed model. The behavior of sediment is affected by different factors such as the history of discharge and sediment.
Performance evaluation. In this section, five standard evaluation metrics were utilized: coefficient of determination (R²), mean absolute error (MAE), root mean square error (RMSE), relative absolute error (RAE), and relative squared error (RSE). A larger value of R² indicates better prediction performance of the model. However, R² alone is not enough to determine whether the predictions are biased. Therefore, to further investigate whether a regression model provides a good fit to our data, the error metrics RAE, RSE, MAE, and RMSE were used to quantify the difference between the actual and predicted outcomes. Smaller values of RSE, RAE, MAE, and RMSE indicate better prediction performance. The drawback of RMSE is that it is more sensitive to big errors and outliers than MAE. On the other hand, RSE was found to solve the RMSE drawback of sensitivity to the mean and scale of predictions. The performance indicators are described as follows:
(1) Coefficient of determination (R²): a statistical measure of the correlation (trend) between the actual and the forecasted output. R² = 0 means the model is random; R² = 1 means that the model fits the data perfectly.
(2) Mean absolute error (MAE): the average absolute error between the actual and the forecasted output.
(3) Root mean square error (RMSE): the square root of the average squared error between the actual and the forecasted output.
(4) Relative absolute error (RAE): a normalized sum of absolute differences between the actual and the forecasted outputs.
Here n is the number of samples, y is the actual output, ŷ is the forecasted output, and ȳ is the average of the actual output.
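The indicators described above can be computed as in the following sketch. It uses common textbook definitions, which are an assumption here: RAE and RSE are normalized by the deviations of the actual values from their mean, and R² is taken as 1 − RSE; the study may define R² differently (for example, as a squared correlation). The function name is illustrative.

```python
import math


def evaluation_metrics(actual, predicted):
    """Return MAE, RMSE, RAE, RSE, and R2 under common textbook definitions."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = [abs(a - p) for a, p in zip(actual, predicted)]
    sq_err = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    abs_dev = sum(abs(a - mean_actual) for a in actual)    # baseline: predict the mean
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)
    rse = sum(sq_err) / sq_dev
    return {
        "MAE": sum(abs_err) / n,
        "RMSE": math.sqrt(sum(sq_err) / n),
        "RAE": sum(abs_err) / abs_dev,
        "RSE": rse,
        "R2": 1.0 - rse,  # assumed definition
    }


# Toy values, not results from the study
metrics = evaluation_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 2.0])
```

In this toy case a single error of 2 gives MAE = 0.5 but RMSE = 1.0, which illustrates the greater sensitivity of RMSE to large errors.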

Autocorrelation function (ACF).
ACF is an effective analytical tool used in time series forecasting and analysis 55 . This function measures the statistical relationships between observations in a single data series. In other words, ACF defines the relation between the current and past values of the observation. Additionally, it finds the correlations taking into account components such as trend, seasonality, cyclic behaviour, and residuals. Figures 5, 6, 7 and 8 show four scenarios of historical patterns in ACF: daily, weekly, 10-days, and monthly. The correlograms show that the data have seasonal dependencies and the same pattern over the years. For the daily scenario, the analysis took historical data from 1988 until 1998. Figure 5 shows data observed for 50 days starting from January 1988. Figure 6 illustrates 50 weekly lags, where each lag represents one week. Meanwhile, a 10-days correlogram is shown in Fig. 7 for 50 lags, where each lag represents 10 days. When the value at a lag lies above the upper line, the sediment at this lag is highly correlated with the current value. Figure 8 shows 50 lags of monthly sediment for 50 months.
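A correlogram of this kind can be computed with the standard sample autocorrelation estimator, sketched below. The approximate 95% bound of 1.96/√N is a common convention and is assumed here to correspond to the upper line in the figures; the function names and the toy series are illustrative.

```python
import math


def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    denom = sum(c * c for c in centered)
    acf = []
    for lag in range(max_lag + 1):
        num = sum(centered[t] * centered[t + lag] for t in range(n - lag))
        acf.append(num / denom)
    return acf


def confidence_bound(n, z=1.96):
    """Approximate 95% bound for the autocorrelation of white noise."""
    return z / math.sqrt(n)


# Toy series with a period of four steps, not observed sediment data
series = [1.0, 0.0, -1.0, 0.0] * 6
acf = autocorrelation(series, max_lag=8)
bound = confidence_bound(len(series))
significant_lags = [lag for lag, r in enumerate(acf) if lag > 0 and r > bound]
```

For the toy series, the lags that exceed the bound are multiples of the period, which is how seasonal dependence shows up in a correlogram.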
To summarize the training and performance evaluation process a flow chart has been developed, which is illustrated in Fig. 9. The flow chart shows the step-by-step process followed in the methodology.

Results and discussion
The first experiment aimed to forecast the sediment one day ahead using four machine learning models: ElasticNet Linear Regression (ElasticNet LR) 38 , Multilayer Perceptron Neural Network (MLP NN), Extreme Gradient Boosting (XGB) 47 , and Long Short-Term Memory (LSTM). To evaluate the performance of the proposed models, the Coefficient of Determination (R²), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Relative Absolute Error (RAE), and Relative Squared Error (RSE) were used. The hyperparameters of the models were tuned to give the best results. Table 2 and Fig. 10 summarize each model's performance metrics for the four scenarios: daily, weekly, 10-days, and monthly. In these four scenarios, the data were divided into daily (values of discharge and sediment for each day), weekly (average values of discharge and sediment for each week), 10-days (average values of discharge and sediment for each 10 days), and monthly (average values of discharge and sediment for each month). The objective of demonstrating these four scenarios is to study the data variation and explore the hidden patterns that the model should be able to learn. It was found that the data in the monthly scenario have patterns that can be learned well and generalized to future examples to predict future SSL. In this experiment, LSTM outperformed the baseline solutions ElasticNet LR, MLP NN, and XGB in all scenarios. In the monthly scenario, even though the dataset used for training is small, LSTM, a data-hungry deep learning model, could compete with ElasticNet LR. LSTM obtained R² of 92.01%, 96.56%, 96.71%, and 99.45% in the daily, weekly, 10-days, and monthly scenarios, respectively. The learning curves of LSTM for the monthly and weekly scenarios are shown in Fig. 11.
In the daily scenario, the data suffer from large variations and noise, which makes pattern learning more complex than in the other scenarios. In the weekly and 10-days scenarios, the variation in the data is reduced, making pattern learning more efficient, and the performance metrics improved compared to the daily scenario. Lastly, in the monthly scenario, the model's performance improved remarkably in terms of R², RSE, RAE, RMSE, and MAE, as shown in Table 2 and Fig. 10.
The second experiment aimed to study all possible combinations of inputs to analyze the sensitivity of SSL prediction to the inputs, which is a significant stage in building a prediction model. Various inputs of discharge and sediment were selected to find the combination that gives the best Coefficient of Determination for each model and each scenario. Table 3 summarizes the R² of each model for the four scenarios: daily, weekly, 10-days, and monthly. The best R² values are shown in bold font in Table 3. It was found that combining the histories of both discharge and sediment outperformed using only the history of discharge or only the history of sediment to predict future sediment in almost all scenarios. Additionally, ignoring the discharge and using only previous sediment values to predict future sediment is acceptable only in the daily scenario and produces poor forecasting results in the other scenarios. Moreover, it was found that using only discharge at time t as input to predict sediment at time t is possible with the LSTM model, which obtained R² of 84.24%, 96.52%, 96.48%, and 98.6% in the daily, weekly, 10-days, and monthly scenarios, respectively. On the other hand, ElasticNet was not able to provide good R², MAE, and RMSE. Figure 12 illustrates the R², MAE, and RMSE for ElasticNet LR and LSTM.
The third experiment aimed to demonstrate the model's ability to forecast multiple time steps ahead, such as a few days or a few months ahead. This experiment is essential to study the proposed model's capability to generalize to new future cases. LSTM and ElasticNet LR (with and without normalization) were compared for the four scenarios in terms of R² for one to seven time steps ahead, where a time step is a day, week, 10 days, or month. Table 4 shows the R² for LSTM and ElasticNet LR, with the best R² values shown in bold font. It can be seen clearly in scenario one that LSTM has outstanding performance in predicting the sediment for one day and seven days ahead. In addition, LSTM outperformed ElasticNet LR and produced better forecasting results in all scenarios. Figure 13 illustrates the R² for ElasticNet LR and LSTM for various time steps ahead. In all scenarios, the model fits the one-step-ahead sediment (SED + 1) better than the two-step-ahead (SED + 2) through seven-step-ahead (SED + 7) sediment. The results indicate that LSTM can be used to predict the expected changes in sediment one week ahead. Figure 15 illustrates the scatter plots of actual versus forecasted sediment for the four scenarios: daily, weekly, 10-days, and monthly. In addition, the time series of actual and forecasted sediment are shown in this figure. There is evidently a close match between the actual and forecasted sediment in all scenarios, particularly the monthly one.
Figure 11. Learning curves for LSTM in weekly (on the right) and monthly (on the left) scenarios.
Table 3. Sensitivity analysis of inputs for four scenarios: daily, weekly, 10-days, and monthly and four models: Elastic Net, MLP, XGB, and LSTM.
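The effect of averaging on data variation can be illustrated with a short sketch that aggregates a daily series into non-overlapping block means. Fixed-length blocks (7 or 10 values) are assumed for simplicity; calendar months have varying lengths and would need date information, which is not modelled here. The function names and the toy series are illustrative.

```python
def block_means(daily, block):
    """Average consecutive non-overlapping blocks of `block` daily values."""
    means = []
    for start in range(0, len(daily) - block + 1, block):
        chunk = daily[start : start + block]
        means.append(sum(chunk) / block)
    return means


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Toy alternating series standing in for noisy daily values
daily = [1.0, 3.0] * 14            # 28 days
weekly = block_means(daily, 7)     # 4 weekly averages
daily_var = variance(daily)
weekly_var = variance(weekly)
```

In this toy case the weekly averages vary far less than the daily values, consistent with the reduced variation reported for the aggregated scenarios.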

Inputs
Having two parameters results in three scenarios for the input sensitivity study. These three scenarios aim to study the impact of the inputs on the learning model. In this experiment, it was found that using the histories of both discharge and sediment improves the evaluation metrics compared to using only one input. Additionally, discharge and its history play a significant role in predicting SSL. In contrast, using only the previous historical sediment data is neither sufficient nor efficient for predicting future sediment. In Figs. 16 and 17, using both sediment and discharge as inputs leads to a better match between actual and forecasted sediment than using sediment or discharge alone. For the weekly scenario, using sediment only gives the worst fit in the two models. On the other hand, using discharge only can give a good fit in both models, but a poorer one than using both sediment and discharge. Table 5 shows the impact of two factors in ElasticNet LR: alpha and l1_ratio. The coefficient of determination was calculated for various values of alpha and l1_ratio. It was found that when alpha = 0.1 and l1_ratio = 1, R² has the best value of 94.09%.
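The parameter names alpha and l1_ratio suggest the parameterization used by scikit-learn's `ElasticNet`, in which the L1 penalty weight is alpha × l1_ratio and the L2 penalty weight is 0.5 × alpha × (1 − l1_ratio). The source does not state which library was used, so the following sketch, including the function name `penalty_weights`, rests on that assumption.

```python
def penalty_weights(alpha, l1_ratio):
    """Map (alpha, l1_ratio) to L1 and L2 penalty weights.

    Assumes the scikit-learn ElasticNet convention; this is not confirmed
    by the source.
    """
    l1_weight = alpha * l1_ratio
    l2_weight = 0.5 * alpha * (1.0 - l1_ratio)
    return l1_weight, l2_weight


best = penalty_weights(alpha=0.1, l1_ratio=1)
mixed = penalty_weights(alpha=0.1, l1_ratio=0.5)
```

Under this assumption, the best reported setting (alpha = 0.1, l1_ratio = 1) has a zero L2 weight, so the model reduces to a lasso-type penalty.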

Conclusion
This study proposes an LSTM model for the prediction of suspended sediment in the Johor River in Malaysia. The prediction model was trained on the daily sediment and daily discharge data. The model was trained and validated on 80% of the data and tested on the remaining 20%. Four models were analysed for suspended sediment prediction: ElasticNet Linear Regression, MLP neural network, Extreme Gradient Boosting, and Long Short-Term Memory. These models were trained on four scenarios: daily, weekly, 10-daily, and monthly. The study was divided into three experiments. The first experiment developed the LSTM model for one-day-ahead prediction of suspended sediments. Its results showed that LSTM outperformed the other models, with R² values of 92.01%, 96.56%, 96.71%, and 99.45% in the daily, weekly, 10-days, and monthly scenarios, respectively. The second experiment was a sensitivity analysis of the inputs. Its results showed that the LSTM model can predict sediment at time t using only discharge at time t as input, with R² values of 84.24%, 96.52%, 96.48%, and 98.6% for the daily, weekly, 10-days, and monthly scenarios, respectively. The third experiment compared LSTM and ElasticNet LR (with and without normalization) for the four scenarios in terms of R² for one to seven time steps ahead. Its outcome was that the LSTM model has outstanding performance in predicting the sediment for one day and seven days ahead. In summary, using LSTM improved the evaluation metrics, giving an increase in the coefficient of determination R² and a decrease in RAE, RSE, RMSE, and MAE. This study's limitation is related to the size of the data collected for the period 1988 to 1998. Collecting more training data can improve data-hungry deep learning models by allowing them to learn new patterns from new samples.
Therefore, we intend to enhance future results by retraining the proposed LSTM with newly collected data. Furthermore, a combination of 1D convolutional layers with LSTM can be investigated in future work to combine both spatial and temporal features and enhance the prediction.

Data availability
The data that support the findings of this study are available from the Department of Environment Malaysia (DOE).