Prediction of nickel concentration in peri-urban and urban soils using hybridized empirical Bayesian kriging and support vector machine regression

Soil pollution is a major issue caused by anthropogenic activities. The spatial distribution of potentially toxic elements (PTEs) varies in most urban and peri-urban areas, which makes spatial prediction of PTE content in such soils difficult. A total of 115 samples were collected from Frydek Mistek in the Czech Republic. Calcium (Ca), magnesium (Mg), potassium (K), and nickel (Ni) concentrations were determined using inductively coupled plasma optical emission spectroscopy. Ni was the response variable, and Ca, Mg, and K were the predictors. The correlation matrix between the response variable and the predictors revealed a satisfactory correlation between the elements. Support vector machine regression (SVMR) predicted Ni reasonably well, although its root mean square error (RMSE) (235.974 mg/kg) and mean absolute error (MAE) (166.946 mg/kg) were higher than those of the other methods applied. The hybrid empirical Bayesian kriging-multiple linear regression (EBK-MLR) model performed poorly, with a coefficient of determination below 0.1. The empirical Bayesian kriging-support vector machine regression (EBK-SVMR) model was the optimal model, with low RMSE (95.479 mg/kg) and MAE (77.368 mg/kg) values and a high coefficient of determination (R² = 0.637). The EBK-SVMR output was visualized using a self-organizing map. The clustered neurons of the CaKMg-EBK-SVMR component plane showed a diverse colour pattern in the predicted Ni concentration in urban and peri-urban soil. The results show that combining EBK and SVMR is an effective technique for predicting Ni concentrations in urban and peri-urban soil.

According to Sergeev et al., combining different modelling techniques can eliminate the flaws of the individual models and make the resulting hybrid model more efficient than the single models from which it was developed. Against this backdrop, this study combines a geostatistical algorithm with an MLA to create the best hybridized model for predicting Ni enrichment in urban and peri-urban areas. Empirical Bayesian kriging (EBK) is used as the base model and is hybridized with a support vector machine (SVM) and with a multiple linear regression (MLR) model. To our knowledge, the hybridization of EBK with an MLA has not yet been explored; most existing hybrid models combine ordinary, residual, or regression kriging with an MLA. EBK is a geostatistical interpolation approach that models the spatial stochastic process as a locally stationary or non-stationary random field, with parameters defined locally so that they can vary across space 39. EBK has been applied in a variety of studies, including analyses of the distribution of organic carbon in agrogray soils 40, soil contamination assessment 41, and soil property mapping 42.
A self-organizing map (SeOM), on the other hand, is a learning algorithm that has been applied by Li et al. 43, Wang et al. 44, Hossain Bhuiyan et al. 45, and Kebonye et al. 46, among others, to determine the spatial attributes and groupings of elements. Wang et al. 44 described SeOM as a robust learning technique known for its clustering and visualization capacity and its ability to handle nonlinear problems. Compared with other pattern recognition techniques, such as principal component analysis, fuzzy clustering, hierarchical clustering, and multiple-criteria decision making, SeOM performs better at organizing PTE data and recognizing their patterns. According to Wang et al. 44, SeOM can spatially group related neurons and provide high-resolution data visualization. In this study, SeOM is used to visualize the Ni predictions of the best model so that the results can be interpreted straightforwardly.
This paper aims to develop a robust mapping model with optimal accuracy for predicting Ni content in urban and peri-urban soil. We hypothesized that the reliability of the hybridized model depends primarily on the model coupled to the base model. We acknowledge the challenges in DSM; while these are being addressed on multiple fronts, progress in combining geostatistics with MLA models appears to be gradual. We therefore address the following research questions: can EBK be combined with an MLA to generate a hybrid model, how accurately does such a model predict the targeted element, and how efficient is it according to validation and accuracy assessment? The specific objectives of this research are (a) to create hybrid models using EBK as the base model combined with SVMR or MLR, (b) to compare the models generated, (c) to propose the best hybrid model for predicting Ni concentration in urban and peri-urban soil, and (d) to apply SeOM to create high-resolution spatial variability maps of nickel.

Materials and methods
Study area. The research was conducted in the Frydek Mistek district of the Moravian-Silesian Region, Czech Republic (see Fig. 1). The study area has a very rugged landscape and lies mostly within the Moravian-Silesian Beskydy, part of the Outer Carpathian mountain range. It is located at approximately 49° 41′ 0″ N and 18° 20′ 0″ E, at altitudes between 225 and 327 m above sea level. Under the Köppen classification, the climate is Cfb (temperate oceanic), with substantial rainfall even in the dry months. Temperatures typically range between −5 °C and 24 °C over the year and are seldom below −14 °C or above 30 °C, and average annual precipitation is between 685 and 752 mm 47. The district covers an area of 1208 km², of which 39.38% is cultivated and 49.36% is forested; the area used in this study is approximately 889.8 km². The steel industry and metal works are active in and around Ostrava. Potential sources of Ni in the study area include metal works; the steel industry, which uses Ni in stainless steel (e.g., for resistance to atmospheric corrosion) and in alloy steel (Ni increases alloy strength while maintaining good plasticity and toughness); and intensive agriculture, such as phosphate fertilizer application and livestock production (e.g., Ni supplementation to increase the growth rate of lambs and cattle). Ni is also used industrially in the area for electroplating, including electrolytic and electroless nickel plating. The soils are readily distinguished by their colour, structure, and carbonate content. They have a medium to fine texture and developed from colluvial, alluvial, or aeolian parent materials.
Some soil areas show mottles in the topsoil and subsoil, usually accompanied by concretions and bleaching. Cambisols and Stagnosols are the most common soil types in the area 48.

Soil sampling and analysis. A total of 115 topsoil samples (0 to 20 cm depth) were collected from urban and peri-urban soils in the Frydek Mistek district on a regular 2 × 2 km grid; sampling points were located with a handheld GPS device (Leica Zeno 5 GPS). The samples were placed in Ziploc bags, labelled, and transported to the laboratory, where they were air-dried, crushed with a mechanical mill (Fritsch disk mill), and passed through a 2 mm sieve. One gram of each dried, homogenized, and sieved sample was placed in a clearly labelled Teflon vessel, and 7 ml of 35% HCl and 3 ml of 65% HNO3 were added using automatic dispensers (one per acid). The vessels were gently capped and left overnight to react (aqua regia procedure). The samples were then heated on a hot plate (100 W, 160 °C) for 2 h to promote digestion and allowed to cool. The supernatant was transferred to a 50 ml volumetric flask, made up to 50 ml with deionized water, and filtered into 50 ml PVC tubes. Finally, 1 ml of this solution was diluted with 9 ml of deionized water and filtered into a 12 ml test tube for the determination of pseudo-total PTE concentrations.
www.nature.com/scientificreports/
The concentrations of Ca, Mg, K, and Ni were determined by ICP-OES (inductively coupled plasma optical emission spectrometry; Thermo Fisher Scientific, USA) following standard methods and protocols. Quality assurance and quality control (QA/QC) were ensured using the standard reference material NIST SRM 2711a (Montana II soil). PTEs having detection limits of less than half were excluded from this study.
The PTE used in this study (Ni) had a detection limit of 0.0004. Quality control and quality assurance for each analysis were further ensured by analysing reference standards, and each sample was analysed in duplicate to minimize error.
Empirical Bayesian kriging. Empirical Bayesian kriging (EBK) is one of the many geostatistical interpolation techniques used for modelling in fields such as soil science. EBK differs from conventional kriging methods in that it accounts for the error introduced by estimating the semivariogram model 50. Instead of a single semivariogram, many semivariogram models are estimated during EBK interpolation. In this way, the technique accounts for the uncertainty associated with semivariogram estimation and automates the most complex parts of building an appropriate kriging model 40. The interpolation process of EBK follows three criteria, as proposed by Krivoruchko 50.

Support vector machine regression. The support vector machine (SVM) is a machine learning algorithm that constructs an optimal separating hyperplane to distinguish similar classes that are not linearly separable. Vapnik 51 developed the algorithm for classification, but it has since been applied to regression problems. According to Li et al. 52, SVM is one of the best classification techniques and has been used in various fields. The regression variant of SVM, support vector machine regression (SVMR), is used in this analysis. Cherkassky and Mulier 53 pioneered SVMR as a kernel-based regression, in which a linear regression is computed in a high-dimensional feature space. John et al. 54 reported that SVMR fits a linear regression hyperplane in this feature space, which corresponds to a nonlinear relationship in the original input space. According to Vohland et al. 55, epsilon (ε)-SVMR uses the training dataset to obtain a model based on an ε-insensitive loss function, which maps the independent data so that the predictions deviate from the dependent training data by at most ε.
Deviations from the actual value that lie within the preset distance ε are ignored, and errors larger than ε are penalized. The model also reduces the complexity of the training data to a subset of support vectors. The equation proposed by Vapnik 51 is

$$y(x) = \sum_{k=1}^{N} \alpha_k K(x, x_k) + b$$

where $b$ is the scalar threshold (bias), $K(x, x_k)$ is the kernel function, $\alpha_k$ is the Lagrange multiplier, $N$ is the number of data points, $x_k$ is the input data, and $y$ is the output. The Gaussian radial basis function (RBF) is one of the most important kernels used in SVMR. The RBF kernel was applied, and the penalty factor C and the kernel parameter gamma (γ) were tuned on the PTE training data to obtain the optimal SVMR model. The models were first fitted on the training set, and their performance was then tested on the validation set. The tuning parameter used was sigma, and the method was svmRadial.
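The SVMR step can be illustrated with a minimal sketch in Python's scikit-learn, used here as a stand-in for the authors' R workflow. It fits an ε-SVR with an RBF kernel, $K(x, x_k) = \exp(-\gamma \lVert x - x_k \rVert^2)$, tunes C and γ by cross-validated grid search, and then checks that the fitted prediction equals the dual form $\sum_k \alpha_k K(x, x_k) + b$. The data, the grid values and ε = 1.0 are illustrative assumptions. In R's caret, method svmRadial (backed by kernlab) tunes sigma and C; kernlab's sigma plays the role of γ here.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Synthetic stand-ins: three predictors (think Ca, K, Mg) and a nonlinear response (think Ni)
X = rng.uniform(0.0, 1.0, size=(100, 3))
y = 50 + 40 * np.sin(3 * X[:, 0]) + 20 * X[:, 1] * X[:, 2] + rng.normal(0.0, 2.0, 100)

# epsilon-SVR with a Gaussian RBF kernel; C (penalty) and gamma are tuned by 5-fold CV
pipe = make_pipeline(StandardScaler(), SVR(kernel="rbf", epsilon=1.0))
grid = {"svr__C": [1.0, 10.0, 100.0, 1000.0], "svr__gamma": [0.01, 0.1, 1.0]}
search = GridSearchCV(pipe, grid, cv=5, scoring="neg_root_mean_squared_error")
search.fit(X, y)
print("best parameters:", search.best_params_)

# Rebuild predictions from the dual form y(x) = sum_k alpha_k * K(x, x_k) + b.
# scikit-learn stores the signed multipliers (alpha_k - alpha_k*) in dual_coef_.
svr = search.best_estimator_.named_steps["svr"]
scaler = search.best_estimator_.named_steps["standardscaler"]
Z = scaler.transform(X[:5])                           # same scaling the pipeline applies
sq_dist = ((Z[:, None, :] - svr.support_vectors_[None, :, :]) ** 2).sum(axis=2)
kernel = np.exp(-svr.gamma * sq_dist)                 # K(x, x_k) for every support vector
manual = kernel @ svr.dual_coef_.ravel() + svr.intercept_[0]
print(np.column_stack([manual, search.predict(X[:5])]))
```

Standardizing the predictors before the RBF kernel is a design choice of the sketch: Ca, K and Mg span very different ranges (hundreds to tens of thousands of mg/kg), and an unscaled distance would be dominated by Ca.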

Multiple linear regression. Multiple linear regression (MLR) models the relationship between a response variable and several predictor variables through linearly combined parameters estimated by the least-squares method. In MLR, the least-squares model is a prediction function for a soil property built from the selected explanatory variables. Here, the PTE was used as the response variable, and a linear relationship was established with the explanatory variables. The MLR equation is

$$y = a + \sum_{i=1}^{n} b_i x_i + \varepsilon$$

where $y$ is the response variable, $a$ is the intercept, $n$ is the number of predictors, $b_i$ is the partial regression coefficient of the $i$-th predictor, $x_i$ is the $i$-th predictor (explanatory variable), and $\varepsilon$ is the model error, also called the residual.
The model was fitted in RStudio.
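The least-squares fit of the MLR equation can be sketched with NumPy. The data are synthetic, and the known intercept and coefficients are chosen only so that the estimates can be checked; this is an illustration, not the authors' R code.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data with a known intercept a and coefficients b_i
X = rng.uniform(0.0, 10.0, size=(50, 3))           # n = 3 predictors
a_true, b_true = 5.0, np.array([2.0, -1.0, 0.5])
y = a_true + X @ b_true + rng.normal(0.0, 0.1, 50)

# Design matrix: a column of ones for the intercept a, then x_1 ... x_n
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)        # minimizes the sum of squared residuals
a_hat, b_hat = coef[0], coef[1:]
residuals = y - A @ coef                            # epsilon, the model error

print(f"a = {a_hat:.3f}, b = {np.round(b_hat, 3)}")
```

Because the design matrix contains an intercept column, the least-squares residuals average to zero, which is a quick sanity check on any MLR fit.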
Hybrid modelling. The hybrid models were obtained by coupling EBK, as the base model, with SVMR and MLR. First, predicted values were extracted from the EBK interpolation of Ca, K, and Mg. These predicted values were then combined to obtain new variables, CaK, CaMg, and KMg, and all three elements were combined to obtain a fourth combined variable, CaKMg. Overall, the variables Ca, K, Mg, CaK, CaMg, KMg, and CaKMg served as covariates (predictors) for predicting Ni concentration in urban and peri-urban soil. Passing these predictors through the SVMR algorithm yielded the empirical Bayesian kriging-support vector machine regression hybrid model (EBK_SVMR), and passing them through the MLR algorithm yielded the empirical Bayesian kriging-multiple linear regression hybrid model (EBK_MLR). The better of the two models (EBK_SVMR or EBK_MLR) was then visualized using the self-organizing map. The workflow of the study is presented in Fig. 2.
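The covariate construction and the two hybrid fits can be sketched in Python. The text does not say how Ca, K and Mg were combined, so the sketch assumes element-wise products; the helper `build_covariates` and all data are hypothetical. In the actual workflow the Ca, K and Mg arrays would be EBK predictions at the sampling points, which are simulated here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def build_covariates(ca, k, mg):
    """Return the seven covariates used as predictors.

    ASSUMPTION (hypothetical helper): CaK, CaMg, KMg and CaKMg are taken as
    element-wise products; the source does not specify the combination rule.
    """
    return {
        "Ca": ca, "K": k, "Mg": mg,
        "CaK": ca * k, "CaMg": ca * mg, "KMg": k * mg,
        "CaKMg": ca * k * mg,
    }

rng = np.random.default_rng(2)
n = 115
# Simulated stand-ins for EBK-predicted Ca, K and Mg (mg/kg) at the sampling points
ca, k, mg = (rng.lognormal(mean, 0.5, n) for mean in (8.0, 7.0, 7.5))
ni = 0.004 * mg + 0.002 * k + rng.normal(0.0, 1.0, n)    # synthetic Ni response

covs = build_covariates(ca, k, mg)
X = np.column_stack([covs[name] for name in ("Ca", "Mg", "K")])  # one candidate predictor set

# EBK_SVMR and EBK_MLR: the same EBK-derived predictors fed to two different learners
ebk_svmr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)).fit(X, ni)
ebk_mlr = LinearRegression().fit(X, ni)
print(ebk_svmr.predict(X[:3]), ebk_mlr.predict(X[:3]))
```

Any other predictor set (for example the single combined covariate CaKMg) is obtained by changing the list of names passed to `np.column_stack`.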
Self-organizing maps (SeOM). SeOM has become a popular tool for organizing, appraising, and predicting data in a variety of sectors, including finance, medicine, industry, statistics, and soil science. SeOM is an artificial neural network that uses unsupervised learning for the organization, evaluation, and prediction of data. In this study, SeOM was used to visualize the Ni concentration predicted by the best model for urban and peri-urban soil. Each sample enters the SeOM analysis as an n-dimensional input vector 43,56. Melssen et al. 57 described how, in this neural network, a single input layer connects each input vector to the output neurons, each of which carries a single weight vector. The SeOM output is a two-dimensional map of neurons (nodes) arranged in a hexagonal, circular, or square topology according to their proximity 43. Map sizes were compared using two metrics, the quantization error (QE) and the topographic error (TE), and a 55-unit (5 × 11) map with QE = 0.086 and TE = 0.904 was chosen. The number of neurons was determined using an empirical node-number equation.

Data partitioning. The dataset used in this study comprises 115 samples. The data were randomly split into a training dataset (75%, for calibration) and a test dataset (25%, for validation). The training dataset was used to build the regression models (calibration), and the test dataset was used to validate their generalization capability 58. This was done to evaluate the suitability of the different models used to predict nickel content in the soil. All models were subjected to tenfold cross-validation repeated five times. The variables generated from EBK interpolation were used as predictors (explanatory variables) for the target variable (Ni).
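The partitioning and resampling scheme can be sketched with scikit-learn as a stand-in for caret's repeated cross-validation. The data are placeholders, and applying the repeated tenfold cross-validation to the calibration set only is an assumption about how the two steps were combined.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X = rng.normal(size=(115, 3))                      # placeholder predictors for 115 samples
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(0.0, 0.2, 115)

# Random 75 % calibration / 25 % validation split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Tenfold cross-validation repeated five times on the calibration set (50 fits)
cv = RepeatedKFold(n_splits=10, n_repeats=5, random_state=42)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
scores = cross_val_score(model, X_train, y_train, cv=cv,
                         scoring="neg_root_mean_squared_error")
cv_rmse = -scores.mean()

# Refit on all calibration data, then evaluate once on the held-out validation set
model.fit(X_train, y_train)
test_rmse = float(np.sqrt(np.mean((model.predict(X_test) - y_test) ** 2)))
print(f"CV RMSE = {cv_rmse:.3f}, validation RMSE = {test_rmse:.3f}")
```

With 115 samples and a 25 % test fraction, scikit-learn rounds the test set up to 29 samples, leaving 86 for calibration.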
The modelling was carried out in RStudio using the R packages kohonen, caret, modelr, e1071, plyr, caTools, prospectr, and Metrics.
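The SeOM training step and the two map-quality measures used to choose the map size (QE and TE) can be sketched in NumPy. This is not the kohonen package's implementation. It uses online updates on a rectangular grid with a Gaussian neighbourhood, treats two units as adjacent when their grid distance is 1, and runs on synthetic data. The map-size rule of about 5√N units is a widely used heuristic (about 54 units for N = 115); whether it is the empirical node-number equation the authors used is an assumption.

```python
import numpy as np

def train_som(data, rows, cols, n_iter=3000, lr0=0.5, seed=0):
    """Online SOM on a rows x cols rectangular grid with a Gaussian neighbourhood.

    Learning rate and neighbourhood radius both decay linearly over n_iter steps.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    weights = data[rng.integers(0, n, rows * cols)].astype(float)  # initialise from samples
    sigma0, sigma_end = max(rows, cols) / 2.0, 0.5
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)
        sigma = sigma0 + (sigma_end - sigma0) * frac
        x = data[rng.integers(0, n)]                        # one random sample per step
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))   # best-matching unit
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)          # squared grid distance to the BMU
        h = np.exp(-d2 / (2.0 * sigma ** 2))                # Gaussian neighbourhood weights
        weights += lr * h[:, None] * (x - weights)          # pull units toward the sample
    return weights, grid

def quantization_error(data, weights):
    """Mean distance from each sample to its best-matching unit."""
    d = np.sqrt(((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2))
    return float(d.min(axis=1).mean())

def topographic_error(data, weights, grid):
    """Share of samples whose first and second BMUs are not grid neighbours."""
    d = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    order = np.argsort(d, axis=1)
    first, second = order[:, 0], order[:, 1]
    grid_gap = np.sqrt(((grid[first] - grid[second]) ** 2).sum(axis=1))
    return float(np.mean(grid_gap > 1.0 + 1e-9))

rng = np.random.default_rng(4)
data = rng.normal(size=(115, 4))                   # e.g. standardized Ca, K, Mg and predicted Ni
n_units = int(round(5 * len(data) ** 0.5))         # heuristic map size, about 54 for N = 115
weights, grid = train_som(data, rows=5, cols=11)   # the 55-unit (5 x 11) map used in the text
qe = quantization_error(data, weights)
te = topographic_error(data, weights, grid)
print(n_units, round(qe, 3), round(te, 3))
```

Lower QE means the units fit the data more closely, while TE measures how well the map preserves neighbourhood relations; in practice several map sizes are trained and compared on both measures.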

Model performance metrics.
Several validation parameters were used to identify the optimal model for predicting nickel concentration in the soil and to evaluate model accuracy. The hybridized models were assessed using the mean absolute error (MAE), the root mean square error (RMSE), and the coefficient of determination (R²). R² is the proportion of the variance in the response that is explained by the regression model; the closer it is to 1, the higher the accuracy, so a high R² is sought when selecting the best hybridized model. According to Li et al. 59, an R² of 0.75 or greater indicates a good prediction, 0.5 to 0.75 indicates acceptable model performance, and below 0.5 indicates unacceptable model performance. RMSE reflects the magnitude of the deviations between predicted and measured values and thus describes the predictive power of the model, while MAE gives the average absolute prediction error in the units of the measurement. For RMSE and MAE, lower values indicate a better model. The validation metrics are defined as follows.

Mean absolute error:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|Y_i - \hat{Y}_i\right|$$

R square:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2}{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}$$

Root mean square error:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2}$$

where $n$ is the number of observations, $Y_i$ and $\hat{Y}_i$ are the measured and predicted response values for the $i$-th observation, and $\bar{Y}$ is the mean of the measured values.
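The three metrics and the R² bands attributed to Li et al. can be written as small Python functions. One caveat: the function `r2` computes 1 − SS_res/SS_tot, whereas R's caret (`postResample`) reports the squared Pearson correlation between observed and predicted values, which can differ, so the sketch provides both. The toy vectors are illustrative only.

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error, in the units of the measurement."""
    return float(np.mean(np.abs(y - y_hat)))

def rmse(y, y_hat):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def r2(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def r2_corr(y, y_hat):
    """Squared Pearson correlation (the form reported by caret's postResample)."""
    return float(np.corrcoef(y, y_hat)[0, 1] ** 2)

def rate_r2(value):
    """R^2 bands attributed to Li et al. in the text."""
    if value >= 0.75:
        return "good"
    if value >= 0.5:
        return "acceptable"
    return "unacceptable"

y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.0, 2.0, 3.0, 5.0])
print(mae(y, y_hat), rmse(y, y_hat), r2(y, y_hat), r2_corr(y, y_hat))
```

Applied to the reported values, `rate_r2(0.637)` places EBK-SVMR in the "acceptable" band, and any R² below 0.1, as for EBK-MLR, falls in the "unacceptable" band.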

Results and discussion
Statistical description. The descriptive statistics of the predictor and response variables are shown in Table 1. A previous study 63 reported a mean Ni concentration of 17.6 mg/kg in an old mining and urban industrial area in Sachsen-Anhalt, Germany, which is 1.45 mg/kg higher than the mean Ni concentration in the current study (16.15 mg/kg). Ni concentrations exceeding the allowable limit in some parts of the study area's urban and peri-urban soil might be attributed largely to the steel industry and metal works. This is in line with Khodadoust et al. 64, who identified steel industries and metal works as major sources of nickel pollution in soil. The predictor variables ranged from 538.70 to 69,161.80 mg/kg for Ca, from 497.51 to 3535.68 mg/kg for K, and from 685.68 to 5970.05 mg/kg for Mg. Jakovljevic et al. 65 investigated the total content of Mg and K in central Serbian soil and found total concentrations (410 mg/kg and 400 mg/kg, respectively) lower than the Mg and K concentrations in the current study. Similarly, Orzechowski and Smolczynski 66 assessed the total content of Ca, Mg, and K in eastern Poland and reported mean topsoil concentrations of Ca (1100 mg/kg), Mg (590 mg/kg), and K (810 mg/kg) lower than those in the present study. A recent study by Pongrac et al. 67 analysed the total Ca content of three soils in Scotland, UK (Mylnefield, Balruddery, and Hartwood soils); the Ca content in the present study is higher. The distributions of the elements exhibited different degrees of skewness owing to differences in their measured concentrations. Skewness and kurtosis ranged from 1.53 to 7.24 and from 2.49 to 54.16, respectively.
All computed skewness and kurtosis values were above +1, indicating that the data distributions are positively (right) skewed and leptokurtic. The coefficients of variation (CV) indicated moderate variability for K, Mg, and Ni, but extremely high variability for Ca. The CVs of K, Ni, and Mg suggest that these elements are relatively homogeneously distributed. In contrast, the Ca distribution is non-homogeneous, and an external source might influence its level of enrichment.
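The summary statistics discussed above can be computed as in the following SciPy sketch. The conventions are assumptions, since the text does not state them: `scipy.stats.kurtosis` returns excess kurtosis (0 for a normal distribution) by default, and the CV uses the sample standard deviation. The lognormal sample is synthetic and only illustrates a right-skewed variable.

```python
import numpy as np
from scipy import stats

def describe(values):
    """Mean, SD, CV (%), skewness and excess kurtosis of one variable."""
    v = np.asarray(values, dtype=float)
    sd = v.std(ddof=1)                           # sample standard deviation
    return {
        "mean": v.mean(),
        "sd": sd,
        "cv_percent": 100.0 * sd / v.mean(),
        "skewness": stats.skew(v),               # > 0 means a longer right tail
        "kurtosis": stats.kurtosis(v),           # excess kurtosis; > 0 means leptokurtic
    }

symmetric = describe([1, 2, 3, 4, 5])
right_skewed = describe(np.random.default_rng(6).lognormal(3.0, 0.8, 115))
print(symmetric)
print(right_skewed)
```

If a study reports Pearson (non-excess) kurtosis instead, `stats.kurtosis(v, fisher=False)` adds 3 to these values, so the convention matters when comparing kurtosis across papers.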
Correlation between response and predictor variables. The correlations between the predictors and the response element were satisfactory (see Fig. 3). Ca and K showed a moderate correlation (r = 0.53), and Ca and Ni likewise showed a moderate correlation. Although Ca and K were moderately correlated with each other here, researchers such as Kingston et al. 68 and Santo 69 have suggested that their contents in soil are inversely related, and Ca and Mg are generally antagonistic to K. The good Ca-K correlation in this study might be due to the application of fertilizers such as potassium carbonate, which contains about 56% potassium. Potassium correlated moderately with magnesium (K-Mg, r = 0.63). In the fertilizer industry, these two elements have long been closely associated, because potassium magnesium sulfate, potassium magnesium nitrate, and muriate of potash are applied to remedy soil deficiencies. Nickel correlated

Spatial distribution of the elements. Figure 4 illustrates the spatial distribution of the elements.
According to Burgos et al. 70, spatial distribution mapping is a technique used to quantify and highlight hot spots of polluted areas. Ca enrichment is evident in the northwestern part of the spatial distribution map (Fig. 4), which shows moderate to high Ca hotspots. This enrichment might be due to the application of quicklime (calcium oxide) to reduce soil acidity and to its use in the basic oxygen steelmaking process in steel plants. Some farmers instead apply calcium hydroxide to acidic soil to neutralize its pH, which also increases the calcium content of the soil 71. Potassium exhibited hot spots in the northwestern and eastern parts of the map. The northwestern part is predominantly agrarian, and the moderate to high K pattern there might be due to the application of NPK and muriate of potash. This is consistent with studies such as Madaras and Lipavský 72, Madaras et al. 73, Pulkrabová et al. 74, and Asare et al. 75, who observed that using muriate of potash and NPK for soil stabilization and treatment resulted in high soil K content. K enrichment in the northwestern part of the map might also be due to the use of potassium-based fertilizers, such as potassium chloride, potassium sulphate, potassium nitrate, sylvinite, and kainit, to increase the K content of deficient soil. Zádorová et al. 76 and Tlustoš et al. 77 reported that applying potassium-based fertilizer increases soil potassium levels and, in the long term, significantly increases soil nutrient content, especially K. Magnesium showed a hot spot in the northwestern part of the map and a relatively moderate hotspot in the southeastern part. Colloid fixation in soil depletes the concentration of magnesium in the soil.
A deficiency of magnesium in the soil causes plants to show yellowish interveinal chlorosis. Magnesium-based fertilizers, such as potassium magnesium sulphate, magnesium sulphate, and kieserite, are used to treat the deficiency syndrome (purple, red, or brown colouration of plants, indicating a lack of magnesium) in soils with normal pH ranges 6. The accumulation of nickel in the surface of urban and peri-urban soil might be due to anthropogenic activities such as agriculture and the use of Ni in stainless steel production 78.
The model performance metrics for the elements used in this study are presented in Table 2. The RMSE and MAE for Ni were both close to zero (RMSE 0.86, MAE −0.08), and those for K were also acceptable. The RMSE and MAE results for calcium and
The RMSE and MAE obtained in this study for predicting Ni with EBK were better than those of John et al. 54, who predicted S concentration in soil by cokriging using the same dataset. The EBK output of our study is consistent with those of Fabijaczyk et al. 41, Yan et al. 79, Beguin et al. 80, Adhikary et al. 81, and John et al. 82, especially for K and Ni.
Performance of models. The performance of the individual approaches for predicting Ni content in urban and peri-urban soil is summarized in Table 3. Model validation and accuracy assessment confirmed that the Ca_Mg_K predictors coupled with the EBK_SVMR model yielded the optimal performance. The final map (Fig. 5), created using the EBK_SVMR model with Ca_Mg_K as predictors, showed patches of hotspots and moderate predicted Ni across the entire study area. This implies that the Ni concentration in the study area is primarily moderate, with high concentrations in some specific areas.
Visualization of predicted nickel via the EBK_SVMR model using a self-organizing map. Figure 6 presents the PTE concentrations as component planes made up of individual neurons. No two component planes exhibited the same colour pattern. Each map comprised 55 neurons. The SeOMs are colour-coded, and the more similar the colour pattern, the more comparable the sample attributes. On the colour scale, the single elements (Ca, K, and Mg) displayed a similar colour pattern, with single high neurons and mostly low neurons. CaK and CaMg shared some similarities, with very high-level neurons and low to moderate colour patterns. Both predicted the concentration of Ni in the soil through moderate to high shades such as red, orange, and yellow. The KMg model showed many high colour patterns and low to moderate patches of colour. The component plane distribution patterns of the models revealed high colour patterns, on a scale ranging from low to high, indicating the potential concentration of Ni in the soil (see Fig. 4). The CaKMg model component plane showed a diverse colour pattern from low to high. In addition, this model's prediction of nickel content (CaKMg) is similar to the spatial distribution map of Ni shown in Fig. 5. Both maps revealed high, moderate, and low nickel concentrations in urban and peri-urban soil. Figure 7 depicts the silhouette method applied to k-means groupings on the maps, which are divided into three clusters based on the predicted values in each model.
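The k-means grouping of map units with the silhouette criterion can be sketched with scikit-learn. The 55 × 4 "codebook" below is synthetic (three well-separated groups) and stands in for trained SeOM weight vectors; the range of k tried is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(5)
# Synthetic 55 x 4 codebook standing in for trained SeOM weight vectors (three groups)
centres = np.array([[0, 0, 0, 0], [4, 4, 0, 0], [0, 4, 4, 4]], dtype=float)
sizes = (18, 18, 19)
codebook = np.vstack([c + rng.normal(0.0, 0.5, size=(m, 4)) for c, m in zip(centres, sizes)])

# Try several k and keep the one with the highest mean silhouette width
silhouettes = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(codebook)
    silhouettes[k] = silhouette_score(codebook, labels)
best_k = max(silhouettes, key=silhouettes.get)
print({k: round(s, 3) for k, s in silhouettes.items()}, "best k:", best_k)
```

Clustering the map units rather than the raw samples is the usual two-stage SeOM approach: the codebook is smaller and smoother than the data, so the clusters can be drawn directly as regions on the map.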

Conclusion
The current research illustrates a modelling technique for nickel concentration in urban and peri-urban soil. The study tested different modelling techniques, combining elements with modelling techniques to obtain the best method for predicting nickel concentration in soil. The SeOM component planes of the modelling techniques exhibited colour patterns spanning low to high on the colour scale, indicating the concentration of Ni in the soil. The spatial distribution map agrees with the component plane distribution exhibited by EBK_SVMR (see Fig. 5). As a single model, support vector machine regression (Ca Mg K-SVMR) predicted the concentration of Ni in the soil, but validation and accuracy evaluation showed that its errors in terms of RMSE and MAE were very high. The EBK_MLR models were likewise deficient because of their low coefficient of determination (R²) values. EBK_SVMR with combined elements (CaKMg) gave good results, with low RMSE and MAE and an R² of 0.637. The results show that combining the EBK algorithm with a machine learning algorithm can generate a hybrid algorithm that predicts the concentration of PTEs in soil. Using Ca, Mg, and K as predictors improved Ni prediction in the study area. This implies that the continual application of Ni-based fertilizer and industrial pollution of soil by the steel industry tend to raise the concentration of Ni in the soil. The study revealed the ability of the EBK model to reduce error levels and improve the accuracy of spatial distribution models in urban and peri-urban soil.
In general, we recommend the EBK-SVMR model for assessing and predicting PTEs in soil, and we also recommend hybridizing EBK with other machine learning algorithms. The elements used as covariates predicted Ni concentration; however, using more covariates would likely improve model performance, and the limited set of covariates can be considered a limitation of the current work. A further limitation is that the dataset comprises only 115 samples; with more data, the performance of the proposed optimized hybridization approaches may improve.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.