Estimation of compressive strength of waste concrete utilizing fly ash/slag in concrete with interpretable approaches: optimization and graphical user interface (GUI)

Geopolymer concrete has a significant influence on the environment, and its use in the civil industry therefore reduces carbon dioxide (CO2) emissions. However, problems lie in its mix design and field casting. This study uses supervised machine learning algorithms (MLAs) to predict the mechanical characteristics of fly ash/slag-based geopolymer concrete (FASBGPC), applying AdaBoost and bagging to an MLPNN to build ensemble models from 156 data points. The data consist of GGBS (kg/m3), alkaline activator (kg/m3), fly ash (kg/m3), SP dosage (kg/m3), NaOH molarity, aggregate (kg/m3), and temperature (°C) as inputs, with compressive strength as the output parameter. Python programming in Anaconda Navigator (Spyder version 5.0) is used to predict the mechanical response. The dataset is split 80/20 percent for training and testing, and K-fold cross-validation is employed to check the accuracy of the models using MAE, RMSE, and R2. Statistical analysis relies on these errors, and checks against external indicators help determine the robustness of the models. The most important factors for compressive strength are examined using permutation feature importance. The results reveal that ANN with AdaBoost outclasses the other models, giving the greatest enhancement with R2 = 0.914 and the least error under statistical and external validation. Shapley analysis shows that GGBS, NaOH molarity, and temperature are the most influential parameters in making FASBGPC. Thus, ensemble methods are suitable for constructing prediction models because of their strong and reliable performance. Furthermore, a graphical user interface (GUI) is generated from the trained model that forecasts the desired output values when the corresponding inputs are provided. It streamlines the process and provides a useful tool for applying the model's abilities in the field of civil engineering.


Alkali activated-based geopolymer concrete (AA-GPC)
Fly ash (FA) is a waste product of thermal coal power generation. It is pozzolanic, having the properties of a binding material that polymerizes at high temperature in an alkaline medium to make GPC. As a result, a crystalline and amorphous combination is created, which can then be used to provide the necessary mechanical characteristics. Because of the significant requirement for heat curing, however, field application of geopolymerization chemistry is not advised 30,31. Owing to its high curing-temperature requirement, FA-GPC application in the construction sector remains restricted 31. Heat demand can be decreased by using a slag blend with a high concentration of calcium, silica, and alumina 32. When GGBS and FA are combined, calcium aluminosilicate hydrate (C-A-S-H) is created as a byproduct. This gel is responsible for the dense gel structure in GPC. Thus, its utilization can achieve early strength and produces enhanced mechanical properties 33. Mehta et al. 34 studied the mechanical properties of FA-GPC with GGBS by varying its concentration. The authors observed a compact microstructure containing hydration and polymerization products, which greatly improved the strength of GPC at early ages. Yazdi et al. 35 examined the behavior of GPC by altering the FA concentration from 30 to 100% in combination with GGBS. The authors showed that substituting 50% of the fly ash (FA) with ground granulated blast furnace slag (GGBS) leads to a significant enhancement in both compressive and flexural strength, with values of 100 MPa and 10 MPa, respectively. In addition, Fang et al. 31 investigated the impact of slag on the flexural and split tensile strength of FA-GPC and found that it resulted in increased strength levels. This phenomenon occurs as a result of the creation of C-A-S-H and N-A-S-H gels as the GGBS concentration is varied, leading to an expedited reaction process 36. Moreover, Table 1 illustrates the mixing regimes of FA-GGBS-based GPC. Hence, the use of blended slag with FA in making GPC decreases CO2 emission into the atmosphere, disposes of waste, and lowers energy consumption, yielding an eco-friendly concrete.
An experimental mix design is used to establish the optimum loading capacity of concrete. However, prior research has shown that GP concrete is significantly influenced by the chemical composition and physical quantities of its constituents. Heterogeneity exists in making GP concrete because of these diverse factors, and ambiguity still remains in proportioning a mixture [41-43]. Many methods and methodologies based on statistical approaches, however, may be used to analyze the crushing behavior of GPC, despite the limited study of the link between variables and mechanical properties. Recently, the use of machine learning algorithms (MLAs) and their application to concrete mix design has increased [44-48]. This is due to their ability to capture nonlinear relationships and complex behavior, accounting for uncertainty and predicting the most influential results from dependent and independent variables. MLAs typically split the data samples into a training set and a testing set. The employed algorithm trains a model on the training set and predicts the outcome on the test set. The usefulness of MLAs often lies in this nonlinear relation, achieved through automatic, complex iteration toward accurate prediction [49-51]. Empirical models, by contrast, iterate once over the data sets and predict the outcome with fitted variables, which is why linear regression is not useful for obtaining accurate and desirable predictions.
Lokuge et al. 41 predicted the compressive behavior of FA-GPC by employing the multivariate adaptive regression splines (MARS) algorithm, achieving robust performance through nonlinear regression. Mohsin et al. 52 employed two approaches, random forest and gene expression programming, on 298 data points, and the results showed that RFR provides more accurate performance than GEP with R = 0.9826. Similarly, Ayaz et al. 53 predicted the loading capacity of FA-GPC on 154 data samples gathered from the literature by employing supervised machine learning approaches. The authors achieved a good correlation of R2 = 0.97 using the bagging algorithm, compared with AdaBoost and decision tree algorithms. Unluer et al. 54 used different MLAs, including support vector machines (SVM), backpropagation neural networks (BPNN), and extreme learning machines (ELMs), for the prediction of FA-GPC with 110 data points. The authors concluded that SVM outclasses ELM and BPNN, with a higher correlation on the testing set of R2 = 0.955. Ali et al. 55 utilized 399 data sets to estimate the compressive behavior of GPC using an artificial neural network (ANN) with two hidden layers. The authors obtained an excellent response of R2 = 0.9916 compared with other layer configurations. In addition, Mohsin et al. 42 forecast the mechanical characteristics of GPC using the GEP technique; the results show a significant connection with the empirical equation. Similarly, Alkaroosh et al. 56 forecast GPC strength using a GEP model with 56 data points and reported a good correlation of R2 = 0.89 for the validation set. Aneja et al. 57 forecasted the compressive strength of FA- and BA-based GPC using artificial neural networks, gathering 46 data points from the literature and producing eighteen samples through experimental work. Fourteen ANN models differing in hidden layers, backpropagation training algorithms, and numbers of neurons were utilized, with Bayesian regularization (BR), Levenberg-Marquardt (LM), and scaled conjugate gradient (SCG) employed as backpropagation algorithms. The number of neurons, the hidden layers, and the training approach had a significant influence on the ANN models. Model evaluation was gauged by mean square error (MSE) and coefficient of correlation (R). The authors reveal that increasing the hidden layers decreases the MSE and increases the R value of the BR and LM models; the eighth model (ANN-VIII), a BR-ANN with three layers and 10 neurons, demonstrated accurate prediction with R = 0.99 and a lower MSE = 1.017. Dong et al. 58 forecasted the compressive strength of GPC with an artificial neural network and an adaptive neuro-fuzzy inference system (ANFIS). The authors used four input parameters for prediction and reported that ANFIS performs much better than ANN in terms of R2 and statistical measures. Cao et al. 59 utilized three approaches, support vector machine (SVM), extreme gradient boosting (XGB), and multilayer perceptron (MLP), for predicting the compressive strength of GPC. The authors reveal that the XGB approach outclasses the others, depicting a higher correlation of R2 = 0.98 compared with SVM and MLP at 0.91 and 0.88, respectively. In addition, statistical measures and validation with K-fold confirm the high accuracy of XGB. Similarly, Ashrafian et al. 60 anticipated the apparent chloride concentration of marine structures in different zones. The authors employed gene expression programming (GEP), multivariate adaptive regression splines (MARS), and the M5P model tree (MT) on 642 data points, and observed that MARS outperformed both other approaches. In addition, Ashrafian et al. 61 predicted the slump flow of self-compacting concrete (SCC) with 117 data points using the MARS and MT approaches, and observed that both models have great potential with high precision. Chu et al. 62 anticipated the mechanical strength of FA-GPC on 311 data points by employing GEP and MEP techniques and revealed that GEP is far superior to MEP, as it gives an empirical relation with accuracy. Ashrafian et al. 63 anticipated the strength of waste concrete by employing various machine learning approaches and demonstrated that a fuzzy model with the HOA optimizer depicts robust performance compared with the remaining models. Thus, MLAs have been used in the civil engineering domain because of their extraordinary predictive capability, as listed in Table 2. MLAs employ statistical analysis and database approaches to extract correlations, undiscovered patterns, and data from massive datasets. Prediction and modeling often use one of two approaches. The first option, the traditional technique, is based on a stand-alone model 64. The second approach makes use of ensemble learning algorithms, including bagging, boosting, and random forests.
The newly developed ensemble learning algorithms significantly outperform traditional ML models when predicting outcomes [65-67]. In the ensemble learning approach, the training data is used to educate a large number of weak learners, which are then combined to produce a strong learner. These weak learners are developed using individual learning techniques such as ANN, SVM, and DT. These recently established ensemble techniques may be used to investigate the properties and resilience of complex materials, such as HPC made from waste. Ensemble learning and classifier generation algorithms have been the subject of recent studies because of their potential to improve machine learning model performance. Likewise, Ahmad et al. 68 utilized AdaBoost, decision tree, and random forest approaches with 207 samples to predict the high-temperature compressive strength of concrete. The authors observed that the AdaBoost approach gives a strong R2 = 0.938 compared with the other approaches, and that the cement concentration (CC) of the mixture was the most sensitive variable in prediction. Chou et al. 69 used MLP, SVM, a classification and regression tree (CART), and linear regression (LR) to develop separate and combined classifiers for learning. The results demonstrated that HPC (high-performance concrete) compressive strength may be better predicted using ensemble learning approaches than individual learning methods. To foretell HPC's compressive and tensile strengths, Nguyen et al. 69 used prediction algorithms such as SVM, MLP, GBR, and XGBoost, and demonstrated that GBR and XGBoost performed better than SVM and MLP. The distinction between standalone algorithms and ensemble methods is shown in Fig. 2.
Studies reveal that nonlinear regression has been used to obtain the optimal performance of low-calcium FA-GPC, but nonlinear-regression MLAs have not been applied to fly ash/slag-based GPC. In this research work, AdaBoost and bagging on an MLPNN were used to make ensemble models to predict the mechanical strength of the blended GPC.

Data description
The data for GP concrete containing FA and GGBS have been taken from the published literature to model the strength. The effectiveness of the gathered data depends entirely upon the data points and the variables used to make a model (see supplementary file S1). The parameters used to make the model comprise fly ash (kg/m3), granulated blast furnace slag (kg/m3), fine and coarse aggregate (kg/m3), alkaline activators (kg/m3), superplasticizer (kg/m3), sodium hydroxide molarity (M), and temperature (°C). The descriptive statistics with frequency distributions are shown in Table 3 and Fig. 3, while the maximum and minimum values with their averages are shown in Fig. 4. Moreover, the Pearson correlation coefficient, ranging from −1 to 1, shows the linear relationship between input and output variables, as illustrated in Fig. 5. In general, the correlations between the inputs and the output are weak, and the strongest correlation is between ground granulated blast furnace slag and fly ash. Consequently, each of the nine inputs to the dataset can be considered independent, enabling each variable to be used to construct machine learning models and conduct feature significance and sensitivity analyses.
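The correlation screening described above can be sketched with pandas; the data frame here is a synthetic stand-in for the 156-point FASBGPC table and the column names are illustrative, not the paper's exact headers.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the FASBGPC dataset (156 rows);
# the value ranges and column names are assumptions for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "fly_ash": rng.uniform(100, 450, 156),
    "ggbs": rng.uniform(0, 350, 156),
    "naoh_molarity": rng.uniform(8, 16, 156),
    "temperature": rng.uniform(20, 90, 156),
})
# Toy target dominated by GGBS, echoing the paper's finding.
df["compressive_strength"] = (
    0.05 * df["ggbs"] + 0.3 * df["naoh_molarity"] + rng.normal(0, 2, 156)
)

# Pearson correlation matrix (values in [-1, 1]); the column against the
# target corresponds to the kind of heat map shown in Fig. 5.
corr = df.corr(method="pearson")
print(corr["compressive_strength"].round(2))
```

Weak pairwise correlations among inputs, as reported in the paper, justify treating each variable as an independent model feature.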
It is evident that the fly ash/slag-based material has a significant role in enhancing strength. The reason is the pozzolanic activity of both fly ash and GGBFS. Pozzolanic materials react with the calcium hydroxide produced during cement hydration when water is present, and this reaction leads to the formation of additional cementitious compounds. The pozzolanic reaction generates calcium silicate hydrates (C-S-H) and other cementitious substances, which efficiently occupy the empty spaces and improve the overall strength and density of the concrete structure. Moreover, the addition of fly ash and GGBFS reduces the water content, provides a filler effect, and enhances the hydration reaction, ultimately increasing strength and making them valuable and sustainable additions that enhance concrete performance.

Methodology
This section covers the data used to create models predicting the compressive property of FASBGPC, the statistical analysis, the parametric study of the influencing variables, and the algorithms used to create the models. The Sklearn library in Python is employed for the interpretability analysis of the models, which were trained using an artificial neural network (ANN) architecture. The ANN was chosen for its effectiveness in capturing complex patterns in the data and its flexibility in handling various types of input features. Hyperparameter tuning was applied to the ANN model to optimize its performance. In addition, ensemble techniques, specifically bagging and boosting, were applied to enhance the predictive performance of the ANN model. Parameter tuning was conducted not only for the individual ANN model but also for the ensemble approaches, to fine-tune their hyperparameters for improved performance. Figure 6 illustrates the overall flowchart used in this research.

Multilayer perceptron neuron network (MLPNN)
Artificial neural network (ANN) algorithms simulate the microstructure of the biological nervous system (neurons). ANN networks are made up of many parallel connections. Each cell receives weighted, biased contributions from other ANN neurons and passes the biased output onward through an activation function. These neural activities are arranged in one or more layers, and the ANN uses them as pathways through the network. The perceptron response is determined by the number of input variables and by the input and output layers. Each ANN is made up of three standard layers, namely the input layer, the output layer, and the hidden layer. The hidden layer, which acts as the source of accuracy by performing a nonlinear transformation of the parameters using an activation function, is located between the input layer and the output layer. There can be one or more hidden layers, depending on the required accuracy of the model. Although a single hidden layer can handle all of the challenges that a perceptron encounters, it is often more beneficial and efficient to employ multiple hidden layers.
Figure 7 shows the architecture of a neural network, which contains an input layer, two hidden layers, and an output layer. Except in the input layer, every neuron in a layer computes a linear combination plus a bias. The neurons then apply a nonlinear activation function (sigmoid) to this input. This research performs ANN modeling using a multilayer perceptron (MLPNN) feedforward network. The best performance of the model is evaluated by changing the numbers of neurons and hidden layers. The gathered data from the literature is split into testing and training sets: the data is randomly divided in a ratio of 80% (train set) to 20% (test set) to mitigate overfitting and bias when predicting the mechanical property of FASBGPC.
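The split-and-train workflow above can be sketched with scikit-learn; the data is a synthetic stand-in for the 156-sample, 7-input FASBGPC matrix, and the two-hidden-layer sizes are assumptions, not the tuned architecture from Table 4.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 156 samples, 7 inputs, one strength output.
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(156, 7))
y = X @ rng.uniform(1, 5, size=7) + rng.normal(0, 0.1, size=156)

# 80% train / 20% test, matching the paper's split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)

# Feedforward MLP with two hidden layers, as in Fig. 7.
mlp = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=42))
mlp.fit(X_train, y_train)
print(f"test R2 = {mlp.score(X_test, y_test):.3f}")
```

Scaling the inputs before the MLP, as in the pipeline here, is standard practice for gradient-based training even though the paper does not state its preprocessing explicitly.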

Hyper-tuning parameters
The selection of appropriate parameters is crucial in making ML models with the highest accuracy. In this study, numerous hyperparameter combinations were investigated to improve the model's accuracy, as shown in Table 4. In addition, Fig. 8 demonstrates the ANN model with optimal parameters. Selecting the appropriate hyperparameters is a critical stage in the development of nonlinear models; as a result, testing multiple configurations to discover one that operates correctly and stably when constructing ML models may require significant effort. The details of parameter selection can also be seen in supplementary file S1.
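A grid search over candidate configurations is one common way to automate the tuning described above; the grid below is a small illustrative assumption, not the actual grid from Table 4, and the data is again synthetic.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

# Toy data standing in for the FASBGPC table.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(156, 7))
y = X.sum(axis=1) + rng.normal(0, 0.05, size=156)

# Illustrative grid: architecture and initial learning rate.
grid = {
    "hidden_layer_sizes": [(16,), (32, 16)],
    "learning_rate_init": [1e-3, 1e-2],
}
search = GridSearchCV(
    MLPRegressor(max_iter=2000, random_state=0),
    param_grid=grid, cv=3, scoring="r2")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Each combination is scored by cross-validated R2, so the selected configuration is the one that generalizes best rather than the one that merely fits the training set.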

Ensemble methods using boosting and bagging approach
Ensemble techniques are utilized to increase the recognition and prediction accuracy of machine learning (ML). By merging and pooling several weak prediction models (component sub-models), these techniques often help to alleviate overfitting concerns. The sub-models learn nonlinearly and intelligently, producing a learner that is more accurate at making predictions. The best predictive model is produced by combining qualifying sub-models using averaging or voting combination techniques.

Bagging and boosting algorithm
Figure 9 depicts the schematic technique of the bagging model. Bagging describes the modification of the estimation procedure that results from substituting resampled datasets for the training dataset. The primary data are resampled with replacement as part of the random sampling strategy, so specific observations can be replicated in each training database. Every observation must have an equal chance of selection at each phase of the bagging procedure. The training dataset size has no effect on the accuracy of predictions; rather, aggregating the individual predictions appropriately reduces variance significantly. Using this resource, additional models can be trained 84. In this ensemble, the predicted values of the models are utilized; in regression, the prediction is typically the mean of the forecasts from multiple models. The optimal output value is determined by aggregating the twenty sub-models of the base learner. The combined output of these models forms the final output of the bagging ensemble method; it tends to be more robust and accurate than the prediction of any single model, as it leverages the diverse perspectives learned by each model from its subset of the data. Like the bagging technique, the boosting strategy creates several components that are collectively more precise than a single model, as illustrated in Fig. 10. Through weighted averages of the dependent sub-models, the boosting approach chooses which sub-models to include in the final model. In this work, boosting and bagging strategies were used to build ensemble models for FASBGP concrete using a single ANN algorithm as the base learner. The boosting ensemble approach involves the following steps. (1) Weighted training: initially, each data point in the training set is given equal weight. A weak learner (e.g., a decision tree with limited depth) is trained on the entire dataset, trying to fit the data while minimizing the errors. (2) Error calculation: after the first model is trained, its errors or misclassifications are identified by comparing its predictions to the actual labels. Data points that were incorrectly predicted are assigned higher weights, while correctly predicted points are given lower weights. (3) Focus on misclassified data: the next weak learner is trained on the re-weighted dataset, with emphasis placed on the previously misclassified data points. This iterative process aims to 'boost' the model's performance by focusing on the areas where it has not performed well. (4) Sequential improvement: the process continues for multiple iterations (typically a predefined number, or until a specified performance threshold is reached). Each subsequent model is trained to correct the errors of the combined ensemble of models created so far. (5) Weighted combination: the predictions from all weak learners are combined, with different weights assigned to each model based on its performance; models that perform better are usually given higher weights in the final prediction. (6) Final output: the final output is generated by aggregating the predictions of all weak learners, often through a weighted sum or weighted voting scheme, where the weights are determined by the performance of each model in the ensemble. The tuning variables employed in making an ensemble approach include (i) the learning rate and (ii) parameters associated with the optimal number of sub-learners, along with other characteristics that have a significant influence on ensemble methods. Candidate models with sub-model counts of 10, 20, 30, …, 200 estimators on the base learner (ANN) were built, and the most robust performance was selected. Table 5 demonstrates how the optimal model was selected using the highest correlation coefficient values. As a consequence, the ensemble with boosting outperforms the bagging and individual ANN models in terms of correlation coefficient.

K-fold cross-validation method
Cross-validation (CV) is a technique used to examine and reduce the bias and variance of the data in making an effective machine learning model. Also known as a resampling procedure, it is employed to assess the model's effectiveness. This strategy is straightforward and results in a less biased model than other approaches, because it ensures that every observation in the data has an equal probability of appearing in both the test and train sets 85. The accuracy of the algorithm is obtained as statistical and average error accuracy across ten verification cycles, as demonstrated in Fig. 11.
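The ten verification cycles can be reproduced with scikit-learn's `KFold`; only the 10-fold protocol mirrors the paper, while the data and model settings below are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy stand-in data for the FASBGPC table.
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(156, 7))
y = X.sum(axis=1) + rng.normal(0, 0.05, size=156)

model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32,), learning_rate_init=0.01,
                 max_iter=1000, random_state=2))

# Ten verification cycles: each fold serves exactly once as the held-out set.
cv = KFold(n_splits=10, shuffle=True, random_state=2)
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
print(f"mean R2 = {scores.mean():.3f} +/- {scores.std():.3f}")
```

The mean and spread of the ten fold scores give the averaged accuracy figures that Fig. 11 reports.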

Statistical analysis
Statistical checks are employed to evaluate the efficacy and accuracy of the models. Mean absolute error (MAE), coefficient of determination (R2), root mean square error (RMSE), relative squared error (RSE), and relative root mean square error (RRMSE) are the statistical indicators used; they are defined by Eqs. (1-5), where tj = the experimental (target) value, pj = the model's predicted value, t̄ = the mean target value, p̄ = the mean predicted value, and m = the total number of examples used for modeling.
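The equations themselves did not survive extraction; in their standard forms, with the symbols defined above (and hedged as the usual textbook definitions rather than the paper's exact typesetting), they read:

```latex
\begin{align}
\mathrm{MAE}   &= \frac{1}{m}\sum_{j=1}^{m}\left|p_j - t_j\right| \tag{1}\\
\mathrm{RMSE}  &= \sqrt{\frac{1}{m}\sum_{j=1}^{m}\left(p_j - t_j\right)^2} \tag{2}\\
R^2            &= \left(\frac{\sum_{j=1}^{m}(t_j-\bar{t})(p_j-\bar{p})}
                  {\sqrt{\sum_{j=1}^{m}(t_j-\bar{t})^2\,\sum_{j=1}^{m}(p_j-\bar{p})^2}}\right)^{2} \tag{3}\\
\mathrm{RSE}   &= \frac{\sum_{j=1}^{m}\left(p_j - t_j\right)^2}
                  {\sum_{j=1}^{m}\left(t_j-\bar{t}\right)^2} \tag{4}\\
\mathrm{RRMSE} &= \frac{\mathrm{RMSE}}{\bar{t}} \tag{5}
\end{align}
```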
A higher coefficient of determination (R 2 ) of the model is used to assess its reliability and accuracy, and a higher value with a lower value of statistical indicators indicates an ideal model 86 .R 2 values are usually between 0 and 1, and a value close to 1 means that the experimental and predicted values are linked well 87 .According to some experts, the effectiveness of the proposed model significantly affects the model's ratio of data points to inputs.The optimal model should have a ratio larger than five to assess if the data points are acceptable for creating the required relationship between chosen variables.According to the present study's results, the proposed ratio is 17, which fits the researchers' standards.In addition, external statistical checks are used to assess the model, which are detailed in Table 6.
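The five indicators can be computed directly with NumPy; the standard definitions are used here (with R2 taken as the squared Pearson correlation, since both the target and prediction means appear in the paper's symbol list), and the sample values are hypothetical.

```python
import numpy as np

def evaluate(t, p):
    """Standard error indicators; t = measured values, p = predictions."""
    t, p = np.asarray(t, float), np.asarray(p, float)
    mae = np.abs(p - t).mean()
    rmse = np.sqrt(((p - t) ** 2).mean())
    r = np.corrcoef(t, p)[0, 1]                       # Pearson correlation
    rse = ((p - t) ** 2).sum() / ((t - t.mean()) ** 2).sum()
    rrmse = rmse / t.mean()
    return {"MAE": mae, "RMSE": rmse, "R2": r ** 2, "RSE": rse, "RRMSE": rrmse}

# Quick check on hypothetical strength values (MPa).
actual = [30.0, 45.0, 52.0, 38.0, 61.0]
pred = [32.0, 43.0, 50.0, 40.0, 59.0]
print({k: round(v, 3) for k, v in evaluate(actual, pred).items()})
```

A perfect model gives MAE = RMSE = RSE = 0 and R2 = 1, matching the interpretation of an ideal model given above.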

Model result
The predictions from the supervised algorithms give robust performance, as illustrated in Fig. 12. Figure 12a shows that the ANN algorithm yields a good correlation of R2 = 0.82951 with good accuracy, because the ANN uses hidden layers to give a better response than linear regression. The model accuracy for FASBC is also depicted by its error distribution in Fig. 12b. Figure 12a shows that some of the predicted data points lie away from the regression line, and the same is reflected in the error graph. In addition, a comparison with linear regression (LR) in terms of R2 and error distribution is depicted in Fig. 12c,d. The models are compared through the MAE and the average errors of the testing set; the ANN model shows 28% and 27% enhancement in MAE and average error, respectively. This is due to the complex behavior of the ANN, which mitigates the limitations of linear regression and gives robust performance.
The boosting ensemble achieved its best performance with seventeen sub-models, as it takes several weak learners to predict the optimal response of the model (see Fig. 14a). Moreover, Furqan et al. 73

Ensemble with bagging algorithm
The predicted outcome for FASBC with the bagging approach and twenty sub-models is illustrated in Fig. 15. The ensemble model shows better performance than the individual ANN model, with R2 = 0.89, as depicted in Fig. 16. The bagging algorithm decreases the effects of variance and overfitting of the data to give a better response. The individual model, a weak learner, is trained across twenty sub-models to give the most advantageous predicted outcome, as depicted in Fig. 16. Figure 16a demonstrates that the sub-model with n-estimators equal to eighteen exhibits the most robust performance, while the model's accuracy is also reflected in its errors, as illustrated in Fig. 16b. Furthermore, the bagging model outperforms the ANN in terms of MAE and average error distribution, with an improvement of 51% in the comparison.

Statistical analysis of model
Statistical analysis is used to assess the model's effectiveness in predicting concrete properties.Moreover, it is the mathematical equation-based formalization of connections between variables in data.Furthermore, the coefficient of determination (R 2 ), mean absolute error (MAE), relative squared error (RSE), root mean square error (RMSE), and relative root mean square (RRMSE) were used to compare individual and ensemble methods, as shown in Table 7.
Moreover, an external statistical analysis check is also applied to the predicted outcome as mentioned in the literature.The result of the aforementioned algorithm is summarized in Table 8.

Cross-validation of K-folds
The validation process is essential to the model's success because it ensures that the data used to create the model is as accurate as possible.K-Fold CV is employed to improve the reliability, robustness, and effectiveness of the individual and ensemble models as depicted in Table 9 and Fig.

Shapley analysis
The Shapley analysis is performed on FASBC using the Sklearn library in Python, and the detailed analysis of each variable is illustrated in Fig. 18. The best practice is to first apply the SHAP explainer to the training subset to understand the model's behavior and address any issues, followed by validation on the testing subset. The analysis demonstrates that GGBS, temperature, sodium hydroxide and its molarity, and FA have a significant impact on the compressive strength of FASBC. This is due to the presence of Al2O3, CaO, and SiO2 in the amorphous state [90-92]. Furthermore, GGBS has a binding behavior in an alkaline medium; it is a key ingredient in geopolymer concrete, as it contributes to strength and durability. GGBS contains silicates and aluminates, which react with the alkali activator (sodium hydroxide) to form the binding matrix in geopolymer concrete 93. Similarly, temperature affects the curing process of geopolymer concrete: higher curing temperatures generally accelerate the reaction between the alkaline solution (usually sodium hydroxide) and the precursors (such as GGBS and fly ash), leading to faster setting times and increased strength. Moreover, sodium hydroxide, commonly known as caustic soda, is used as an alkaline activator in geopolymer concrete; its concentration (molarity) influences the reactivity and strength of the resulting geopolymer, and higher molarities often lead to faster reaction rates and higher strengths 94. Likewise, when GGBS is mixed with FA in an alkaline medium, extra calcium content is formed, which is responsible for the improved mechanical qualities and thus contributes to the overall strength and durability of the concrete. Local and global explainability of the model was assessed using SHAP analysis. In the global explainability analysis, mean SHAP values were used for feature importance ranking, as depicted in Fig. 18, and a summary plot was used to indicate the impact of the feature values on the CS, as shown in Fig. 19. The summary plot displays the trend of each associated variable and the distribution of SHAP values for each feature. The y-axis of the summary plot lists the input variables used in the study in order of significance, while the x-axis represents the corresponding SHAP value. Color represents feature magnitude, ranging from small (blue) to large (red), for each sample in the database. The x-axis spans the range of prediction impact, measured by SHAP values, as the input variables change in magnitude (from blue to red). Red indicates positive SHAP values, suggesting that higher values of that feature positively impact the model's output or prediction; for instance, higher values might lead to higher predicted outcomes. Similarly, blue represents negative SHAP values, implying that lower values of that feature have a positive effect on the model's output; in other words, lower values might result in higher predicted outcomes. Figure 20 explores the correlation between the components and their influence on the GPC model. The graph demonstrates that increasing the FA beyond a threshold of 300 kg/m3 results in a substantial reduction in the CS. The SHAP study revealed that the ideal amount of FA was approximately 300 kg/m3, leading to the highest level of CS, while higher FA concentrations showed a negative impact. The negative effect at high fly ash (FA) contents could be attributed to the disparity in calcium levels in the FA used across the source investigations 95. Furthermore, there is a notable upward trend for GGBS, where higher concentrations greatly improve the compressive strength (CS) of GPC; Kashifi et al. 96 have also seen a comparable pattern. Furthermore, both sodium silicate and sodium hydroxide exhibit a favorable impact up to a specific threshold; for example, when sodium silicate is elevated above a particular threshold, it has a harmful impact. Likewise, large-sized aggregate has a fluctuating impact on compressive strength, particularly when its quantity exceeds 1200 kg/m3. The data demonstrate that variables such as temperature and sodium hydroxide have a positive impact on the outcome of the model, as they accelerate the polymerization reaction; increasing the curing temperature was found to positively impact the compressive strength (CS) of the geopolymer composites, and Zhang et al. 97 documented a comparable effect. The findings of this study reflect the inputs and dataset size used in the SHAP analysis; expanding the database to include a wider variety of input variables has the potential to yield more precise correlations.

Graphical user interface
A straightforward and user-friendly graphical user interface (GUI) with input variables and output strength has been built for the practical application of our model, as illustrated in Fig. 21. The user enters the amounts of the input variables in the specified units, and the compressive strength is calculated at the end of the analysis. All fields may be reset to empty default values using the clear button. This simple GUI makes the model readily applicable in both research and industry. GUIs, which offer an intuitive way to interact with programs through visual components, are essential to modern software applications.
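The paper does not reproduce the GUI code. As a minimal sketch of such an interface using Python's standard tkinter module: the field names follow the study's inputs, but the predict_strength stub and its weights are hypothetical placeholders; in practice the trained ensemble would be deserialized (e.g., with joblib) and its predict method called instead.

```python
import tkinter as tk
from tkinter import ttk

# Input fields mirroring the study's variables
FIELDS = ["GGBS (kg/m3)", "Fly ash (kg/m3)", "Alkaline activator (kg/m3)",
          "SP dosage (kg/m3)", "NaOH Molarity", "Aggregate (kg/m3)",
          "Temperature (C)"]

def predict_strength(values):
    """Placeholder: replace with model.predict([values])[0] for the real model."""
    weights = [0.08, 0.03, 0.05, 0.4, 1.2, 0.01, 0.15]  # illustrative only
    return sum(w * v for w, v in zip(weights, values))

def main():
    root = tk.Tk()
    root.title("FASBGPC Compressive Strength Predictor")
    entries = []
    for row, name in enumerate(FIELDS):
        ttk.Label(root, text=name).grid(row=row, column=0, sticky="w")
        entry = ttk.Entry(root)
        entry.grid(row=row, column=1)
        entries.append(entry)
    result = ttk.Label(root, text="CS (MPa): -")
    result.grid(row=len(FIELDS), column=0, columnspan=2)

    def on_predict():
        vals = [float(e.get() or 0) for e in entries]
        result.config(text=f"CS (MPa): {predict_strength(vals):.2f}")

    def on_clear():  # the paper's "clear" button resets all fields
        for e in entries:
            e.delete(0, tk.END)
        result.config(text="CS (MPa): -")

    ttk.Button(root, text="Predict", command=on_predict).grid(row=len(FIELDS) + 1, column=0)
    ttk.Button(root, text="Clear", command=on_clear).grid(row=len(FIELDS) + 1, column=1)
    root.mainloop()

# main()  # uncomment to launch the window
```

The prediction callback is kept separate from the widget wiring so the model call can be unit-tested without opening a window.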

Conclusion
The implementation of individual and ensemble methods employing ANN with AdaBoost and bagging to anticipate the mechanical characteristics of FASBGPC is described in this study. Python code was run from the Anaconda Navigator with Spyder to execute the necessary models for further investigation. For the assessment of each model's correctness, statistical indicator checks in the form of numerous error measures were performed. The research led to the following findings:
A vast dataset of about 156 data points is gathered through a literature review and the Web of Science; it is provided in supplementary file S1, and the analysis is carried out in Anaconda Navigator using Spyder version 5.1.5. Additionally, K-Fold cross-validation, statistical analysis, and external validation checks are used to assess the correctness of the models. Shapley analysis is performed to identify the most influential variables. Furthermore, a graphical user interface (GUI) is built for practical industrial use in concrete strength prediction. To the best of the authors' knowledge, no prior work has applied nonlinear regression with AdaBoost and bagging to FASBGPC.

Figure 2. Difference between individual and ensemble approaches.

Figure 4. Parameter maximum and minimum values.

Figure 5. Pearson correlation of input parameters to output strength.

(1) Bootstrap Sampling: Random subsets of the original dataset are created through bootstrap sampling, where data points are randomly selected with replacement. Multiple subsets, often of equal size to the original dataset, are generated, and each subset is used to train a separate base model.
(2) Model Training: A base model (e.g., decision tree, neural network) is trained independently on each subset of the data. Each model learns different patterns from its subset owing to the randomness introduced by the sampling process; in the case of neural networks, for instance, multiple networks are trained on different subsets.
(3) Individual Predictions: Once trained, the models make predictions on the validation or test dataset (data not used for training). Each model independently generates its predictions from the input data.
(4) Aggregation of Predictions: The final output is generated by combining the predictions of all individual models. For regression tasks, this typically involves averaging the predictions of each model; in classification tasks, the final prediction may be determined by majority voting or by averaging the predicted probabilities.
(5) Final Output: The aggregated predictions from all base models constitute the ensemble's final output.

Figure 8. Optimization of the model by hit-and-trial method to achieve the best performance.
Scientific Reports (2024) 14:4598 | https://doi.org/10.1038/s41598-024-54513-y | www.nature.com/scientificreports/

anticipated the same response by employing ensemble algorithms. In addition, Fig. 14b demonstrates the error distribution of the ensemble model: average errors are reduced, exhibiting a 33 percent improvement over the individual ANN model. Moreover, the AdaBoost model shows a robust performance, with a 32 percent MAE error compared with 51 percent for LR, indicating that the experimental and predicted values lie close to one another with the least errors.

Figure 12. Regression models. (a) ANN prediction model; (b) error distribution of ANN model; (c) LR prediction model; (d) error distribution of LR model.

1. The ensemble methods, including bagging and boosting, exhibited more dependable performance than the individual methods. In particular, boosting with the artificial neural network (ANN) enhanced performance compared with the individual strategy, and both ensemble techniques demonstrated a strong association, with R2 values exceeding 0.85.
2. Compared with the LR model, the ANN model predicts the compressive strength of FASBGPC with a good relationship of R2 = 0.829. The MAE indicates that the ANN model gives about 28% more accurate results, showing that nonlinear regression performs more robustly than linear models.
3. AdaBoost with ANN as an ensemble model shows a strong correlation of R2 = 0.928 compared with ANN alone. Similarly, ANN with bagging gives R2 = 0.89, as opposed to R2 = 0.829 for the individual ANN. Thus, the ANN model with boosting shows the least error and the most accurate performance among the models.
4. Numerical error measures in the form of MAE, RMSE, RRMSE, RSE, and R2 provide consistent outcomes for ANN with AdaBoost, which displays the smallest error indications. External validation checks also fulfill the criteria, confirming that boosting produces a good model.
5. Validation using K-Fold cross-validation and statistical indicators shows a high accuracy of the models.
6. Shapley analysis reveals that GGBS, sodium hydroxide molarity, and temperature have the most influential effect on the crushing strength of GPC.

Table 2. Forecasting of mechanical properties by using MLAs.

Table 4. Model learning with parameter tuning.

Table 5. Sub-model details in making an ensemble model.

Table 7. Statistical evaluation of models with different errors.

Table 8. Statistical evaluation of models using external checks.

It can be seen that the ANN model with AdaBoost outclassed the individual ANN and the ANN-with-bagging models. Moreover, a comparison is made between their average errors using MAE and RMSE: the MAE of ANN-AdaBoost gives results enhanced by about 46%, while ANN-bagging depicts a 24% more robust performance. Similarly, boosting and bagging show a 44% and 28% increase in model capacity, respectively, compared with the individual ANN model. The same result has been reported by Ayaz et al. 70 by employing ensemble approaches to predict the most beneficial outcome.