Unboxing machine learning models for concrete strength prediction using XAI

Concrete is a cost-effective construction material widely used in various building infrastructure projects. High-performance concrete, characterized by strength and durability, is crucial for structures that must withstand heavy loads and extreme weather conditions. Accurate prediction of concrete strength under different mixtures and loading conditions is essential for optimizing performance, reducing costs, and enhancing safety. Recent advancements in machine learning offer solutions to challenges in structural engineering, including concrete strength prediction. This paper evaluates the performance of eight popular machine learning models, spanning regression methods (Linear, Ridge, and LASSO Regression), tree-based models (Decision Trees, Random Forests, and XGBoost), as well as SVM and ANN. The assessment was conducted on a standard dataset comprising 1030 concrete samples. Our experimental results demonstrate that ensemble learning techniques, notably XGBoost, outperformed the other algorithms with an R-squared (R2) of 0.91 and a Root Mean Squared Error (RMSE) of 4.37. Additionally, we employed the SHAP (SHapley Additive exPlanations) technique to analyze the XGBoost model, providing civil engineers with insights to make informed decisions regarding concrete mix design and construction practices.


Related work
Recently, there has been growing interest in leveraging ML techniques to predict material and structural strength. This section offers a summary of significant research efforts in this field and presents their key contributions in Table 1. Specifically, three ML models, CatBoost, k-Nearest Neighbors, and Support Vector Regression, have been employed to forecast concrete strength 21. These models are supplied with six features extracted from 249 samples, encompassing parameters such as cement, slag, water, sand, crushed stone, and additives. Among these models, k-Nearest Neighbors showed the most promising performance, achieving the lowest error rate and the highest coefficient of determination.
Another investigation focusing on the prediction of concrete compressive strength reported that Extreme Gradient Boosting (XGBoost) outperformed Random Forests and Support Vector Regression 22. This study used eight features, including cement content, age, water content, coarse aggregate, fine aggregate, high-efficiency water reducer, fly-ash, and mineral powder, derived from a dataset comprising 60 samples.
In ensemble learning, four models, namely Adaptive Boosting (AdaBoost), Gradient Boosting Decision Trees (GBDT), XGBoost, and Random Forests, were assessed for predicting compressive and tensile strength 23. Based on a dataset of 204 samples, the study employed the eight features outlined in Table 1. The GBDT model exhibited the most promising results among the evaluated models.
To explore the adaptability of concrete properties and compressive strength, another study adopted a combination of four models: Linear Regression, Classification and Regression Trees, ANN, and Support Vector Machine (SVM) 24. The concrete samples were categorized based on workability, as determined by a slump test with a threshold of 12.5 cm. Linear Regression produced the most accurate predictions for lower-workability slump tests, while ANN and SVM excelled in cases of higher workability.
The AdaBoost algorithm has also been applied to forecast concrete compressive strength using a composite of eight components or features alongside information about curing time. Remarkably, this approach outperformed other ML models, including ANN and SVM 7.
To accurately estimate the compressive strength of fly-ash concrete and reduce the high variance of predictive models, the study in reference 25 compares various ensemble deep neural network models. These include the super learner algorithm, simple averaging, weighted averaging, integrated stacking, and separate stacking ensemble models. Among these, the best results, with the highest coefficient of determination, lowest mean squared error, and highest mean accuracy (97.6%), were achieved using separate stacking with the random forest meta-learner.
The authors of 27 emphasize the importance of establishing a precise model for the bond strength of corroded reinforced concrete, one that minimizes variance and maximizes reliability. In their study, they compared different convolution-based ensemble learning algorithms to determine which performs best. To address these issues, they utilized a database compiled from previous experimental studies on the relative bond strength of corroded reinforced concrete to train convolution-based ensemble learning models. Their results indicate that the convolution-based integrated stacking model provides accurate predictions, with coefficient of determination, a-20 index, and mean squared error values of 0.84, 0.75, and 0.022, respectively.
XAI techniques are becoming increasingly important in the context of predicting concrete strength. These methods assist engineers and researchers in understanding the crucial factors that impact concrete strength, enabling them to make informed decisions regarding concrete mix design and construction practices 28. Common XAI techniques deployed in concrete strength prediction encompass feature importance analysis at the global and local levels, as well as visual explanations. Feature importance analysis techniques, such as permutation feature importance and SHAP values, are indispensable for discerning the key factors that significantly impact concrete strength prediction. Furthermore, methods for local interpretability, such as LIME (Local Interpretable Model-agnostic Explanations), offer insights into the predictions generated for specific data points 17,29.

Methodology
The machine learning framework for predicting concrete strength comprises five fundamental stages, as illustrated in Fig. 1. First, the data collection process involves preparing a series of concrete samples under controlled conditions. This includes varying factors such as cement type, water-cement ratio, aggregate size, and curing duration; these represent the input features for the machine learning model. In the next stage, we conduct data exploration to analyze and understand the collected data. The aim is to uncover patterns, relationships, and insights crucial for effectively training machine learning models. Subsequently, the data preprocessing stage is executed to eliminate noise, address missing values, clean the data, and format it appropriately for machine learning model training. We then train eight machine learning models, encompassing statistical regression, ensemble learning, SVM, and ANN, to predict concrete strength, and these models are evaluated accordingly.
As shown in Table 1, prior research in concrete strength prediction primarily relies on three key approaches: statistical regression techniques, ensemble learning based on Decision Trees, and the utilization of ANNs. Our choice to employ representative machine learning models from these established approaches is rooted in assessing their efficacy in achieving precise and robust predictions. For instance, we incorporate Linear Regression as a fundamental model to serve as a benchmark for regression tasks, owing to its assumption of a linear relationship between input features and the target variable, making it a valuable initial exploration tool. LASSO Regression addresses potential multicollinearity issues within the dataset and enhances model interpretability by penalizing the absolute values of regression coefficients. Ridge Regression, another regularization method, reduces overfitting and enhances model stability by introducing a penalty term to the loss function. Decision Trees are well-known for their proficiency in capturing complex non-linear relationships and interactions among features, making them suitable for datasets characterized by intricate decision boundaries. We also leverage Random Forests, an ensemble method, to capture complex patterns within the data, which is advantageous for concrete strength prediction. Additionally, we consider XGBoost, an algorithm based on gradient boosting, due to its robustness in handling missing data and outliers, attributes that can greatly enhance concrete strength prediction. SVM is introduced to explore its potential to provide a distinctive perspective on concrete strength prediction, given its effectiveness in managing high-dimensional data and intricate decision boundaries. Lastly, ANNs, known for their capability to decipher intricate data patterns, are integrated into our analysis to investigate whether complex, non-linear models can outperform traditional regression models in predicting concrete strength. A brief comparison of different machine learning approaches for concrete strength prediction is presented in Table 2. Notably, the XGBoost model yields the best results among all eight evaluated models. Further explanation and analysis of its corresponding results are conducted using one of the common XAI techniques, namely the SHAP method. A detailed explanation of each stage in the machine learning pipeline for concrete strength prediction is provided in the following subsections.

Concrete's data collection
Data collection involves preparing a series of concrete samples under controlled conditions, varying factors such as cement type, water-cement ratio, aggregate size, curing duration, and other relevant parameters. The concrete specimens are then subjected to compressive strength tests using specialized equipment. The resulting data, which include the measured compressive strengths, are recorded and used as the basis for training machine learning models. It is crucial to ensure that the data collection follows standardized testing procedures and quality control measures to maintain consistency and reliability. In this paper, the standard dataset collected by Yeh 30 is used, consisting of 1030 concrete samples with eight features: cement, water, coarse aggregate (coarse), fine aggregate (fine), superplasticizer (sp), blast-furnace slag (slag), fly-ash (flyash), and age. Each sample was cured under normal conditions for its specified age before the data were collected. Then, a conventional compressive test procedure determined the concrete's compressive strength using 150 mm-tall cylindrical specimens.
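The following is a minimal loading sketch for this dataset; the file name and column names are illustrative assumptions, since the benchmark data is distributed in several formats.

```python
# Minimal sketch: load the concrete dataset, assumed to be available locally as
# "concrete_data.csv" with one column per mix component, plus "age" and the
# target "strength" (column names are assumptions, not the official ones).
import pandas as pd

df = pd.read_csv("concrete_data.csv")
print(df.shape)                 # expected: (1030, 9) -> 8 input features + target
print(df.columns.tolist())      # inspect the available columns
```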

Concrete's data exploration
Concrete strength data exploration involves analyzing and understanding the collected data to uncover patterns, relationships, and insights that can be utilized to train machine learning models effectively. To understand the data distribution and variability, the exploration process starts with descriptive statistics, such as the mean, median, standard deviation, and quartiles. Visualizations, such as histograms, box plots, and scatter plots, provide further insights by illustrating the relationships between features such as cement, water, coarse aggregate, and fine aggregate, and by identifying potential outliers or anomalies.
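As a rough illustration of this step, the sketch below (assuming the DataFrame `df` from the loading sketch above) prints the descriptive statistics and draws per-feature histograms with pandas and matplotlib.

```python
# Sketch of the exploration step: descriptive statistics plus per-feature
# histograms, assuming `df` is the DataFrame loaded in the previous sketch.
import matplotlib.pyplot as plt

print(df.describe())                # mean, std, quartiles for every feature

df.hist(bins=30, figsize=(12, 8))   # one histogram per column
plt.tight_layout()
plt.show()
```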

Concrete's data preprocessing
Data preprocessing is crucial in accurately predicting concrete strength using machine learning techniques. In concrete strength prediction, data preprocessing involves several important steps. First, data cleaning is performed to handle missing values, outliers, and inconsistencies in the dataset; missing values can be imputed, for example with a feature's mean or median. Feature scaling is then applied. Common techniques include normalization or standardization, which is necessary because concrete-related features often have different units and scales. For example, the compressive strength of concrete might be measured in MegaPascals (MPa), while the curing time is measured in days. If these features are not scaled, those with larger magnitudes could dominate the learning process of machine learning algorithms. Normalization or standardization ensures that all features contribute more equally to the prediction, preventing any one feature from excessively influencing the model's output. In concrete strength prediction, we may also encounter categorical variables, such as the type of cement used or the curing method applied. Machine learning models require numerical input, so these categorical variables are encoded into numerical values; one-hot encoding or label encoding represents categorical data in a format that the algorithms can work with. By performing these preprocessing steps, the dataset is appropriately prepared, enhancing the effectiveness and performance of machine learning algorithms in predicting concrete strength.
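A hedged sketch of the cleaning and encoding steps is shown below. Note that the benchmark dataset used in this study is purely numerical, so the categorical encoding line only illustrates the general case described above (a hypothetical column such as "cement_type"); scaling is shown later together with the train/test split.

```python
# Hedged preprocessing sketch: median imputation for numeric gaps and one-hot
# encoding for any categorical columns, assuming `df` from the earlier sketches.
import pandas as pd
from sklearn.impute import SimpleImputer

numeric_cols = df.select_dtypes(include="number").columns

# impute any missing numeric values with the column median
df[numeric_cols] = SimpleImputer(strategy="median").fit_transform(df[numeric_cols])

# one-hot encode categorical columns, if any are present (illustrative only)
df = pd.get_dummies(df, drop_first=True)
```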

Machine learning approaches for concrete strength prediction
This paper studies three machine learning approaches for concrete strength prediction: statistical, tree-based learning, and ANN techniques (see Fig. 2). The statistical learning methods are simple predictive techniques that identify the relationship between the input and output variables; they include Linear Regression, Ridge Regression, LASSO Regression, and SVM. Tree-based learning models include Decision Trees, Random Forests, and XGBoost. The third machine learning approach is based on ANNs, which are computational models inspired by the structure and functioning of biological neural networks in the human brain. The network consists of interconnected nodes, or neurons, organized in layers. Each neuron receives inputs, performs a weighted computation, and passes the output through an activation function. During training, the network adjusts the weights associated with each connection to minimize the prediction error. Once trained, the ANN can take new inputs and provide predictions of concrete strength based on the learned patterns and relationships in the training data. A detailed explanation of each approach is presented in the following subsections.

Statistical machine learning approaches
The statistical learning methods are simple predictive techniques used to identify the relationship between the input and output variables and develop a mathematical function that accurately describes this relationship. Statistical learning models include Linear Regression, Ridge Regression, LASSO Regression, and SVM. Linear Regression, as a statistical model, assumes a linear relationship between one (or more) independent variables X and a dependent variable y. This linear relationship can be expressed using a simple line equation with the appropriate coefficients β, such that

$$y = X\beta = \sum_{j=0}^{n} \beta_j x_j,$$

where $X = [x_0, x_1, \ldots, x_n]$ and $\beta = [\beta_0, \beta_1, \ldots, \beta_n]$ are represented as vectors with $x_0 = 1$.
With basic Linear Regression, to estimate the best set of feature coefficients β, the error between the actual and predicted values of y must be reduced to a minimum 5. This error is calculated by the least squares method, such that the sum of squared errors (SSE) is minimized:

$$SSE = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2,$$

where $y_i$ and $\hat{y}_i$ are the actual and predicted values for the i-th record. The minimization of SSE can be achieved with different methods 31.
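As a minimal illustration, the coefficients that minimize the SSE above can be obtained from the normal equations; the sketch below uses NumPy's least-squares solver, which is a numerically stable way of computing the same solution.

```python
# Minimal ordinary-least-squares sketch: the SSE-minimizing coefficients have the
# closed form beta = (X^T X)^(-1) X^T y; np.linalg.lstsq solves this stably.
import numpy as np

def ols_fit(X, y):
    X = np.column_stack([np.ones(len(X)), X])      # prepend x0 = 1 (intercept)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares solution
    return beta

def ols_predict(X, beta):
    X = np.column_stack([np.ones(len(X)), X])
    return X @ beta                                # predicted y values
```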
Ridge Regression was introduced by Hoerl and Kennard 32 to improve prediction accuracy when dealing with highly correlated features. It shrinks the regression coefficients by adding a penalty on their size, controlled by the ridge coefficient. The error between the actual and predicted values is

$$SSE_{ridge} = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{J} \beta_j^2,$$

where λ (lambda) is a complexity parameter that controls the amount of shrinkage; increasing λ produces greater shrinkage.
LASSO Regression is another shrinkage method like Ridge, but it uses the absolute values of the coefficients β when calculating the error function:

$$SSE_{lasso} = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{J} |\beta_j|.$$

Replacing the Ridge penalty term $\sum_{j=1}^{J} \beta_j^2$ by the LASSO penalty term $\sum_{j=1}^{J} |\beta_j|$ makes the solutions non-linear in the $y_i$, and there is no closed-form expression as in Ridge Regression. In some cases, the data are not linearly separable and different classes can overlap; separating the classes with a single line is then impossible. The main function of SVM is to form a hyperplane, defined by a set of support points, that acts as a decision boundary separating the different classes in the data 33.
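A short sketch of these penalized and kernel-based models with scikit-learn is given below; `alpha` plays the role of λ in the equations above, the hyperparameter values are illustrative rather than tuned, and the arrays `X_train` and `y_train` are assumed to come from the preprocessing stage.

```python
# Sketch of the remaining statistical models; values are illustrative, not the
# tuned settings from the reported experiments. Assumes scaled X_train, y_train.
from sklearn.linear_model import Ridge, Lasso
from sklearn.svm import SVR

ridge = Ridge(alpha=1.0).fit(X_train, y_train)          # L2 penalty: sum of beta_j^2
lasso = Lasso(alpha=0.1).fit(X_train, y_train)          # L1 penalty: sum of |beta_j|
svr = SVR(kernel="rbf", C=10.0).fit(X_train, y_train)   # kernel-based regression

print("non-zero LASSO coefficients:", (lasso.coef_ != 0).sum())
```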

Tree-based machine learning approaches
Tree-based learning methods are supervised machine learning techniques that use decision trees as the fundamental building blocks for constructing predictive models. The input data is represented as a set of feature vectors, where each vector contains a set of predictor variables (features) and the corresponding target variable (the variable we want to predict). The goal is to learn a mapping between the input features and the target variable. Examples of tree-based learning models are Decision Trees, Random Forests, and XGBoost. Instead of working with the whole dataset at once, Decision Trees split the data through a set of decisions on specific feature values, constructing a tree structure. The tree continues splitting to generalize the final decision over the whole dataset. A tree size parameter should be adaptively chosen from the data to control this continuous splitting. Different methodologies can be used to choose the optimal tree size, including the sum of squared errors (SSE) with a threshold or cost-complexity pruning 33. While Decision Trees build only one tree structure over the data and produce the output according to it, Random Forests 34 build multiple trees over the same dataset and average the outputs from these trees to give the final decision for a specific data record. Random Forests mitigate the overfitting that traditional Decision Trees can produce, although their performance can still be affected by the characteristics of the data. Gradient boosting is used with many learning methods 35,36, but XGBoost is a scalable machine learning system for tree boosting, available as an open-source package 37. It handles sparse data through its tree learning algorithm and uses parallel and distributed computing strategies that speed up the learning process.
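For illustration, the three tree-based models can be instantiated as follows; the hyperparameters are indicative defaults rather than the tuned settings used in the reported experiments, and `X_train`/`y_train` are assumed from the preprocessing stage.

```python
# Sketch of the three tree-based regressors with illustrative hyperparameters.
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

tree = DecisionTreeRegressor(max_depth=6, random_state=42)       # single tree
forest = RandomForestRegressor(n_estimators=300, random_state=42) # averaged trees
xgb = XGBRegressor(n_estimators=500, learning_rate=0.05,
                   max_depth=6, random_state=42)                  # boosted trees

for model in (tree, forest, xgb):
    model.fit(X_train, y_train)   # each model learns the feature-to-strength mapping
```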

Artificial neural network machine learning approach
An Artificial Neural Network mimics the human brain, whose tiny processing components are neurons. In computer science, a neuron is a processing unit represented as a node in a larger network. The node takes input value(s) multiplied by weight parameters, applies a defined process called an activation function, and delivers the output(s) to the next neuron(s) or as a final output, as shown in Fig. 3. The topology of a simple neural network is shown in Fig. 2, where the network consists of a set of layers, and each layer has a predefined number of neurons. The main layers are the input, hidden, and output layers. In most cases, the number of neurons in the input layer is identical to the number of features in the dataset, and the output layer gives the predicted value for a specific instance in the data 9.
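A minimal sketch of such a network using scikit-learn's MLPRegressor is shown below; the two hidden layers are an illustrative choice, not the architecture evaluated in this paper, and the training arrays are assumed from the preprocessing stage.

```python
# Minimal ANN sketch: a small multilayer perceptron for strength regression.
from sklearn.neural_network import MLPRegressor

ann = MLPRegressor(hidden_layer_sizes=(64, 32),   # two illustrative hidden layers
                   activation="relu",
                   max_iter=2000,
                   random_state=42)
ann.fit(X_train, y_train)            # adjusts the connection weights to reduce error
strength_pred = ann.predict(X_test)  # predictions for unseen mixes
```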

Concrete strength prediction model explainability
XAI techniques are becoming increasingly important in concrete strength prediction by helping engineers and researchers understand the key factors contributing to concrete strength and make informed decisions about concrete mix design and construction practices. We used the SHAP method, one of the XAI techniques, to provide insights into the factors contributing to the concrete strength prediction problem 20. SHAP provides a unified framework that combines game theory and machine learning to attribute the concrete strength prediction outcome to input features such as cement, water, coarse aggregate, fine aggregate, superplasticizer, blast-furnace slag, and fly-ash. By calculating SHAP values, the importance of each feature in contributing to the prediction can be quantified 38. These values capture the additive contribution of a feature across all possible feature subsets, considering the interactions and dependencies among them. SHAP values help reveal the relative influence of features on the model's output, enabling a deeper understanding of the underlying mechanisms. In the context of AI explainability for concrete strength prediction, SHAP values identify which features play the most significant role in determining the predicted strength. This knowledge allows engineers and researchers to interpret the model's decisions, validate reliability, detect biases, and gain insights into improving concrete mix designs and related factors. SHAP values provide transparency and accountability, helping to build trust in AI models and facilitating informed decision-making in concrete strength prediction and other applications. Figure 4 displays the potential features for the concrete prediction machine learning model, along with the explanation process using the SHAP method, which illustrates the feature values contributing to shifting the model's prediction output from the baseline value. The baseline, in this context, represents the average model output over the training observations on which the model was trained. Features that push the model's predictions higher are color-coded in red, while those causing predictions to decrease are represented in blue.
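The sketch below illustrates how such SHAP values can be computed for a trained XGBoost regressor (here assumed to be available as `xgb`, with test features `X_test`); TreeExplainer is the standard SHAP explainer for tree ensembles.

```python
# Sketch of computing SHAP values for a trained XGBoost regressor.
import shap

explainer = shap.TreeExplainer(xgb)
shap_values = explainer.shap_values(X_test)   # one value per feature per sample

# global view: mean |SHAP| ranking and the distribution of feature effects
shap.summary_plot(shap_values, X_test)

# local view: how each feature pushes one prediction above or below the base value
shap.force_plot(explainer.expected_value, shap_values[0], X_test[0], matplotlib=True)
```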

Concrete data collection and exploration
For training and validating machine learning algorithms, this study employs a benchmark dataset made up of 1030 concrete tests with the following eight features: cement, water, coarse aggregate (coarse), fine aggregate (fine), superplasticizer (sp), blast-furnace slag (slag), fly-ash (flyash), and age. A sample from the dataset records is presented in Table 3.
The min/max values, mean, standard deviation (std), and quartile distribution of these attributes are presented in Table 4 as a summary of the data exploration procedure. Additionally, Fig. 5 displays histograms representing the statistical distribution of the relevant features. The x-axis corresponds to each feature, and the y-axis indicates the frequency of occurrences. This visualization allows for a comprehensive evaluation of these features.
Notable observations include:
• Cement exhibits a distribution that closely resembles a normal distribution.
• Blast-furnace slag (abbreviated as 'slag') displays a pronounced skewness and appears to follow a distribution with three peaks (Gaussians).
• Fly-ash (abbreviated as 'flyash') is right-skewed and appears to have a bimodal distribution with two peaks (Gaussians).
• Water's distribution shows three peaks with a leftward tilt.
• Superplasticizer (abbreviated as 'sp') demonstrates a distribution with two peaks and pronounced skewness.
• Coarse aggregate's (abbreviated as 'coarse') distribution is close to normal and displays three peaks (Gaussians).
• Fine aggregate's (abbreviated as 'fine') distribution appears to be bimodal with two peaks, indicating a non-normal distribution.
• The age feature appears to have multiple peaks and a skewed distribution, which may be appropriate for the dataset.
Studying the correlation between features is essential for understanding the relationships between the dependent features and the target strength factor, as this analysis aims to identify the optimal prediction model. Figure 6 displays a heatmap illustrating each variable's impact on all other variables. Notably, a strong correlation is observed between cement and strength, indicating that cement is a highly reliable predictor. Conversely, slag and fly-ash show weak correlations with the target variable. Additionally, it is worth highlighting the significant positive correlation between superplasticizer and fly-ash, in contrast to the comparatively weaker correlation between superplasticizer and compressive strength. Remarkably, there is a substantial negative correlation between water and superplasticizer, as well as between water and strength. Figure 7, presented as a pair plot, also visually conveys the features' correlation information. This comprehensive analysis is important because dimensions showing strong correlations, with values near 1 or -1, are redundant, supplying the model with duplicated information. As a result, we could decide to keep one dimension while discarding another. The choice of which dimension to retain and which to discard relies on domain expertise and an assessment of which dimension is more error-prone.
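The sketch below reproduces this kind of correlation analysis with seaborn, assuming the DataFrame `df` holds the eight input features plus the `strength` target.

```python
# Sketch of the correlation analysis behind the heatmap (Fig. 6) and pair plot (Fig. 7).
import seaborn as sns
import matplotlib.pyplot as plt

corr = df.corr()                                      # Pearson correlation matrix
sns.heatmap(corr, annot=True, cmap="coolwarm", center=0)
plt.show()

sns.pairplot(df)                                      # pairwise scatter plots
plt.show()
```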
From the pair plot, it becomes evident that:
• Cement exhibits a complete lack of correlation with the other characteristics, including slag, fly-ash, water, superplasticizer, coarse aggregate, fine aggregate, and age.
• Slag also demonstrates no correlation with the following characteristics: fly-ash, water, superplasticizer, coarse aggregate, fine aggregate, and age.
• Fly-ash, aside from lacking any significant correlation with water, superplasticizer, coarse aggregate, fine aggregate, or age, does not exhibit substantial correlation with any other independent attributes.
• In terms of water's relationship with the other independent characteristics, a negative linear association is observed with both superplasticizer and fine aggregate; it does not show a meaningful correlation with any other attributes. It is noteworthy that superplasticizers can reduce the water content in concrete by 30% without compromising workability.
• Superplasticizer demonstrates a negative linear relationship solely with water and does not exhibit a strong correlation with any other variables.
• Coarse aggregate, like all other attributes, does not display significant correlation with any other attributes.
• Fine aggregate exhibits a linear inverse relationship with water and does not show any meaningful correlation with the other characteristics.
Figures 8 and 9 depict scatter plots illustrating the relationship between compressive strength, as the target predicted variable, and the input feature variables cement, water, age, and fly-ash. Figure 8 demonstrates a positive correlation between cement content and compressive strength: as the amount of cement used increases, the compressive strength of the concrete also increases. Moreover, it has been observed that as concrete ages, its strength grows, so more cement is required to achieve greater strength at a younger age. Conversely, older cement necessitates more water, so reducing the water content in concrete enhances strength. The scatter plot in Fig. 9 reveals an inverse relationship between compressive strength and fly-ash content; the concentration of darker dots in the region corresponding to lower compressive strength values makes this relationship evident. Conversely, the use of a superplasticizer has been shown to improve compressive strength, highlighting a positive association between these two parameters.

Concrete data preprocessing
The dataset was split into a training set, consisting of 80% of the entire dataset, and a test set comprising the remaining 20%. The training dataset was used to determine each model's structure and parameters. By assessing the performance of different models on the training dataset, we can determine which model has been appropriately trained. However, it is important to note that the test dataset is only employed to assess the effectiveness of the selected model after it has been chosen. Subsequently, the data underwent normalization using a standard scaler. Given that the dataset contains multiple variables, the objective was to rescale all features so that they have a mean of zero and a standard deviation of one. This normalization process ensures that all features are on a comparable scale, preventing any individual feature from dominating the model's learning process.
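A minimal sketch of this split-and-scale step is given below; fitting the scaler on the training set only prevents information from the test set leaking into training, and the target column name `strength` is an assumption carried over from the earlier sketches.

```python
# Sketch of the 80/20 split and standardization described above.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = df.drop(columns=["strength"]).values   # eight input features
y = df["strength"].values                  # measured compressive strength

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)  # 80% training, 20% testing

scaler = StandardScaler().fit(X_train)     # zero mean, unit variance (train only)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
```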

Machine learning models evaluation
Eight machine learning models from three different approaches are evaluated in the presented study. These models include Linear Regression, Ridge Regression, LASSO Regression, SVM, Random Forests, Decision Trees, XGBoost, and ANN. The evaluation is performed using the training and testing datasets, and several metrics are applied: Mean Squared Error (MSE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination known as the R-squared (R2) score. MSE, RMSE, and MAE depend on the actual data values and the predictions made by the machine learning models, while the R-squared score is based on data variance. The following equations describe these evaluation metrics:

$$MSE = \frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2, \qquad RMSE = \sqrt{MSE},$$

$$MAE = \frac{1}{N}\sum_{i=1}^{N} |y_i - \hat{y}_i|, \qquad R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}.$$

By analyzing the SHAP values, we identified the key contributing factors and their degree of influence on the XGBoost model. The visualizations generated by SHAP, including dependency charts and summary plots, helped to uncover non-linear relationships and interactions between features, further enhancing the XGBoost model's interpretability and robustness. With the SHAP method, the XGBoost machine learning model for concrete strength prediction is explained to make its prediction results more transparent and valuable for real-world applications in the construction and engineering industry. The cumulative impact of each feature can be depicted in a waterfall chart (Fig. 13), which shows how different features shift the model's output from the base value, the average model output across the training dataset, to the final predicted value. The probability distribution of compressive strength is plotted from the bottom up, illustrating how the introduction of a specific feature changes the baseline probability from 0 to 100%. The chart uses color coding, with red indicating features that increase the prediction and blue indicating features that reduce it. For instance, cement increases compressive strength by 80.28%, while water decreases it by 80.15%. The findings show that cement, age, superplasticizer, and slag considerably influence the model's prediction, driving it towards higher values. On the other hand, the water feature has little impact on improving the accuracy of the predictions.
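As a hedged sketch, the snippet below computes the evaluation metrics defined above for the trained XGBoost model and draws a SHAP waterfall plot of the kind shown in Fig. 13; the variable names follow the earlier sketches and are assumptions rather than the exact code used in this study.

```python
# Sketch: evaluate the trained `xgb` model and draw a waterfall plot for one sample.
import numpy as np
import shap
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_pred = xgb.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"MSE={mse:.2f}  RMSE={np.sqrt(mse):.2f}  "
      f"MAE={mean_absolute_error(y_test, y_pred):.2f}  "
      f"R2={r2_score(y_test, y_pred):.2f}")

explainer = shap.TreeExplainer(xgb)
explanation = explainer(X_test)          # shap.Explanation object
shap.plots.waterfall(explanation[0])     # per-feature shift from the base value
```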
This information is reinforced by the force plot in Fig. 14, which emphasizes the features' ideal values to maximize the predicted outcome. The recommended values for the respective components are as follows: to attain a compressive strength of 40 MPa, the input parameters are superplasticizer = 2.5%, age = 28 days, slag = 0 kg/m3, and cement = 540 kg/m3. However, it should be noted that the prediction result can be further enhanced by adjusting the values of the water, fine aggregate, and coarse aggregate properties. These characteristics are crucial in improving the prediction outcome and provide flexibility in modifying their values to obtain the desired outcomes.
The most important features contributing to the model's predictions are shown in Fig. 15. We ensure a thorough analysis by computing the average SHAP value across all observations for each feature; averaging the absolute values prevents positive and negative contributions from cancelling each other out. The resulting bar plot displays a distinct bar for each feature, with cement having the greatest mean SHAP value, indicating the highest significance level. Notably, features with high mean SHAP values have had a considerable impact, either positive or negative, on the model's predictions, and consequently these features strongly shape how the model predicts outcomes.
Within the dataset, a particular variable leads to the division of the data into distinct groups, offering valuable insights into population heterogeneity. As depicted in Fig. 16, our data is effectively segmented into two distinct cohorts. Through automated partitioning, 379 samples are allocated to one cohort, while the remaining 651 samples belong to the second cohort. The optimal threshold for this division is determined to be sp = 0.85. The accompanying bar plot makes it clear when a sample is classified into the cohort where sp ≥ 0.85; this cohort is characterized by higher values of the age, slag, and coarse features, while exhibiting a lower value for cement. Figure 17 displays several dependency charts that highlight the conclusions drawn from our model. Each dot in a dependence plot corresponds to a single dataset prediction (row); the y-axis designates the corresponding SHAP value, and the x-axis shows the feature value obtained from the feature matrix. The SHAP value denotes the degree to which the model's output for a given prediction is affected by knowing the value of a particular feature. The color mapping in the plot represents a second feature that might interact with the plotted feature; for instance, if superplasticizer interacts directly with cement, this will show in the color variations.

As future research directions, we plan to expand our analysis by considering additional features for concrete strength prediction, including the cement fermentation period and various mechanical, physical, and chemical properties of concrete. This expansion will help to enhance the analysis and prediction capabilities of different machine learning models.

Figure 4. Feature contributions using SHAP for concrete strength prediction.

Figure 12. Concrete strength prediction machine learning model's RMSE and R2 evaluation metrics.

Figure 13. Waterfall plot for studying the impact of different features on the model prediction results.

Figure 14. Force plot highlighting the features' optimal values to maximize the model prediction results.

Figure 15. Bar plot identifying the model's most significant features using average SHAP values.

Figure 16. Cohort plot for visualizing the distribution of two groups across different concrete strength prediction features.

Table 1. Machine learning models and the input features for concrete strength prediction.

Table 2. Brief comparison of different machine learning approaches for concrete strength prediction.

Table 3. Sample records for concrete strength prediction in this study.