Study of flexural strength of concrete containing mineral admixtures based on machine learning

In this paper, the prediction of flexural strength was investigated using machine learning methods for concrete containing supplementary cementitious materials such as silica fume. First, based on a database of suitable characteristic parameters, the flexural strength prediction was carried out using linear (LR) model, random forest (RF) model, and extreme gradient boosting (XGB) model. Subsequently, the influence of each input parameter on the flexural strength was analyzed using the SHAP model based on the optimal prediction model. The results showed that LR, RF, and XGB enhanced the accuracy of forecasting sequentially. Among the characteristic parameters, the most significant effect on the flexural strength of concrete is the water-binder ratio, and the water-binder ratio shows a negative correlation with flexural strength. The effect of maintenance age on flexural strength is second only to the water-binder ratio, and it shows a positive trend. When the amount of fly ash is less than 40% and the amount of slag or silica fume is less than 30%, the correlation between the amount of supplementary cementitious materials and flexural strength fluctuates and a positive peak in flexural strength is observed. However, at a dosage greater than the above, the supplementary cementitious materials all reduce flexural strength. The interaction interval and the degree of interaction between the supplementary cementitious materials and the cement content also differ in predicting flexural strength.


Study of flexural strength of concrete containing mineral admixtures based on machine learning
Yue Li 1 , Yunze Liu 1* , Hui Lin 1 & Caiyun Jin 2 In this paper, the prediction of flexural strength was investigated using machine learning methods for concrete containing supplementary cementitious materials such as silica fume.First, based on a database of suitable characteristic parameters, the flexural strength prediction was carried out using linear (LR) model, random forest (RF) model, and extreme gradient boosting (XGB) model.Subsequently, the influence of each input parameter on the flexural strength was analyzed using the SHAP model based on the optimal prediction model.The results showed that LR, RF, and XGB enhanced the accuracy of forecasting sequentially.Among the characteristic parameters, the most significant effect on the flexural strength of concrete is the water-binder ratio, and the water-binder ratio shows a negative correlation with flexural strength.The effect of maintenance age on flexural strength is second only to the water-binder ratio, and it shows a positive trend.When the amount of fly ash is less than 40% and the amount of slag or silica fume is less than 30%, the correlation between the amount of supplementary cementitious materials and flexural strength fluctuates and a positive peak in flexural strength is observed.However, at a dosage greater than the above, the supplementary cementitious materials all reduce flexural strength.The interaction interval and the degree of interaction between the supplementary cementitious materials and the cement content also differ in predicting flexural strength.
The mechanical properties of concrete have been of great interest 1,2 .Among the many mechanical properties, strength is one of the most basic indicators.The study of concrete strength can be traced back to the beginning of the invention of concrete materials.Usually, the strength of concrete includes compressive strength and flexural strength, etc. 3,4 .For some structural members with special force and support requirements, the flexural strength of concrete must meet the requirements, the in-depth study of which is extremely important 5,6 .
During the study of concrete flexural strength, specimens were usually tested on the basis of changing the raw material ratio, curing conditions, and the curing age 5,7,8 .These studies have greatly enriched the understanding of the factors influencing the flexural strength.Among the many factors affecting the flexural strength of concrete, the composition of raw materials is one of the most basic and important factors 9,10 .Among the raw materials, the connection between cement and strength is more complex and important compared to other raw materials such as aggregates because the hydration products of cement are directly related to strength 11,12 .However, the massive use of cement will produce more greenhouse gases such as carbon dioxide, which aggravates environmental pollution and contradicts the current promotion of energy saving and carbon neutrality 13 .This has led to a gradual shift of attention to the development of cementitious materials with a certain cementing capacity and low carbon emissions.Compared to cement, supplementary cementitious materials such as fly ash, slag, silica fume, and metakaolin are more environmentally friendly both in preparation and use [14][15][16][17] .This makes it important to study in depth the effect of supplementary cementitious materials on the mechanical properties (e.g., strength) of concrete.Currently, there have been many studies on the strength of concrete containing supplementary cementitious materials, as summarized in Table 1.
The above studies have provided a significant boost to the related field and have led to a better understanding of it.However, most of these studies have been carried out by traditional experimental means, i.e., by controlling www.nature.com/scientificreports/specific ages, specific water-binder ratios, etc., and obtaining specific values using a press machine.This will make the concrete containing supplementary cementitious materials in some projects with large volume and very high strength characteristics requirements, which will increase the experimental labor and material as well as time costs.Moreover, due to the influence of multiple factors, the final test results may be different from the actual engineering application.Therefore, the search for a concrete strength research method that can integrate multiple strength influencing factors with high efficiency and accuracy is on the agenda.With the development of computer science, some researchers have introduced machine learning into the study of concrete strength [30][31][32][33][34] .For example, Nguyen et al. 35 studied the compressive strength of concrete by machine learning model.Zeng et al. 36 used deep learning theory to predict achieved accurate prediction of compressive strength of concrete.However, most of the aforementioned studies have focused on concrete compressive strength.Few studies have been conducted on the flexural strength of concrete using machine learning, and no studies have been reported on the flexural strength of concrete containing supplementary cementitious materials.Therefore, there is a need to conduct scientific machine learning modeling studies for the flexural strength of concrete containing supplementary cementitious materials to promote the development of flexural strength of low-carbon concrete.
In view of the above, this study selected fly ash, slag, and silica fume, which are three widely used supplementary cementing materials at present [37][38][39][40] .The existing studies were synthesized and a database covering the appropriate feature parameters was established.Based on this database, modeling studies were conducted using the linear (LR) model, random forest (RF) model, and extreme gradient boosting (XGB) model.The purpose of the above machine learning model selection is first to verify whether there is a linear relationship between various feature parameters, and then select two integrated machine learning models that are currently widely used for in-depth analysis.On the basis of the optimization model, the characteristic features were investigated based on the Shapley additive interpretation (SHAP).

Characteristic parameter establishment
There are numerous elements affecting the flexural strength of concrete (F_S), including water-binder ratio, aggregate content, etc.For concrete containing supplementary cementitious materials such as fly ash, slag, and silica fume, the admixture of supplementary cementitious materials is also an important influencing factor.Therefore, this study summarizes the factors influencing the strength of concrete with supplementary cementitious materials according to the possible differences in the process of making and maintaining concrete.First, in the preparation stage of concrete with supplementary cementitious materials, cement as the main cementitious material, the influence of cement content (cem_con) on the flexural strength of concrete cannot be ignored.Taking the cement admixture as a benchmark, the admixture of fly ash (FA_con), slag (S_con), and silica fume (SF_con) as alternative cement materials in the concrete is also the focus of this study.As an important participant in the hydration process of cementitious materials that cannot be ignored, the water content has an important effect on the strength.Therefore, the water-binder ratio (w_b) is also a characteristic focused on the study.Aggregate, as the component with the largest proportion of the three-phase composition of concrete, also has an important influence on the flexural strength of concrete, so fine aggregate content (Fine_A_con) and coarse aggregate content (Coarse_A_con) are set as characteristic parameters as well.During the curing of concrete, the hydration process of the cementitious material changes with the increase of curing time, so the curing age (Age) also has an important influence on the flexural strength.It is worth noting that the curing temperature and the curing humidity also affect the hydration of the cementitious materials during the curing of concrete specimens, which in turn affects the flexural strength of concrete, but these two are not considered characteristic parameters in this study.The curing temperatures of the selected concrete specimens were in the range of 20 ± 2 °C and the relative humidity fluctuated in the range of 60 ± 5%.

Determination of the database
When using machine learning methods for predictive modeling of material properties, a database covering all feature parameters with a sufficient amount of data is required.A good database contributes to the successful implementation of machine learning modeling and helps to improve the accuracy of subsequent feature parameter analysis.In this study, eight sets of input parameters (w_b, cem_con, SF_con, S_con, FA_con, Fine_A_con, Coarse_A_con, Age) and one set of output parameters (F_S) were set according to the influencing factors given in "Characteristic parameter establishment" section.Thirty-one papers related to the flexural strength of concrete containing supplementary cementitious materials were read and summarized [18][19][20][21][23][24][25][26][27][28][29][41][42][43][44][45][46][47][48][49][50][51][52][53][54][55][56][57][58][59][60] , and 373 valid data sets were compiled. The data onflexural strength of 373 groups of concrete were counted, and the mathematical characteristics of the input and output parameters were obtained as shown in Table 2.It can be seen that the data distribution interval corresponding to each characteristic parameter is relatively wide, and the amount of data covered is sufficient to avoid the adverse effects of data concentration on the model prediction results to a certain extent.
The requirements for the database are to ensure that, in addition to the quantity and quality of the dataset, there are no significant linear relationships between the feature parameters.When the linear relationship between the feature parameters is strong, it will affect the model prediction accuracy.In order to characterize the linear relationship between different feature parameters, statistical analysis is performed using Pearson correlation coefficient (PCC) in Eq. ( 1).The i is data sample; m is the number of i; y i , y i ′ are the true and predicted values; y i , y i ′ are the average values of the above.Usually, the correlation between the two data sets can be considered negligible for the exactness of the model predicted outcome when the PCC value is between -0.4 and 0.4.
The data hot map of PCC values for all input parameters and output parameters is shown in Fig. 1.From the hot map, it can be seen that the vast majority of the input parameters exhibit weak correlation with each other, except for individual input parameters whose PCC values are not between -0.4 and 0.4.This indicates that the data in the database used in this study are superior.Comparing the correlations between the input and output parameters, it was found that only the water-binder ratio had a significant correlation with the flexural strength of concrete, while the remaining seven groups of input parameters showed a low correlation with the flexural strength of concrete.

LR model
Many machine learning models have been successfully validated in the study of performance prediction of concrete materials.The most basic model is the LR model, which has a simple modeling process, simple analytical ideas, and fast computation speed, and was widely used before the integrated machine learning system was proposed.The calculation principle of the LR model is shown in Fig. 2.

RF model
Integrated algorithmic models, as emerging machine learning models, possess more sophisticated computational procedures that can significantly improve prediction accuracy compared to linear model systems.Machine learning is accomplished by constructing multiple learners that adopt specific strategies.The purpose of the integrated algorithm is to construct a strong learning machine from multiple weak learners.This leads to a better analysis of various parameter features 17 .
The base learner chosen for modeling the flexural strength of concrete through the RF model is the decision tree.Autonomous aggregation algorithm (bootstrap aggregation) for learners 61 .When using the bagging method for calculations.During processing, strength learners are obtained by processing many weaker learners.Subsequently, the average value is processed.
(1) www.nature.com/scientificreports/This model is an extended extension of the bag method in which the schematic diagram is presented in Fig. 3.As shown in the figure, the database of the RF model is divided into multiple branches.After considering the feature performance of each branch, the results of the altered model are obtained by averaging 61 .

XGB model
Among centralized machine learning systems, the XGB model is a rapidly developing and widely used efficient learning model because it's fast, flexible and lightweight.The XGB model was also utilized as a decision tree to serve as the basic collector for machine learning.The integrated learning method used in the XGB model is the Boosting 35 .Figure 4 shows the schematic diagram of the method.
In the computational processing using the XGB model, the output parameters are obtained from a base learner consisting of V number of tree structures (f 1 , f 2 , …, f V ), as shown in Eq. ( 2).where ∅ denotes the regularization.To obtain the minimized objective function at u (Eq.( 4)), it is necessary to use the Taylor approximation.
The expressions for L i and P i in the above equation are shown in Eq. ( 5) and Eq. ( 6), respectively.
In the XGB model, when the number of leaves of f u is assumed to be V and r w denotes the weight of the wth leaf, the regularization formula has the following expression: (2)  where λ and γ are constants.Finally, when the form of the tree structure is fixed, e.g., the optimal prediction corresponding to the tree structure d can be obtained by Eq. ( 8).
Equation (8), also known as the quality evaluation function, can be used to evaluate the optimal weight results of the base learner with a number of leaves.More explanation of the XGB model can be found in the introduction given in the literature 62 .

Evaluation indicators
Different machine learning models have different computational procedures, which leads to differences in the prediction results when using different models to predict the same properties.In order to scientifically judge the prediction effectiveness of the models in predicting the flexural strength, five metrics commonly used in mathematical statistics were used.The first is the coefficient of determination, the formula for R 2 given in Eq. ( 9).In mathematical statistics, R 2 is one of the most commonly judged indicators.By normalizing the residuals of the data, a coefficient of determination value with a size between 0 and 1 can be generated, and the closer the value of the coefficient of determination is to 1, the better the model's prediction is indicated.Equation ( 10) denotes the mean absolute error (MAE).Equation (11) and Eq. ( 12) denote the mean squared error (MSE) and root mean squared error (RMSE), both of which can be used to judge the excellence of the model by evaluating the magnitude of the change in data development during the prediction process.Also, PCC is an important criterion for analyzing linear relationships.The smaller the calculated values for MAE, MSE and RMSE, the more reliable the model is.The closer the value of PCC is to 1, the more reliable the model is.

Flow chart
In the prediction of flexural strength, this study started with a comprehensive analysis of the factors influencing flexural strength.In addition to the non-essential factors and common conditions such as temperature and humidity, eight input parameters including w_b and cem_con, and the output parameter F_S were identified.Subsequently, the available test results of flexural strength of concrete containing supplementary cementitious materials were statistically analyzed and a database containing 373 sets of data was established.Subsequently, the linear relationship between the characteristic parameters was analyzed based on the PCC calculation results to ensure the reliability of the database established in this study.
When using machine learning models for flexural strength prediction to determine the most suitable machine learning model.In this study, three models were selected.Out of the total data, seventy percent is training data and thirty percent is test data.After parameter optimization of each model, five mathematical and statistical evaluation metrics such as R 2 and MAE were used to evaluate the prediction effectiveness of various models with respect to the model calculation results.
Finally, in order to determine the effect of each feature parameter, this study illustrates the correlation of the characteristic parameters based on the SHAP model.Furthermore, the effect of the three supplementary cementitious materials on the flexural strength of concrete when they are combined with cement is investigated in turn for each of the three supplementary cementitious materials.The detailed technical route process is shown in Fig. 5.

Parameter optimization
Parameter optimization is an essential step in machine learning and can be a good way to improve the prediction accuracy of machine learning.Parameter processing using PYTHON.The processing is carried out by grid search and ten-fold cross-validation.Grid search is performed through the original parameters in the model to seek the optimization direction.The optimal result is obtained after extensive processing.Ten-fold cross-validation was Vol.:(0123456789) www.nature.com/scientificreports/used in analyzing the calculation and its calculation schematic is shown in Fig. 6.There are many reports about the method, which will not be detailed here.
When performing parameter optimization, different models will have different optimization objectives.Among them, in the three machine learning models used in this study, the objects of their main optimization parameters are shown in Table 3.As can be seen from the table, two identical optimization objectives exist for the two models and two different optimization objectives also exist.The final optimization results obtained by the parameter optimization method described earlier are shown in the table.

Predicted results
The prediction results of the three machine learnings used in this study in predicting the flexural strength of concrete are presented in Table 4.The LR model has the smallest value of R 2 .Furthermore, the LR model had the smallest PCC value and the rest of the metrics were the largest.This indicates that the LR model has an overfitting problem and has the worst prediction accuracy.The two integrated learning models gave better predictions.In terms of R 2 values, both the two models are larger and more similar.The variation in R 2 between the two datasets was small, indicating that the models are not overfitted in the fitting process.In addition, for the XGB model, the FR model has larger values for all metrics except for the PCC value.In response to the above mathematical and statistical results, it can be concluded that the XGB model is most effective in predicting the flexural strength of concrete containing supplementary cementitious materials.
In order to more intuitively characterize the effectiveness of the three machine learning models in predicting the flexural strength of concrete, the prediction results of the training and test sets are pooled in Fig. 7 in this study in the form of images.As can be seen from Fig. 7a, when using the LR model for prediction, the test set shows a characteristic of scattered distribution of data points around the equivalence points.Figure 7b shows the prediction results of the RF model, and it can be seen that the data points in the RF model are more concentrated  than those in the LR model, with most of the data points on both sides of the equivalence point, and only a small number of data points are distributed in the error interval of more than 20% prediction results.It is clear from Fig. 7 that the XGB model has excellent results in predicting the flexural strength of concrete with supplementary cementitious materials.www.nature.com/scientificreports/

Computational thinking
The analytical model used in analyzing the effect of various parameters on flexural strength is SHAP 63 .The model has its origin in game theory and effects of all the parameters are considered together in the analysis.In a particular prediction data, the model yields a prediction that can be labeled as SHAP value.The principle of this value is shown in Eq. (13): f(x i ) is the prediction result i; u 0 is the model baseline; x ij denotes the j parameter for sample i; g(x ij ) is the devote in f(x i ).f(x i ) has the uniqueness 17 .In PYTHON, the calculation is analyzed by SHAP.Based on the data in the database, the feature values are obtained using the XGB model.The SHAP value can characterize both the parameter with the highest degree of influence when analyzing the influence of a parameter and the influence process within the parameter range.In this case, the impact is positively correlated for SHAPM values greater than 0 and negatively correlated for values less than 0.

Analysis of impact elements
Based on the previous comparison of the computational results of the three machine learning models, it was demonstrated that the XGB model performed optimally in predicting the flexural strength of concrete.Therefore, in this study, the SHAP model was used to analyze the effect of various characteristic factors on the flexural strength of concrete based on the XGB model.
The gradient of the strength influenced by the eight input parameters affecting the flexural strength of concrete was first determined and characterized using SHAP mean values, and the results are shown in Fig. 8.It can be seen that for the flexural strength of concrete containing supplementary cementitious materials, the level of impact of the eight characteristic factors, in order from strong to weak, is: w_b, Age, FA_con, cem_con, Coarse_A_con, Fine_A_con, SF_con, S_con.
To further analyze the effects of various input parameters on the flexural strength of concrete and to provide insight into the trend of their effects on flexural strength from the numerical changes of the input parameters, the SHAP values of the input parameters shown in Fig. 9 were used for global interpretation in this study.The influence of each characteristic parameter on the flexural strength decreases in order.In the SHAP value, the relationship between the characteristic parameters and flexural strength can be assumed that there are two forms of correlation, one positive, i.e., the color red.One is a negative correlation, i.e., blue.
Combining Figs. 8 and 9, it can be found that w_b has the strongest relationship with the flexural strength of concrete containing supplementary cementitious materials.Moreover, as the SHAP value of w_b increases, w_b shows a negative trend of correlation with flexural strength.For Age and SF_con, the larger the SHAP value of both, the color of the data points gradually appears blue, which indicates that the above two input parameters show a strong positive correlation trend with the flexural strength.The remaining input parameters, compared to the above three input parameters, do not show a clear trend in the SHAP value global plot in relation to the flexural strength.Therefore, in order to fully interpret the relationship between the specific numerical variation of each input parameter and the flexural strength of concrete, this research will be analyzed and explained in detail.

Influence of various characteristic parameters
The relation of the effect of various parameters on the strength is illustrated in Fig. 10.The calculation results show that various characteristic parameters affect the SHAP value, but the degree of influence varies.First, there is the most significant effect on the flexural strength, w_b, as shown in Fig. 10a.From Fig. 10a, the SHAP value is gradually decreasing as w_b increases in the database of this study.This indicates that the strength is gradually decreasing with w_b.Moreover, the strength fluctuates more significantly when the w_b is between 0.3 and 0.35.In the images of the remaining characteristic parameters, the relationship between the data values of Age and SHAP value shows a completely opposite trend to w_b.That is, as Age increases, its SHAP value shows a clear upward trend.This indicates that strength increases with increasing age of concrete curing.As for the effect of cement admixture, it can be seen from Fig. 10d that when its content is between 200 and 600 kg/m 3 , (13) f (x i ) = u 0 + g(x i1 ) + g(x i2 ) + . . .+ g x ij  3 .This indicates that the strength will have a peak value at a cement admixture of 370 kg/m 3 .However, since in this study, there is cement along with supplementary cementitious materials, the effect of cement needs to be analyzed together with the three supplementary cementitious materials.The SHAP values of the two aggregates show similar trends with the aggregate content.Analyzing Fig. 10e,f, it can be concluded that the increase in aggregate makes the flexural strength of concrete increase first and then decrease.
Then the effect of the three supplementary cementitious materials on the flexural strength of concrete is analyzed, and it can be found from the figure that FA_con and SF_con show a similar trend to w_b affecting the flexural strength of concrete on some intervals.This indicates that in terms of strength, the above two characteristic parameters have less flexural strength when they are present in concrete in larger amounts.However, in marked contrast to w_b, the fluctuation of the SHAP value of FA_con at less than 40% content is not obvious, and even up to 20% shows an increasing trend.This indicates that the flexural strength of concrete peaks at around 20% of FA_con content.For SF_con, its SHAP value appears to show fluctuations below 30%.and a smaller peak occurs around 10% and 30%, respectively.Subsequently, when the content of S_con exceeds 30%, the value decline.Therefore, combining the numerical changes of the SHAP value of S_con in several content zones, it can be concluded that the effect of S_con on the flexural strength of concrete may have peaks at 10% and 30% when the content is small.However, when the content of S_con exceeds 30%, the flexural strength decreases gradually.As for the characteristic parameter SF_con, when its content is below 18%, the value presents a fluctuating growth tendency.When the value of SF_con exceeds 18%, the trend is downward.The above shows that the maximum value of flexural strength of concrete may occur when the amount of silica fume in the cementitious material is 20%.
In order to analyze in depth the effect of composite cementitious materials on the flexural strength of concrete when three supplementary cementitious materials act together with cement, and to clarify the interval of different supplementary cementitious materials acting on cement content.In this study, the SHAP values of the three supplementary cementitious materials were analyzed in combination with the SHAP values of the cement, the interactions were analyzed as shown in Fig. 11. Figure 11b presents the interaction effect of fly ash and cement for flexural strength.The red data points are more extensive and the data points at the maximum value of SHAP value of Cem_con also show red color.This indicates that the FA and cement are closely related, and the effect of FA on the flexural strength of concrete is greater at the optimum cement admixture.The relationship of the slag with the cement is shown in Fig. 11c.The red data points are mainly concentrated between the cement content of 200 kg/m 3 and 400 kg/m 3 , indicating that the slag and cement interactive intervals are less than FA, and there is a certain effect on the flexural strength of concrete in this interval.In Fig. 11d, the distribution interval of red data points is more extensive, but mainly concentrated in the interval of cement less than 300 kg/ m 3 .This indicates that the interaction interval between silica fume and cement is mainly concentrated in the interval when the cement content is small, and silica fume will have an important effect on the flexural strength of concrete in this interval.

Conclusions
For the flexural strength of concrete containing supplementary cementitious materials, prediction studies were carried out using three machine learning models.The parameters were analyzed based on SHAP model, with the following main conclusions: • The LR model is the poorest predictor of flexural strength for concrete with supplementary cementitious materials.The XGB model has a higher prediction accuracy than the RF model.The XGB model is recommended for predicting the flexural strength of concrete with cementitious admixtures.• Among the characteristic parameters that affect the flexural strength of concrete with supplementary cementi- tious materials, the highest degree of correlation with the predicted results is the water-binder ratio, followed by the curing age.The weakest correlation with the predicted results was the admixture of silica fume and slag.• Water-cement ratio and age of curing have the greatest influence on flexural strength.The water-cement ratio is negatively correlated with flexural strength, while the age of maintenance is positively correlated.The relationship between aggregate content and flexural strength showed a positive correlation followed by a negative correlation.The relationship between flexural strength and cement content was similar to that of aggregate content.The effect of the three supplementary cementitious materials on the flexural strength of concrete can be summarized as the correlation with flexural strength fluctuates at small admixtures (FA_con < 40%, SF_con, and S_con < 30%) and there is a positive correlation with the optimum admixture.However, the correlation becomes significantly negative at dosing levels greater than the above.• In predicting the flexural strength of concrete, different supplementary cementitious materials interact with cement in different admixture intervals.Fly ash can act in the whole blending interval of cement.Slag has an important role between cement h content of 200 kg/m 3 and 400 kg/m 3 .Silica fume has an important role in low cement content.

Figure 1 .
Figure 1.Relationship diagram of each parameter.

Figure 5 .
Figure 5. Sketch of the whole process of analyzing and processing in this study.

Figure 7 .
Figure 7. Machine learning prediction results for flexural strength of concrete.

Figure 8 .
Figure 8.Average SHAP value of each characteristic parameter for concrete flexural strength prediction.

Figure 9 .
Figure 9. Analysis of the importance of various factors in the prediction of flexural strength of concrete using SHAP model characterization.

Figure 10 .
Figure 10.Influence of various parameters on flexural strength and the process of impacting them.

Table 1 .
Summary of the study on the strength of concrete containing supplementary cementitious materials.

Table 2 .
Statistical results of parameter characteristics.

Table 3 .
Optimization results of numerical indicators.

Table 4 .
Comparative results of model prediction effects.