Predicting compressive strength of RCFST columns under different loading scenarios using machine learning optimization

Accurate bearing capacity assessment under load conditions is essential for the design of concrete-filled steel tube (CFST) columns. This paper presents an optimization-based machine learning method to estimate the ultimate compressive strength of rectangular concrete-filled steel tube (RCFST) columns. A hybrid model, GS-SVR, was developed based on support vector machine regression (SVR) optimized by the grid search (GS) algorithm. The model was built based on a sample of 1003 axially loaded and 401 eccentrically loaded test data sets. The predictive performance of the proposed model is compared with two commonly used machine learning models and two design codes. The results obtained for the axial loading dataset with R2 of 0.983, MAE of 177.062, RMSE of 240.963, and MAPE of 12.209%, and for the eccentric loading dataset with R2 of 0.984, MAE of 93.234, RMSE of 124.924, and MAPE of 10.032% show that GS-SVR is the best model for predicting the compressive strength of RCFST columns under axial and eccentric loadings. It is an effective alternative method that can be used to assist and guide the design of RCFST columns to save time and cost of some laboratory experiments. Additionally, the impact of input parameters on the output was investigated.

patterns and relationships that are difficult to detect using traditional methods. Some of the machine learning methods that have been used for this purpose include artificial neural networks (ANN), gene expression programming (GEP), back-propagation neural networks (BPNN), and fuzzy logic. By leveraging these techniques, researchers have been able to successfully predict the carrying capacity of CFST, which can provide valuable insights for designing structures and reducing the need for further testing. Overall, the application of machine learning to CFST design represents an exciting and promising area of research [19][20][21][22][23][24][25][26][27][28]. In order to implement the ultimate compressive strength prediction of RCFST columns, Mai et al. 29 developed an ANN network that was optimized by the particle swarm optimization algorithm. The results revealed that the proposed hybrid model has higher prediction accuracy than the traditional design codes. The BAS-MLP model was created by Ren et al. 30 using a multilayer perceptron (MLP) neural network coupled with a beetle antenna search (BAS) algorithm to forecast the ultimate bearing capacity of RCFST columns. The outcomes demonstrated that the BAS-MLP model performs better than a number of benchmark models and traditional approaches. To forecast the maximum load capacity of short rectangular columns of restrained reinforced concrete (SCFST), Lu et al. 31 established a predictive method based on the gradient boost regression tree (GBRT) model. The results of a straightforward comparison of many regression models revealed that the GBRT model makes a fair prediction of the mechanical characteristics of SCFST columns. The ANN-PSO model was used by Kim et al. 29,32 to forecast the eccentric load capacity of 241 CCFST columns and 622 RCFST columns, respectively. The findings revealed that the average prediction errors were 12.1% and 15.4%, respectively, which is better than the traditional design codes. On the basis of 1224 test data, Panagiotis et al. 33 developed an ANN model for the ultimate compressive capacity of RCFST columns with seven variables, including the column's width and height, steel tube thickness, effective length, steel yield strength, concrete compressive strength, and eccentricity. They then compared the developed model with the design codes currently in use. It was revealed that its accuracy was greatly enhanced while keeping the forecast findings steady. Also, an explicit equation is provided for simple implementation and use evaluation. Quang et al. 34 developed a gradient tree boosting approach to forecast the strength of the CFST column, and the proposed model produced higher prediction accuracy when compared to deep learning, decision trees, random forests, and support vector machines (SVM).
Research on predicting the strength of CFST columns using machine learning seems to have made some progress. However, most studies have focused on using traditional machine learning models to forecast the axial compression strength of CFST columns. These models are limited by the selection of hyper-parameters, resulting in restricted prediction accuracy. Optimized hybrid models have the potential to improve prediction performance, but there is limited research in this area and further studies are necessary. Furthermore, current research primarily focuses on load-bearing capacity predictions, with less emphasis on the feature importance analysis of design parameters, which is particularly valuable for CFST design. To address these gaps, this study aims to establish an optimization model for the compressive strength of RCFST under axial and eccentric loading conditions and analyze the impact of these design parameters on the output results.
As shown in Fig. 1, the input parameters consist of both geometric features and material properties. For RCFST, the specific input variables include column width (B), height (H), thickness (T), length (L), yield strength (fy), compressive strength (fc), top eccentricity (et), and bottom eccentricity (eb). The performance of the proposed optimization model was compared with that of conventional support vector regression (SVR) and random forest (RF) models.

Support vector regression model
Support vector regression (SVR) is based on the idea of structural risk minimization and is known for its good performance and predictability when dealing with situations involving small samples, nonlinearities, and large dimensions, as illustrated in Fig. 3. The basic concept behind SVR is to use a nonlinear mapping φ to map the original data x to a high-dimensional feature space, where the linear regression problem can be solved. The regression function of SVR is shown below.

$$f(x) = w^{T}\varphi(x) + b \qquad (1)$$

where w is the weight vector, b is the bias, and the following functions can be used to determine w and b.

$$\min_{w,\,b,\,\xi,\,\xi^{*}} \; \frac{1}{2}\lVert w \rVert^{2} + c\sum_{i=1}^{n}\left(\xi_{i} + \xi_{i}^{*}\right) \quad \text{s.t.} \quad \begin{cases} y_{i} - w^{T}\varphi(x_{i}) - b \le \varepsilon + \xi_{i} \\ w^{T}\varphi(x_{i}) + b - y_{i} \le \varepsilon + \xi_{i}^{*} \\ \xi_{i},\ \xi_{i}^{*} \ge 0 \end{cases} \qquad (2)$$

where y_i is the target value of the i-th of n training samples, c is the penalty parameter, ξ_i and ξ_i^* are slack variables, and ε is the insensitive range. There are various options for the kernel function, and the typical RBF kernel function is utilized in this study. The calculation process of SVR can be represented by the flow chart in Fig. 4.
For regression modeling, the interplay between two hyper-parameters, the penalty parameter c and the RBF kernel parameter g, has the greatest impact on model accuracy 36. To select them systematically, the grid search (GS) method is introduced, which is widely adopted due to its ease of use and simplicity 37.
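The roles of ε and g can be illustrated with a few lines of NumPy. This is a minimal sketch, assuming the common RBF form K(x1, x2) = exp(−g‖x1 − x2‖²); the function names are illustrative and do not come from the paper.

```python
import numpy as np


def rbf_kernel(x1, x2, g):
    """RBF kernel K(x1, x2) = exp(-g * ||x1 - x2||^2) (assumed form)."""
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return float(np.exp(-g * np.dot(diff, diff)))


def eps_insensitive_loss(y_true, y_pred, eps):
    """Per-sample loss max(0, |y - f(x)| - eps): errors inside the tube cost nothing."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - eps)
```

A larger g makes the kernel decay faster with distance, so each training sample influences a smaller neighbourhood, while a larger ε widens the band of errors that are ignored.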

Support vector regression with grid search optimization
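The fundamental idea of GS is to first define the parameter region to be searched, split the region into a grid, and then examine the parameter combination at each intersection point of the grid. Every intersection represents a parameter combination (c, g), and all of these combinations are evaluated. Cross-validation is used to assess the prediction accuracy of each combination in order to find the best (c, g), which is then adopted for the final model. The basic steps of the grid parameter optimization search are as follows.
(1) Establish the coordinate grid: take x = [−a, a], y = [−b, b] with step size L, and take the grid points of the parameters as c = 2^x, g = 2^y.
(2) Use k-fold cross-validation to find the regression accuracy: divide the training data into k disjoint, equally sized folds, select k − 1 of them for model building, and leave the remaining one for validating the model. A pair (c, g) in the parameter grid is selected and the prediction accuracy on the held-out fold is recorded. Repeat this process k times to get k models, each validated on a different fold, which gives k prediction accuracies. Finally, take the average of these accuracies as the accuracy of this pair of parameters.
(3) Iterate over the coordinate grid: find the final accuracy of all parameter combinations, rank them from largest to smallest, and select the top pair as the final (c, g) combination of the model.
Based on the results of parameter optimization and cross-validation, the best combination of hyper-parameter values is selected, and the test dataset prediction is carried out using the SVR model with these optimal parameters. The framework of this paper is shown in Fig. 5.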
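The grid search with k-fold cross-validation can be sketched with scikit-learn. This is an illustrative sketch only: the paper does not state which software was used, the data below are synthetic stand-ins rather than the RCFST database, and the mapping of c and g to scikit-learn's `C` and `gamma` is an assumption.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVR


def grid_search_svr(X, y, a=5.0, b=5.0, step=1.0, k=5, seed=0):
    """Exhaustive search over c = 2**x, g = 2**y scored by k-fold cross-validated MSE."""
    # Exponent grids x in [-a, a] and y in [-b, b] with the given step size.
    x_exp = np.arange(-a, a + step / 2, step)
    y_exp = np.arange(-b, b + step / 2, step)
    param_grid = {"C": 2.0 ** x_exp, "gamma": 2.0 ** y_exp}
    # k disjoint folds; each (c, g) pair is scored by its mean validation MSE.
    folds = KFold(n_splits=k, shuffle=True, random_state=seed)
    search = GridSearchCV(SVR(kernel="rbf"), param_grid,
                          scoring="neg_mean_squared_error", cv=folds)
    # fit() evaluates every pair and refits the best one on all training data.
    search.fit(X, y)
    return search


# Synthetic stand-in data (NOT the experimental database used in the paper).
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1.0, 1.0, size=(60, 2))
y_demo = np.sin(3.0 * X_demo[:, 0]) + X_demo[:, 1] ** 2

result = grid_search_svr(X_demo, y_demo, a=2, b=2, step=1.0, k=3)
print(result.best_params_, -result.best_score_)
```

A finer step enlarges the grid quadratically, so the cost is (number of c values) × (number of g values) × k model fits.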

Dataset description
To construct a precise strength model for the RCFST column, a comprehensive experimental database is essential. Two datasets, comprising 1003 tests on RCFST columns under axial loading (Dataset 1) and 401 tests on RCFST columns under eccentric loading (Dataset 2), were collected from a publicly available dataset 38. A description of the experimental conditions and further details for each sample in the dataset can be found in Reference 39 and will not be repeated here. The ranges and statistical characteristics of these datasets are illustrated in Fig. 6 and Table 1, respectively. It is noteworthy that the distribution of maximum bearing capacity exhibits significant variations, which may make accurate prediction more difficult. Pearson linear correlations between the input and output variables in the two datasets were also computed and are plotted in Fig. 7. The three variables with the strongest linear correlation with the compressive strength of RCFST are B, H, and T, which are all geometric properties. The correlation coefficients of these three variables were 0.65, 0.56, and 0.56 for dataset 1 and 0.55, 0.60, and 0.50 for dataset 2, respectively. All three are positively correlated with the compressive strength, while L is negatively correlated with it. Among the variables listed in this paper, all the parameters except L, et, and eb are positively correlated with the bearing capacity of RCFST columns, so N increases as these parameters increase. However, no correlation coefficient between an input and the output exceeded 0.8, indicating that complex nonlinear relationships need to be established between the multiple input factors and the output compressive strength to achieve an accurate prediction.
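The Pearson screening described above can be reproduced in a few lines. The sketch below uses made-up numbers purely for illustration (they are not taken from the database), and the helper name is hypothetical.

```python
import numpy as np


def pearson_with_output(features, output):
    """Return the Pearson r between each named feature column and the output."""
    n = np.asarray(output, dtype=float)
    return {
        name: float(np.corrcoef(np.asarray(col, dtype=float), n)[0, 1])
        for name, col in features.items()
    }


# Illustrative values only: width B (mm), length L (mm), capacity N (kN).
demo = pearson_with_output(
    {"B": [100, 150, 200, 250], "L": [3000, 2500, 2600, 1000]},
    [800, 1200, 1500, 2100],
)
print(demo)
```

Because Pearson's r only measures linear association, a moderate value does not rule out a strong nonlinear dependence, which is the motivation given for a nonlinear model.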
Additionally, the following four metrics were used to evaluate the model's performance: correlation coefficient (R²), root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). Their definitions are depicted below 40,41.
where T and Y are the experimental and predicted results, respectively, while T̄ and Ȳ are their mean values.
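For reference, the four metrics can be computed as in the sketch below. RMSE, MAE, and MAPE follow their usual definitions; the R² form is an assumption, taken here as the squared Pearson correlation because the text refers to the means of both T and Y. The function name is illustrative.

```python
import numpy as np


def regression_metrics(t, y):
    """Compute R2, RMSE, MAE and MAPE for experimental values t and predictions y."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    err = y - t
    # Assumed form: R2 = [sum (T - Tbar)(Y - Ybar)]^2 / [sum (T - Tbar)^2 * sum (Y - Ybar)^2]
    r = np.corrcoef(t, y)[0, 1]
    return {
        "R2": float(r ** 2),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),      # sqrt(mean((Y - T)^2))
        "MAE": float(np.mean(np.abs(err))),              # mean(|Y - T|)
        "MAPE": float(100.0 * np.mean(np.abs(err / t))),  # percent, needs t != 0
    }


print(regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0]))
```

RMSE and MAE carry the units of the bearing capacity, whereas R² and MAPE are dimensionless, which is why MAPE is the figure compared against the 15% engineering threshold.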

Optimization of optimal hyper-parameter combination
The training set and the test set were chosen randomly for each case, with a ratio of 80%:20% between the two. Cross-validation and the GS method were used to explore the optimal hyper-parameter combination. The evolution of the mean square error (MSE) and the determination of the optimal parameters during training in the search range [2^−5, 2^5] are shown in Fig. 8. For dataset 1, the best validation performance of the model is achieved when c = 2, g = 0.87055, and for dataset 2, when c = 6.9644, g = 0.5. These two hyper-parameter combinations are then used to build the models for the two datasets, respectively.
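As a quick arithmetic check, the reported optima are consistent with a base-2 grid c = 2^x, g = 2^y: their base-2 logarithms are approximately 1 and −0.2 for dataset 1, and 2.8 and −1 for dataset 2. The exponent step of 0.2 used below is inferred from these values and is an assumption, not a figure stated in the text.

```python
import math

# Optimal (c, g) pairs as reported for the two datasets.
reported = {"dataset 1": (2.0, 0.87055), "dataset 2": (6.9644, 0.5)}

ASSUMED_STEP = 0.2  # inferred exponent step, not stated in the paper

exponents = {}
for name, (c, g) in reported.items():
    x, y = math.log2(c), math.log2(g)
    exponents[name] = (x, y)
    # Distance of each exponent from the nearest multiple of the assumed step.
    off_grid = max(abs(v - round(v / ASSUMED_STEP) * ASSUMED_STEP) for v in (x, y))
    print(f"{name}: x = {x:.4f}, y = {y:.4f}, off-grid by {off_grid:.1e}")
```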

Model prediction outcomes comparison
Random forest and the original SVR model were also applied to the same training and test sets for comparison, to evaluate the validity and reliability of the proposed models. Figure 9 shows the relationship between the experimental data under various scenarios and the forecasted results of the three models. As seen, for both the training and test sets, the scatter between the three machine learning models' outcomes and actual values is primarily within ±20%. However, the three models are difficult to compare from Fig. 9 alone. The error metrics between the predicted outcomes and the actual values of the various models are therefore listed in Table 2 for easy comparison.
The correlations between the predicted and actual values for the hybrid model proposed in this research are 0.983 and 0.984 for the two datasets, respectively, which are higher than those of the two standard machine learning models, RF and SVR. The other three error indicators of the hybrid model are also the lowest among the three models. The results obtained for the axial loading dataset with R² of 0.983, MAE of 177.062, RMSE of 240.963, and MAPE of 12.209%, and for the eccentric loading dataset with R² of 0.984, MAE of 93.234, RMSE of 124.924, and MAPE of 10.032% show that GS-SVR is the best model for predicting the compressive strength of RCFST columns under axial and eccentric loadings.
Figure 10 offers a comprehensive overview of the prediction error distribution among the models in the test dataset. The findings reveal that, across all three machine learning models, approximately 50% of the test sets exhibit a relative prediction error of 10% or less, while 80% of the test sets display a relative error distribution [...] The proposed hybrid model has an average relative error of 12.209% and 10.032% for the test set under the two different working conditions, respectively. These average relative errors are notably smaller than those of the corresponding SVR and random forest models, with all relative errors falling within the 15% threshold, meeting the requirements for engineering applications.
To further evaluate the performance of the proposed model, two design codes, AISC 360-16 and Eurocode 4 (EC4), were used to make predictions on the test set, and the ratio between the experimental and predicted values of the different models was calculated, as shown in Fig. 12. From the mean values μ presented in Fig. 12, the ratio between the actual and predicted values for the GS-SVR model is closer to 1, indicating that its predictions are more accurate.
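The ratio statistic used in this comparison is simple to compute. The sketch below returns the mean μ of the experimental-to-predicted ratio; the coefficient of variation is added here as a common companion measure of scatter and is not a quantity reported in the text. The input values are made up for illustration.

```python
import numpy as np


def ratio_statistics(n_test, n_pred):
    """Mean and coefficient of variation of the ratio N_test / N_pred."""
    ratio = np.asarray(n_test, dtype=float) / np.asarray(n_pred, dtype=float)
    mean = float(ratio.mean())
    cov = float(ratio.std(ddof=0) / mean)  # population standard deviation / mean
    return mean, cov


# Illustrative capacities in kN (not values from the paper).
mu, cov = ratio_statistics([100.0, 220.0, 270.0], [100.0, 200.0, 300.0])
print(mu, cov)
```

A mean above 1 indicates predictions that are conservative on average, while a mean below 1 indicates overestimation of capacity.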

Input feature analysis
In addition to accurate load-bearing capacity predictions, the analysis of the importance of design parameters is also a critical step in the design of RCFST columns. This is because adjusting design parameters in order of importance, from high to low, can save time and costs. This section uses SHAP (SHapley Additive exPlanations) analysis to discuss the impact of the various parameters on the output results, as shown in Fig. 13. The factors that have the greatest impact on the load-bearing capacity of the column are, in descending order of importance, H, followed by B, and then fy, T, fc, and L. The eccentricities et and eb have the least impact. Figure 13 also shows whether these impacts are positive or negative. The top five parameters in terms of importance have a positive impact on compressive strength, while length and eccentricity have a negative impact. These findings are helpful in the design of RCFST columns: designers can adjust the design values of the parameters according to their impact to achieve the desired design objectives.
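The idea behind SHAP values can be illustrated with an exact Shapley computation on a toy model. This sketch enumerates all feature subsets and replaces "absent" features with a baseline value; it is not the SHAP library's approximation and not the paper's GS-SVR model, and the linear function and its weights are hypothetical stand-ins.

```python
import itertools
import math

import numpy as np


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to the baseline."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for subsets S of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                z = baseline.copy()
                for j in subset:
                    z[j] = x[j]
                without_i = f(z)
                z[i] = x[i]
                phi[i] += weight * (f(z) - without_i)
    return phi


# Hypothetical linear stand-in model: positive, negative and weak contributions.
weights = np.array([3.0, -2.0, 0.5])


def toy_model(z):
    return float(np.dot(weights, z))


phi = exact_shapley(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(phi)
```

The sign of each value corresponds to the positive or negative influence discussed above, and the values always sum to the difference between the prediction at x and the prediction at the baseline. Enumeration costs grow exponentially with the number of features, which is why practical SHAP tools rely on approximations.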

Conclusions
This study proposes an optimal hybrid model to accurately predict the strength of RCFST columns under both axial and eccentric loads, shedding light on the complex mechanical behavior of RCFST. The proposed model considers the intricate interactions between geometry, material properties, and compressive strength for various loading scenarios. For the two test sets, the proposed hybrid model exhibits average relative prediction errors of 12.209% and 10.032%, respectively. These errors are smaller than those of the traditional SVR and random forest models, and all relative errors are under 15%, indicating a high degree of prediction accuracy. Moreover, the proposed hybrid model shows advantages over the traditional design codes. Therefore, the optimal hybrid model can serve as a reliable alternative to commonly used design codes for predicting the compressive strength of RCFST columns, which can partially replace laboratory tests to save resources and assist in the design of RCFST columns.
Among the input parameters listed in this paper, the cross-sectional dimensions of the concrete-filled steel tube are the most influential on its compressive strength. In the design of concrete-filled steel tube columns, attention should be given to the width and height of the RCFST column. Parameters et, eb, and L have a negative effect on compressive strength, while the other geometric parameters and the material properties increase the compressive strength as their design values increase.
The proposed model was implemented on a specific dataset. Its applicability and generalizability to other similar datasets need to be further investigated. Also, taking more factors affecting the bearing capacity into account as variables within the model is a focus for future work.

Figure 1. Schematic diagram of RCFST columns under axial and eccentric loading.

Figure 3. The schematic diagram of SVR.
x = [−a, a], y = [−b, b], step size L, and take the grid points of parameters as c = 2^x, g = 2^y. (2) Use k-fold cross-validation to find the regression accuracy: select the training data and divide them into k copies that are uniformly disjoint, select k − 1 of them for model building, and leave the remaining one for validating the model. A set (c, g) in the parameter grid is selected and the prediction accuracy of the test data corresponding to this set (c, g) is recorded. Repeat the preceding processes k times to get k models, then run each model on a different set of test data to get k prediction accuracies. Finally, take the average of these accuracies to get the final corresponding accuracy of the group of parameters. (3) Iterate the coordinate grid: find the final accuracy of all parameter combinations and rank them from largest to smallest, and select the top group as the final (c, g) combination of the model.

Figure 5. The framework of this paper.

Figure 6. The range distribution of all variables.

Figure 8. Optimal hyper-parameter combination search using grid search and cross-validation.

Figure 9. Correlation between predicted results and actual values.

Figure 12. Ratio of experimental values to predicted values for different models.

Figure 13. SHAP feature importance and summary plot for RCFST column under eccentric loading.

Table 1. Statistical characteristics of the datasets.

Table 2. Evaluation indicators for the three models' predictions.

Figure 10. Prediction error distribution of the test set.