Introduction

Cement-based composites are widely used in civil infrastructure due to their easy shapeability, high compressive strength (CS), and affordability1. However, their low tensile strength, brittleness, and vulnerability to harsh environments are significant concerns, leading to durability issues and high maintenance costs2. As nanotechnology (NT) advances, several attempts have been made to incorporate nanomaterials into concrete to enhance its strength and functional properties. Various nanomaterials, such as graphene nanoplatelets (GrN), nano-titanium dioxide, carbon nanofibers (CNF), nano-silica, nano-clay, and carbon nanotubes (CNT), have been incorporated into cement composites to enhance durability, strength, and functionality3,4,5. Nanomaterials (NMs) are commonly employed to improve concrete performance by harnessing physical processes such as nucleation and filling6,7,8. In 2004, Novoselov et al.9 unveiled a groundbreaking two-dimensional carbon-based material called graphene, obtained by exfoliation from bulk graphite. Owing to its exceptional physical characteristics, including a Young's modulus of 1.1 TPa and a tensile strength of 130 GPa, graphene is widely regarded as the most promising carbon-based nanomaterial for diverse applications. It also boasts remarkable intrinsic features such as extraordinary thermal conductivity, a large specific surface area, and high electronic mobility10,11,12,13. In recent years, graphene has emerged as a standout among carbon-based nanomaterials, offering the potential to improve the performance of cement composites in multiple ways when integrated into the cement matrix1. The increased focus on integrating graphene into cementitious composites stems from several factors14,15: enhancing concrete's resistance to harsh environments, reducing the amount of cement required to achieve comparable strength, preventing thermal cracking, providing fire resistance, offering excellent electrical properties for effective sensing in smart civil infrastructure, delivering exceptional electromagnetic interference shielding to protect humans from electromagnetic emissions, and enabling mass production at relatively low manufacturing cost while preserving graphene's pristine characteristics16,17,18,19,20,21,22,23. The multifunctionality of graphene-reinforced cementitious composites is illustrated in Fig. 123,24,25.

Figure 1

Illustration of the multifunctionality of graphene-reinforced cementitious composites1.

In recent years, there have been continuous reports of significant improvements in the CS of graphene-reinforced concrete. The enhancement in the CS of graphene nanoplatelet-reinforced cementitious composites (GrNCC) can mainly be attributed to three factors: the nano-filler effect, which produces a more compact and refined cement matrix; the bridging effect of graphene, which effectively constrains the propagation of microcracks; and the large specific surface area of graphene, which promotes cement hydration reactions25,26,27. Wang et al.28 found that the flexural strength of cement paste increased by 15–24% with the addition of 0.05 wt% GrN (by weight of cement), while the CS of the GrN-cement composite increased by 3–8%. Similarly, Liu et al.29 reported a 14.9% enhancement in the CS of GrNCC with a 0.025 wt% addition of GrN. Meng and Khayat30 employed GrN contents ranging from 0 to 0.3% and observed that, as the content rose across this range, tensile strength and energy absorption capacity improved by 56% and 187%, respectively; flexural strength increased by 59%, while toughness rose by 276%. At the 0.2% GrN level, the ultra-high-performance concretes (UHPCs) exhibited strain-hardening behaviour in both tension and flexure. Matalkah and Soroushian31 noted that the CS of an alkali-activated binder increased with the incorporation of GrN at modest concentrations. Furthermore, Li et al.32 investigated the use of surface-treated GrN in cementitious systems and found that incorporating 0.5% GrN resulted in a 19% increase in CS. Moreover, Murugan et al.33 observed a 22% enhancement in the CS of GrNCC with the addition of 0.02% GrN. It is important to emphasize that the properties of these composites are influenced by various factors, such as the type of GrN, its concentration, and the water content of the mix. Determining the optimal GrN content in the matrix is therefore an essential area of study in concrete composites, and it is crucial to establish the association between composite qualities and the numerous governing parameters, such as the GrN content34,35,36,37. The strength characteristics of these materials can be enhanced by incorporating the correct amount of GrN; nevertheless, excessive or inadequate GrN content can have adverse effects on cementitious composites38. Carrying out mechanical tests to assess composite properties is both time-consuming and costly. These constraints make it impractical to test numerous specimens with different GrN concentrations using empirical formulations, given the non-linear relationships between parameters. In addition, existing conventional modeling methods are inadequate for representing the complex multiscale interactions of graphene nanoparticle-reinforced composites because they lack spatial accuracy, neglect fundamental effects, and cannot account for material heterogeneity and the highly non-linear behaviour introduced by GrN. Hence, the development of machine learning (ML)-based predictive models for composite materials is essential to tackle these obstacles, and ML can be employed as an alternative to create reliable prediction models for assessing the characteristics of GrNCC39,40,41.

Recent advancements in data science have led to the development of ML algorithms, which exhibit impressive efficacy and precision in forecasting outcomes by training predictive models42,43,44,45,46,47. Consequently, leveraging computational modeling with data acquired from published studies presents a cost-efficient means of assessing concrete properties48,49,50. ML-driven evaluations of concrete properties offer cheaper, swifter, and more flexible alternatives to traditional experimental methods51,52,53. ML-based prediction methods consist of two main stages: training and testing. During the training phase, ML algorithms carefully analyze large datasets to reveal hidden connections and patterns within the data54,55,56. These observed trends form the basis of a prediction model, which is then used to analyze new, unseen data during the testing phase56,57,58. The ability to generalize from existing data and process unseen information makes ML techniques highly valuable for a diverse range of applications. To address the constraints of traditional modeling methodologies, ML methods are widely employed to predict the characteristics of cement-based materials. For example, Majid and Javed58 estimated the strength and durability characteristics of concrete utilizing decision tree (DT) and AdaBoost regressor (AR) models; both exhibited exceptional precision (R = 0.99) when evaluating the characteristics of blended cement concrete. Similarly, Chen et al.59 employed gene expression programming (GEP) to forecast the strength properties of concrete containing waste foundry sand, with excellent precision (R = 0.99). Biswas et al.60 utilized a support vector machine (SVM) to calculate the carbonation level of concrete incorporating fly ash; the model, trained and validated on a dataset of 300 data points collected from prior research, showed a strong correlation (R > 0.95). Amin et al.61 utilized extreme gradient boosting (XGBoost), random forest (RF), and light gradient boosting machine (LightGBM) approaches to examine the bond strength of fiber-reinforced polymer with concrete; the LightGBM model exhibited greater precision than the RF and XGBoost approaches. Moreover, Nafees et al.62 used GEP and an adaptive neuro-fuzzy inference system (ANFIS) to forecast the CS and tensile strength of concrete containing silica fume, achieving high prediction accuracy (R > 0.98). Wu and Huang63 forecasted the strength of concrete using a categorical boosting (CatBoost) approach optimized with the honey badger algorithm; the CatBoost approach produced highly accurate predictions of the CS and flexural strength of concrete.

Although extensive studies have forecast the properties of various cementitious composites, few have specifically examined the use of ML to estimate the properties of GrNCC. Dong et al.64 utilized the XGBoost model to forecast the CS of GrNCC, achieving a strong R-value of 0.95. Montazerian et al.65 employed DT and SVM to predict the strength of concrete incorporating graphene derivatives; these ML methods exhibited robust prediction accuracy. Likewise, Sun et al.66 utilized RF to create an ML model for the CS of concrete incorporating graphite and slag powder; the RF model exhibited exceptional predictive accuracy, evidenced by a high R-value of 0.99. Moreover, Yang et al.67 developed several models, including AutoGluon-Tabular (AGT), support vector regression (SVR), RF, and an artificial neural network (ANN); among these, the ANN demonstrated superior accuracy. Although the aforementioned models attained high accuracy in forecasting the properties of GrNCC, interpretability has frequently been overlooked despite its importance in building confidence in ML models for real-world applications. Therefore, to overcome the shortcomings of previous models, this study utilizes different ML methods to develop prediction models for the CS of GrNCC. The primary objectives of the present study are (1) to establish robust and reliable ML-based prediction models using DT, CatBoost, ANFIS, and LightGBM; (2) to conduct a comparative study of the developed ML models; and (3) to utilize SHAP analysis to unveil the controlling parameters in assessing the CS of GrNCC.

Research methodology

Data collection

A comprehensive literature review was conducted to gather experimental results regarding the CS of GrNCC. Relevant data from a variety of published articles16,17,26,28,29,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93 providing thorough insight into the factors and characteristics that affect compressive strength were compiled, resulting in a database of 172 data points, as depicted in Appendix A. This selective approach to data compilation was intended to ensure a high-quality, relevant dataset for a thorough analysis of the factors influencing compressive strength.

Selection of input variables

An extensive literature review was carried out to examine the factors influencing the CS of GrNCC. Key factors identified through this review were compiled and used to define the input characteristics of the datasets. The inputs were chosen based on literature studies94,95 and fall into three categories: raw-material properties, the GrN dispersion method, and testing age. The raw-material properties include sand content (SC), the water-to-cement ratio (WC), and GrN properties such as GrN diameter (GD), GrN thickness (GT), and GrN content (GC). The GrN dispersion method is represented by the ultrasonication time (US), and the testing age by the curing age (CA). The output property is the compressive strength (CS) of GrNCC. GrNCC specimen preparation typically follows the procedures outlined in ASTM C3996. Consequently, the cement content in each acquired data record was standardized to 1, with the dosages of other constituents expressed as percentages relative to the mass of cement. Physical parameters such as GrN thickness, content, and diameter were considered because they affect the compressive strength of GrNCC; the aspect ratio was not included, as it is simply the ratio of diameter to thickness. Based on the collected data points, fine sand was preferred over medium sand among researchers studying GrNCC. Additionally, graphene oxide (GrO) was more commonly used than reduced graphene oxide (rGrO) in GrNCC studies; this study specifically employs GrO, a type of graphene nanoparticle, to construct the database. Chamber curing was favoured over water curing in approximately twice as many studies. Furthermore, the superplasticizer dosage was omitted from the model, consistent with recommendations in the literature97. The duration of ultrasonication emerged as a critical factor governing the dispersion of graphene nanoparticles in water and is denoted by the parameter US, with values input in hours. Regarding curing conditions, the curing age was identified as a pivotal parameter regulating the hydration state of GrNCC, exerting a significant influence on the attributes under investigation. Theoretically, compressive strength is considered unaffected by sample size98; it was therefore deemed suitable to exclude specimen dimensions as an input variable. Finally, based on this thorough assessment of the literature, the seven most significant inputs were chosen for the final database. These factors are well documented and widely recognized for their influence on compressive strength. Descriptive statistics of the collected dataset for these seven inputs are provided in Table 1.

Table 1 Statistical analysis of the acquired dataset.

Statistical analysis of the input variables, including measures of data distribution, central tendency (mean, median), extremes, and trends, enhances comprehension of the database59,99,100. It is suggested that the skewness and kurtosis of the data should lie within ± 3 and ± 10, respectively; both parameters are within the recommended range for the collected dataset. In addition, the CS of the GrNCC ranges from 14.59 MPa to 94.26 MPa. This spread demonstrates the adaptability of the constructed machine learning models to a wide range of data, enhancing their practical utility. The frequency distribution histograms of the variables are provided in Fig. 2.

Figure 2

Frequency distribution histogram of the parameters.

Preprocessing of the data

Preprocessing of the original datasets is crucial for improving the outcomes and performance of the machine learning (ML) modelling process. During the preprocessing stage, outliers were identified and treated to enhance the overall integrity and reliability of the final database: rather than being discarded, irregular data points were replaced with the respective mean values through a mutation procedure. To avoid human bias in model evaluation, the original datasets were split into training and testing sets at a ratio of 7:3, so that 70% of the data was employed for training (learning) the model and the remaining 30% for testing the developed model.
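The split and the mean-based outlier mutation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: the file name and column names are assumed, and a 3-standard-deviation rule is used as one plausible outlier criterion, since the paper does not specify one.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical file/column names for the 172-point database (Appendix A).
df = pd.read_excel("grncc_dataset.xlsx")
X = df[["SC", "GD", "GT", "GC", "WC", "US", "CA"]].copy()  # seven inputs
y = df["CS"]                                               # output, MPa

# Mutation procedure: replace irregular points with the column mean.
# The 3-sigma cutoff below is an assumption for illustration.
for col in X.columns:
    mu, sigma = X[col].mean(), X[col].std()
    X.loc[(X[col] - mu).abs() > 3 * sigma, col] = mu

# 70/30 train/test split, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)
```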

Correlation of parameters

Prior to further processing, it is essential to explore the relationships between input parameters. Pearson's correlation coefficient (r), ranging from − 1 to 1, is commonly used to measure the linear correlation between any two parameters101. As the absolute value of r decreases, the correlation between two parameters weakens, and vice versa. Smith102 suggests that an r exceeding 0.80 indicates a strong relationship between two parameters; such a relationship between inputs is often referred to as the multi-collinearity problem98. To avoid multi-collinearity in the acquired dataset, r must therefore remain below 0.8 between any two parameters43,59,99,100,101,102,103,104. As shown in Fig. 3, there is no risk of multi-collinearity, as all r values, whether negative or positive, are below 0.8 in absolute value. Moreover, GrN thickness (+ 0.48) and curing age (+ 0.47) have the highest positive correlations with CS, while the water-to-cement ratio has the strongest negative correlation (− 0.35) with CS.

Figure 3

Correlation heatmap of the dataset.
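A correlation screen of this kind is straightforward to reproduce. The sketch below, which assumes the `df` DataFrame from the preprocessing step, computes the Pearson matrix, flags any pair at or above the 0.8 threshold, and draws a heatmap in the spirit of Fig. 3.

```python
import matplotlib.pyplot as plt
import seaborn as sns

cols = ["SC", "GD", "GT", "GC", "WC", "US", "CA", "CS"]
r = df[cols].corr(method="pearson")

# Flag pairs at or above the |r| = 0.8 multi-collinearity threshold
# (excluding the trivial diagonal); an empty result means no risk.
flags = (r.abs() >= 0.8) & (r.abs() < 1.0)
print(r.where(flags).stack())

sns.heatmap(r, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
plt.show()
```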

Normalization of data

Variations in the scales and ranges of inputs can affect the performance of ML models105. As indicated in Table 1, both input and output parameters exhibit wide ranges: for example, GT ranges from 0.7 to 27.6, GD from 0.072 to 50, and CS from 14.59 to 94.26. To mitigate potential bias stemming from these variations, the original database was normalized before being used for ML modeling106. As part of the data preprocessing, the StandardScaler procedure was applied to standardize the scale of the data record107,108: it removes the mean and scales each parameter to unit variance109. StandardScaler ensures uniform scaling across parameters, thereby reducing potential biases in the data and improving the overall efficacy of the models53. The exact normalized range can be seen in Table 1.
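The standardization step maps directly onto scikit-learn. In the sketch below (continuing the earlier variable names), the scaler is fitted on the training split only and reused on the test split to avoid information leakage; whether the authors followed exactly this convention is an assumption.

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)  # learn mean/variance on training data
X_test_s = scaler.transform(X_test)        # apply the same statistics to test data
```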

Utilized ML approaches

For concrete materials, the benefits of using machine learning techniques are particularly noticeable. Conventional statistical methods frequently struggle to capture the complex, non-linear connections between input factors and material qualities that are inherent in concrete compositions. Machine learning methods, such as decision trees, LightGBM, CatBoost, and adaptive neuro-fuzzy inference systems (ANFIS), provide a strong and reliable solution to this problem. These methods are highly proficient at analyzing intricate datasets, enabling the detection of nuanced patterns and correlations that may be overlooked by traditional approaches. By exploiting these capabilities, this study improves the accuracy of predicting concrete material properties, which in turn allows for more precise design and optimization. Furthermore, the knowledge obtained from machine learning analyses enhances comprehension of the fundamental principles that govern material behaviour, enabling researchers and practitioners to make well-informed choices in material design and engineering applications.

Decision tree (DT)

The decision tree (DT) algorithm is a widely used ML approach for both classification and regression tasks. It operates by iteratively dividing the data into smaller groups according to the values of the input attributes110,111,112. At each iteration, the algorithm chooses the feature that optimally divides the data into subsets with the highest purity, often using measures such as Gini impurity or entropy. This process continues until a specified stopping criterion is satisfied, such as reaching the maximum tree depth or encountering nodes with a minimal number of samples, as depicted in Fig. 4112,113. To make a prediction, the algorithm traverses the tree from the root node to a leaf node, where it assigns the most common class for classification tasks or the average value for regression tasks. Decision trees are easily interpretable but susceptible to overfitting, which can be mitigated by techniques such as pruning or by ensemble methods like RF and boosting114.

Figure 4

Architecture of DT.
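As a minimal illustration of the regression variant used here, the sketch below fits scikit-learn's DecisionTreeRegressor on the standardized training split; the depth and leaf-size values are placeholders, not the tuned settings reported in Table 2.

```python
from sklearn.tree import DecisionTreeRegressor

# Placeholder hyperparameters; see Table 2 for the study's tuned values.
dt = DecisionTreeRegressor(max_depth=8, min_samples_leaf=2, random_state=42)
dt.fit(X_train_s, y_train)
cs_pred_dt = dt.predict(X_test_s)
```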

LightGBM

LightGBM is a gradient-boosting framework widely used in ML tasks for classification, regression, and ranking. It builds DTs sequentially, with each new tree focusing on the mistakes of the previous ones115,116. Unlike traditional gradient-boosting methods, LightGBM constructs trees using an innovative method called gradient-based one-sided sampling, which notably speeds up training and reduces memory usage. Additionally, it implements leaf-wise tree growth, expanding the tree node by node and selecting the leaf with the maximum delta loss, leading to faster convergence47. During prediction, LightGBM traverses the ensemble of trees and calculates the final prediction by summing the predictions from each tree. Its efficient implementation and ability to handle large datasets make LightGBM a popular choice for various machine-learning applications117,118.
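A corresponding LightGBM sketch, again with illustrative rather than tuned parameters:

```python
from lightgbm import LGBMRegressor

# Illustrative settings; num_leaves controls the leaf-wise growth described above.
lgbm = LGBMRegressor(n_estimators=500, learning_rate=0.05,
                     num_leaves=31, random_state=42)
lgbm.fit(X_train_s, y_train)
cs_pred_lgbm = lgbm.predict(X_test_s)
```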

CatBoost

CatBoost, a gradient-boosting framework, is particularly adept at handling categorical features119. It combines the predictions of many trees to produce a stable and generalizable output, which helps prevent overfitting by averaging errors across trees while maintaining high prediction performance. CatBoost employs symmetric trees, ensuring consistent splitting conditions across nodes at the same depth, which speeds computation and helps control overfitting45. Its splitting method selects the split with the least penalty, optimizing decision-making. Boosting in CatBoost leverages categorical information to improve model generalization, enhancing robustness. Notably, CatBoost provides specialized support for categorical features through options such as one_hot_max_size and cat_features, facilitating faster training and addressing overfitting concerns. In summary, CatBoost offers a comprehensive solution for ML tasks, especially in scenarios with diverse data types and a need to mitigate overfitting risks120.
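A minimal CatBoostRegressor sketch follows. Since all seven inputs here are numeric, cat_features is not needed, and the depth and iteration values are assumptions rather than the tuned settings of Table 2.

```python
from catboost import CatBoostRegressor

cb = CatBoostRegressor(iterations=1000, depth=6, learning_rate=0.05,
                       random_state=42, verbose=0)
cb.fit(X_train_s, y_train)
cs_pred_cb = cb.predict(X_test_s)
```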

ANFIS

Jang121 introduced the ANFIS approach, which integrates artificial neural networks with fuzzy logic. ANFIS is a hybrid computational model that combines the adaptive capabilities of neural networks with the human-like reasoning of fuzzy logic, aiming to provide a flexible and interpretable framework for complex problems, particularly in pattern recognition, system identification, and control. ANFIS employs a five-layer structure: the input layer (GrNCC ingredients), fuzzification layer, rule layer, defuzzification layer, and output layer122,123,124,125. The input layer collects the input data, which is passed to the fuzzification layer, where linguistic variables are defined to represent the data's characteristics. The rule layer determines the firing strength of each rule based on the input variables' membership functions and fuzzy rules126. These rules capture the relationship between input and output variables using if-then statements derived from expert knowledge or data-driven learning. Next, the defuzzification layer aggregates the outputs of the rules to generate a crisp output value, typically using methods such as weighted averaging or centroid defuzzification127. Finally, the output layer (CS) produces the overall output of the ANFIS model. Furthermore, ANFIS employs a hybrid learning algorithm, combining gradient descent with least-squares estimation, to iteratively update the model parameters based on the training data. By adaptively adjusting its parameters, ANFIS can learn complex non-linear relationships between input and output features128,129. The architecture of ANFIS is shown in Fig. 5130.

Figure 5

Architecture of ANFIS.
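To make the layer sequence concrete, the toy sketch below runs one forward pass of a first-order Sugeno system with a single input and two rules; all membership parameters and consequent coefficients are invented for illustration and bear no relation to the fitted ANFIS of this study.

```python
import numpy as np

def gauss(x, c, s):
    # Fuzzification: Gaussian membership degree of x in a fuzzy set.
    return np.exp(-((x - c) ** 2) / (2 * s ** 2))

def anfis_forward(x):
    # Rule layer: firing strengths for two rules ("low" and "high" sets).
    w1 = gauss(x, c=0.2, s=0.15)
    w2 = gauss(x, c=0.8, s=0.15)
    # Normalized firing strengths.
    wn1, wn2 = w1 / (w1 + w2), w2 / (w1 + w2)
    # First-order consequents f_i = p_i * x + r_i (invented coefficients).
    f1 = 1.5 * x + 0.3
    f2 = -0.7 * x + 1.2
    # Defuzzification: weighted average yields the crisp output.
    return wn1 * f1 + wn2 * f2

print(anfis_forward(0.5))
```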

Parameter optimization

Hyperparameter optimization is an essential step in improving the efficacy of ML models. In this study, an iterative trial-and-error process was used to maximize the models' predictive power, fine-tuning combinations of parameters until the best combination was found. The performance indicators discussed below served as the stopping criterion for identifying the best parameters. The optimal parametric settings used for all developed models are shown in Table 2.
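Although the study tuned parameters manually, the same search can be automated; the sketch below shows an equivalent cross-validated grid search over a hypothetical CatBoost grid, offered only as an alternative to the trial-and-error procedure described above.

```python
from sklearn.model_selection import GridSearchCV
from catboost import CatBoostRegressor

# Illustrative grid; the study's actual candidate values are not reported.
grid = {"depth": [4, 6, 8],
        "learning_rate": [0.03, 0.05, 0.1],
        "iterations": [500, 1000]}
search = GridSearchCV(CatBoostRegressor(verbose=0), grid,
                      scoring="neg_root_mean_squared_error", cv=5)
search.fit(X_train_s, y_train)
print(search.best_params_)
```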

Table 2 Parametric optimization.

Performance metrics

Several performance metrics, namely the mean absolute error (MAE), correlation coefficient (R), root mean square error (RMSE), and a10-index, were employed to assess the efficacy of the developed models. These metrics are expressed in Eqs. (1)–(4).

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n}\left(Ac_{i} - Pr_{i}\right)^{2}}{n}}$$
(1)
$$MAE = \frac{\sum_{i=1}^{n}\left|Ac_{i} - Pr_{i}\right|}{n}$$
(2)
$$R = \frac{\sum_{i=1}^{n}\left(Ac_{i} - \overline{Ac}\right)\left(Pr_{i} - \overline{Pr}\right)}{\sqrt{\sum_{i=1}^{n}\left(Ac_{i} - \overline{Ac}\right)^{2}\sum_{i=1}^{n}\left(Pr_{i} - \overline{Pr}\right)^{2}}}$$
(3)
$$\text{a10-index} = \frac{N_{10}}{n}$$
(4)

where \(Pr_{i}\) and \(Ac_{i}\) denote the predicted and actual values, respectively, \(\overline{Pr}\) and \(\overline{Ac}\) denote their means, and n is the number of data points in the dataset. In model calibration, the optimal model is identified by minimizing errors such as MAE and RMSE while also achieving a high R. Researchers suggest that a strong correlation, ideally surpassing 0.8, indicates robustness, with a perfect correlation represented by an R-value of 1. Additionally, \(N_{10}\) denotes the count of predicted-to-actual ratios falling within the range 0.90–1.10; an a10-index value closer to one indicates superior model performance51,131.
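Eqs. (1)–(4) translate directly into code; the helper functions below implement them for arrays of actual (ac) and predicted (pr) strengths.

```python
import numpy as np

def rmse(ac, pr):
    return np.sqrt(np.mean((ac - pr) ** 2))           # Eq. (1)

def mae(ac, pr):
    return np.mean(np.abs(ac - pr))                   # Eq. (2)

def r_coef(ac, pr):
    return np.corrcoef(ac, pr)[0, 1]                  # Eq. (3)

def a10_index(ac, pr):
    ratio = np.asarray(pr) / np.asarray(ac)
    return np.mean((ratio >= 0.90) & (ratio <= 1.10))  # Eq. (4): N10 / n
```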

Shapley additive explanation (SHAP) interpretability

In the current study, the SHAP method was employed for model interpretability. SHAP is an efficient tool for elucidating complex ML models, offering a transparent and explainable approach132. Rooted in cooperative game theory, SHAP values quantify feature importance by assigning each predictor a contribution to the model's predictions. By carefully measuring the effect of each feature, SHAP clarifies which factors drive the model outputs. This not only supports model improvement but also fosters transparency and confidence, which is especially important when consequential decisions are at stake133. Through the equitable allocation of feature contributions, SHAP enables a nuanced grasp of how individual variables influence estimates, advancing clarity and accountability in the intricate realm of ML models134.
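For tree ensembles such as the CatBoost model analysed later, SHAP values can be computed efficiently with TreeExplainer. A minimal sketch, assuming the `cb` model and data splits from the earlier examples:

```python
import shap

explainer = shap.TreeExplainer(cb)            # exact SHAP values for tree models
shap_values = explainer.shap_values(X_test_s)

feature_names = ["SC", "GD", "GT", "GC", "WC", "US", "CA"]
# Mean-|SHAP| importance bars (cf. Fig. 9) and the beeswarm summary (cf. Fig. 10).
shap.summary_plot(shap_values, X_test_s, feature_names=feature_names,
                  plot_type="bar")
shap.summary_plot(shap_values, X_test_s, feature_names=feature_names)
```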

Results and discussion

Regression slope analysis

The regression slope is commonly utilized to assess the effectiveness of machine learning models. In a regression plot, actual values are depicted on the horizontal axis, while the model's predicted values are shown on the vertical axis, as depicted in Fig. 6. A regression slope approaching one (> 0.8) typically indicates that the model's predictions closely align with the observed values. In training, the DT model demonstrated a slope of 0.90, showcasing a strong alignment between predicted and actual values; however, during testing, the slope dropped to 0.55, falling notably below the expected threshold of 0.80. The CatBoost model exhibited a higher slope of 0.99 in both phases, indicating its robust performance in estimating the CS. Similarly, the LightGBM model showed slopes of 0.96 and 0.86 for training and testing, respectively. The ANFIS model, on the other hand, showed a higher slope (0.95) in training but a lower slope in testing (0.69). Overall, the CatBoost and LightGBM models provided higher slopes in both sets, indicating their higher precision in estimating the output. In addition, the fitted training and testing lines of these two models lie in close proximity to the ideal fit line.

Figure 6

Regression plots of the models: (a) DT, (b) CatBoost, (c) LightGBM, (d) ANFIS.

Statistical analysis

The statistical evaluation of the established models is given in Table 3. Several statistical metrics, namely R, the a10-index, MAE, and RMSE, are employed to comprehensively assess the efficacy of the models. The training R-values are 0.9835, 0.9999, 0.9665, and 0.9768 for the DT, CatBoost, LightGBM, and ANFIS models, respectively, while the testing R-values are 0.8708, 0.9999, 0.9043, and 0.8663. The correlation (R) of all models exceeds the recommended value of 0.80, indicating robust predictive performance; however, the CatBoost model's R is closest to 1, indicating its superior efficacy in estimating the CS of GrNCC. Furthermore, the lowest training MAE was observed for the CatBoost model (0.1405), while the highest training MAE was noted for the DT model (3.5245); a similar pattern was observed in the testing phase. The DT model exhibited the lowest a10-index values for both training and testing sets, indicating that its predictions exceeded the 10% error range in the majority of cases. Conversely, the CatBoost model showcased the highest a10-index values of 0.9918 for training and a perfect score of 1.00 for testing, signifying that the majority of its predictions fell within a 10% margin of the true values. A high a10-index value is also observed for the ANFIS model, as shown in Table 3.

Table 3 Performance evaluation of the models.

Comparative analysis of the models

The established models are compared to select the best prediction model for forecasting the strength of GrNCC, as illustrated in Fig. 7. The CatBoost model showed the highest correlation (R) and the lowest errors (MAE, RMSE): in terms of training MAE, it outperforms the DT model by 96.01%, the LightGBM model by 94.48%, and the ANFIS model by 95.68%. Similarly, the testing MAE of the CatBoost model is significantly lower than that of the other developed models, indicating its robust performance in forecasting the output. The comparative analysis is reinforced by the Taylor diagram in Fig. 8. A Taylor diagram offers a comprehensive view of prediction performance across multiple models, allowing a concise comparison of correlation coefficients, standard deviations, and variance ratios and highlighting the strengths and weaknesses of each model in capturing the observed variability. Plotting these metrics on a single diagram facilitates the identification of models with superior predictive skill and provides valuable insight for model selection. The optimal model is distinguished by its proximity to the benchmark (highlighted in red to represent the experimental data). The CatBoost model is the closest to the benchmark symbol, followed by LightGBM, ANFIS, and the DT model; hence, the models can be ranked by accuracy as CatBoost, LightGBM, ANFIS, and DT. Finally, previous studies strongly recommend comparing the efficacy of ML models with non-linear regression (NLR) models, so the current study also used NLR as a benchmark. As is evident from Figs. 7 and 8, the proposed models surpassed the NLR model with higher R2 values. Further, to provide additional insight into the comparative analyses, the estimated equation derived from the linear regression (LR) model is presented. Equation (5) offers a concise representation of the relationships established during training for estimating the compressive strength of GrNCC; it can serve as a valuable tool for understanding the underlying dynamics and contributes to the comprehensive evaluation of the models' predictive capabilities. Overall, the comparative analysis shows that the suggested ML models can be employed to evaluate the compressive strength of GrNCC.

Figure 7

Spider plots: (a) Training, (b) Testing.

Figure 8

Taylor diagram.

$$\text{CS (MPa)} = 49.59 - 1.4\,\text{SC} - 0.16\,\text{GD} + 1.13\,\text{GT} + 0.07\,\text{GC} - 57.89\,\text{WC} + 5.53\,\text{US} + 2.93\,\text{CA}$$
(5)
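For quick benchmarking against the ML models, Eq. (5) can be coded directly as a helper function:

```python
def cs_lr(sc, gd, gt, gc, wc, us, ca):
    """Linear-regression estimate of CS (MPa) from Eq. (5)."""
    return (49.59 - 1.4 * sc - 0.16 * gd + 1.13 * gt
            + 0.07 * gc - 57.89 * wc + 5.53 * us + 2.93 * ca)
```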

Comparison with literature models

A comparative analysis between the models proposed in this study and those established in the existing literature for calculating the CS of GrNCC is listed in Table 4. Alabduljabbar et al.135 utilized a GEP model to anticipate the strength of GrNCC and reported that it demonstrated robust performance compared with traditional statistical methods, achieving a significant R2 value of 0.92. Additionally, Yang et al.67 formulated multiple models, including RF, SVR, AutoGluon-Tabular (AGT), and ANN; among these, the ANN model exhibited exceptional estimation accuracy. Montazerian et al.65 employed three ML algorithms, namely SVR, DT, and MLP, to forecast the strength of cementitious materials incorporating GrNs; these models exhibited similar levels of prediction accuracy. Furthermore, Khan et al.136 utilized DT, AR, and BR models for forecasting the strength of GrNCC, with the DT and AR models providing similar performance. Sun et al.66 established a predictive model for the CS of cementitious composites containing nanomaterials, employing an RF method enhanced with the beetle antennae search (BAS) algorithm for optimization, and revealed that the RF-BAS model demonstrated outstanding predictive accuracy. Dong et al.137 applied a Bayesian-tuned XGBoost model to forecast the CS of GrNCC, achieving a high coefficient of determination (R2) of 0.947 with a low mean absolute error of 3.44, as detailed in Table 4. In this study, CatBoost, DT, LightGBM, and ANFIS are employed; the CatBoost model demonstrated significantly lower error and higher accuracy than the existing literature models, comprehensively outperforming them, as illustrated in Table 4.

Table 4 Comparison with literature models.

SHAP interpretability

Generally, ML models are often regarded as black boxes, lacking transparency; therefore, the SHAP interpretability method was selected for model interpretation135,138. In this study, the best prediction model (CatBoost) was chosen for SHAP analysis. Figure 9a illustrates the SHAP feature-importance plot of the inputs. The thickness of GrN plays a pivotal role in GrNCC, significantly influencing CS and consequently exhibiting the highest SHAP value of + 9.39 for CS. The diameter of GrN, curing age, and w/c ratio are also prominent features in assessing the CS of GrNCC, as indicated by their high SHAP values of + 6.19, + 5.63, and + 2.74, respectively. The remaining features have the least significant SHAP values. While Fig. 9b provides an overview of the significance of the various features, it does not reveal whether these features correlate positively or negatively with the output. Hence, a SHAP summary plot, depicted in Fig. 10, is provided to illustrate the correlation between the parameters and CS. Every data point on the summary plot represents a Shapley value corresponding to a feature and a data record. Through SHAP analysis, the model's output can be compared with the global mean to identify individual causes, and features are ordered according to their significance. The x-axis represents the negative (left) and positive (right) impacts of features on the prediction, while the colour of each point encodes the feature value. For instance, a higher GT is linked with a higher SHAP value for CS, indicating that higher GT positively impacts the CS; similarly, a higher curing age also positively impacts the CS of GrNCC. In contrast, higher w/c and GD negatively impact the CS of GrNCC. Furthermore, based on the SHAP analysis, US and SC have a less considerable influence on the CS. These findings are in close agreement with the literature44.

Figure 9

SHAP features importance plots.

Figure 10

SHAP summary plot.

Graphical user interface

This section presents a user-friendly graphical user interface (GUI), shown in Fig. 11. The GUI consists of a toolbar, an input area, and an output area. Researchers can enter the seven input variables in the designated input area and, by selecting the "Predict" button, retrieve the expected CS value of GrNCC in the output section. The input portion comprises the seven parameters that influence the strength of nanomaterial-modified concrete. Engineers can use this GUI to design concrete that fulfils the specifications and demands required for construction. The GUI was specifically created to prioritize ease of use, enabling researchers to rapidly study and forecast the CS behaviour of nanomaterial-modified concrete; by developing it, the authors aim to empower researchers to fully exploit the potential of the proposed model in their own work.
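A GUI of this kind can be assembled in a few dozen lines. The tkinter sketch below is illustrative only (the study's actual interface is the one shown in Fig. 11) and assumes the trained `cb` model and fitted `scaler` from the earlier sketches.

```python
import tkinter as tk

FIELDS = ["SC", "GD", "GT", "GC", "WC", "US", "CA"]  # seven inputs

root = tk.Tk()
root.title("GrNCC Compressive Strength Predictor")

entries = {}
for i, name in enumerate(FIELDS):
    tk.Label(root, text=name).grid(row=i, column=0, sticky="e")
    entries[name] = tk.Entry(root)
    entries[name].grid(row=i, column=1)

result = tk.Label(root, text="CS: -")

def predict():
    # Read the seven inputs, standardize them, and display the prediction.
    x = [[float(entries[n].get()) for n in FIELDS]]
    cs = cb.predict(scaler.transform(x))[0]
    result.config(text=f"CS: {cs:.2f} MPa")

tk.Button(root, text="Predict", command=predict).grid(row=len(FIELDS), columnspan=2)
result.grid(row=len(FIELDS) + 1, columnspan=2)
root.mainloop()
```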

Figure 11

Graphical user interface of nano composite material.

Conclusion

This research utilized innovative ML models to estimate the CS of GrNCC. To construct the database for model development, 172 data points of CS were gathered from published experimental studies. The efficacy and performance of these models were assessed using diverse statistical metrics. Moreover, the SHAP explanatory technique was employed to interpret the models. The major findings of the study are as follows:

1. The suggested DT, CatBoost, LightGBM, and ANFIS models exhibited excellent prediction efficacy, with testing R-values of 0.8708, 0.9999, 0.9043, and 0.8663, respectively, and corresponding testing MAE values of 4.6242, 0.0704, 3.1017, and 4.0400. The predictions aligned closely with the actual CS values, underscoring the models' efficacy in accurately calculating the CS.

2. While all the suggested models demonstrated high accuracy in predicting compressive strength, the CatBoost model exhibited exceptional prediction efficiency, showing the highest correlation (R) and the lowest errors (MAE, RMSE). In terms of MAE, the CatBoost model outperforms the DT model by 96.01%, the LightGBM model by 94.48%, and the ANFIS model by 95.68%, indicating its robust performance in forecasting the CS of GrNCC.

3. The SHAP analysis showed that the thickness of GrN plays a pivotal role in GrNCC, significantly influencing CS and exhibiting the highest SHAP value of + 9.39. The diameter of GrN, curing age, and w/c ratio are also prominent features in estimating the CS of GrNCC, while the remaining features have the least significant SHAP values. In addition, the developed GUI can help researchers and practitioners forecast concrete properties in a timely manner.

The dataset used in this study contains a limited number of data points; a more extensive dataset would enhance the precision and generalizability of the machine learning models. While this study employed SHAP for interpretability, future work should explore other interpretation techniques such as ICE, LIME, and PDP. It is also proposed that evolutionary algorithms be developed to further improve prediction accuracy. Furthermore, a comparative assessment of graphene and its potential alternatives through a blend of experimental investigation and ML-based modelling is recommended.