Introduction

Nanoparticles, tiny species with an average size of 1–100 nm, are classified in various aspects such as their shape (or morphology), size, and physicochemical characteristics. Aluminum (Al), silver (Ag), titanium (Ti), and other metals, as well as oxide-based or non-metallic materials like copper-oxide (CuO), titanium-dioxide (TiO2), aluminum-oxide (Al2O3), and so on, can all be found in the nanoparticles category. When nanoparticles are mixed in conventional base-fluids, they generate novel compounds known as nanofluids with distinct properties1. Water (W), ethylene glycol (EG), water-ethylene glycol mixture (W+EG), propylene glycol (PG), glycerol (GC), and engine oil (EO) are just a few examples of the wide variety of base-fluids that might be employed. Finally, nanofluids can be prepared in the forms of mono-nanofluids (e.g., Ag/W or TiO2/EG), binary hybrid nanofluids (e.g., SiO2–CuO/EG), or ternary hybrid nanofluids (e.g., Al2O3–CuO–TiO2/W), according to their expected purpose and application2.

The utilization of nanofluids for a variety of practical purposes is a popular and developing scientific field that is getting more attention day by day. Some of the most important applications of nanofluids, which can be observed in numerous research studies and publications, include the following:

  • heat exchangers with different configurations (such as shell and tube, finned tube, double pipe, etc.)3,

  • various types of sun-based systems (especially solar collectors and photovoltaic panels)4,5,

  • cooling systems of vehicle engines and electronic components6,7,

  • cooling of nuclear reactors8,

  • drug-delivery and biomedical systems9,

  • enhanced oil recovery (EOR) techniques10,

  • common machinery processes (milling, drilling, grinding, and turning)11.

Until now, many analytical and numerical calculations have been performed on fluid-flow problems based on various kinds of nanofluids, mainly in research pertaining to the sectors of energy, electronics, and medicine. For instance, Jia et al.12 carried out the numerical analysis of a glazed photovoltaic-thermal (PV/T) collector employing two kinds of coolant nanofluid (Al2O3/W and TiO2/W) with the aim of investigating its thermal and electrical performances. McCash et al.13 mathematically conducted a comparative analysis for the peristaltic flow of two distinct nanofluids (Ag–Cu/W and Cu/W) inside an elliptical duct with sinusoidally advancing boundaries. Shahzad et al.14 analyzed the electro-osmotic flow of a blood-based hybrid nanofluid consisting of single- and multi-walled carbon nanotubes (SWCNT and MWCNT) through a multiple-stenosed artery with permeable walls. Alghamdi et al.15 mathematically modeled the two-dimensional magneto-hydrodynamics (MHD) flow based on the Cu–Al2O3/W hybrid nanofluid between two co-axial stretchable disks and numerically assessed it using the finite element method (FEM). Baig et al.16 presented exact analytical solutions for the stagnation-point flow of water-dispersed MWCNTs passing across a heated stretching cylinder.

Nanofluids have exceptional properties compared to normal fluids, which makes them a preferred choice in the field of mass and heat transfer phenomena. Following the stabilization of nanoparticles in the base-fluid after mixing together, the thermophysical properties of the resultant nanofluid, such as viscosity (μ), thermal conductivity (k), specific heat capacity (Cp), and density (ρ), can be significantly altered. Several research efforts have been conducted on this topic in order to determine what other factors affect nanofluid properties and how these properties behave under different operational conditions. The present study only looks into changes in the specific heat capacity of mono-nanofluids (Cp,nf) as a function of the specific heat capacity of nanoparticles (Cp,np) and base-fluids (Cp,bf), temperature (T), particle volume fraction (ϕv), and average particle size (dnp).

Recently, many researchers have been attempting to establish the specific heat values of nanofluids in various ways, such as by performing experimental tests, applying machine learning techniques (black-box models), and developing empirical mathematical relationships (white-box models). In 2008, Vajjha and Das17 released an empirical correlation for estimating the specific heat capacity of Al2O3/W+EG nanofluid as a function of temperature (313–363 K) and volume concentration (2–10 vol%). Their experimental measurements served as the basis for this cubic polynomial correlation. In addition, by comparing their measurements with estimated values achieved from the theoretical model of Pak and Cho18, they observed an acceptable level of agreement, with maximum and average deviations reported as 22% and 15%, respectively. In another investigation conducted in 2009, Vajjha and Das19 created a specific heat correlation using the experimental data related to SiO2/W, ZnO/W+EG, and Al2O3/W+EG nanofluids. Their correlation took into account four variables containing specific heat capacity of both nanoparticles and base-fluids, temperature, and particle volume concentration. This correlation set a defined range for each of the first two variables, and its constants for the three types of nanofluids were different. In a subsequent study by Vajjha and Das20 in 2012, they updated the correlation proposed in reference19 by incorporating the experimental data of CuO/W+EG nanofluid and making the temperature term dimensionless via entering a reference temperature. In 2013, Barbés et al.21 assessed the specific heat capacity of Al2O3/EG and Al2O3/W nanofluids as a function of temperature and volume fraction. The researchers also developed a simple linear regression model that suited their experimental data well. In the next year, they performed another similar research22 on CuO/W and CuO/EG nanofluids. In 2015, Cabaleiro et al.23 studied the specific heat changes of five distinct metal-oxide-based nanofluids at high concentrations. These nanofluids contained ZrO2, ZnO, and MgO in pure EG, as well as ZrO2 and ZnO in the W+EG mixture (50:50 vol%). Finally, the researchers developed a specific heat correlation as a function of the nanoparticle and base-fluid specific heat capacities and particle volume fraction. In another research, Sekhar and Sharma24, while studying the specific heat capacity of Al2O3/W nanofluid at low concentrations (0.01–1 vol%), proposed a specific heat correlation in accordance with 81 experimental data-points of water-based Al2O3, SiO2, TiO2, and CuO nanofluids collected from other researches. This regression equation was valid in the particular range of nanofluid temperatures, nanoparticle diameters, and volume fractions. It was found that the calculated values and experimental data were well compatible with each other, such that a deviation range between −8% and +10% was obtained. Satti et al.25, after evaluating the specific heat capacities of five various nanofluids consisting of CuO, ZnO, SiO2, TiO2, and Al2O3 nanoparticles suspended in a W+PG mixture (40:60 wt%), developed a correlation onto 610 measured data by employing the Minitab statistical software. Popa et al.26, according to their experimental measures on the specific heat capacity of CuO/W, Al2O3/W, and Al2O3/EG nanofluids, recalibrated the correlation developed by Vajjha and Das19 and captured its new coefficients based on the analyzed nanofluids. The authors reported excellent agreement with a maximum Average Relative Error of nearly 2%. Moldoveanu and Minea27, based on their examinations measuring the specific heat capacities of TiO2/W, SiO2/W, and Al2O3/W nanofluids, modified Sekhar and Sharma’s24 equation to propose another correlation. Its validity was limited to a range of room temperatures and volume fractions of less than 5%. The average deviation of the specific heat values estimated via their correlation was reported to be roughly 11% when compared to the experimental results. In 2020, Çolak et al.28 presented a correlation with an average error of –0.005% utilizing a database with 1287 data-points made up of volume fraction and temperature (as independent variables) and experimental specific heat values of Cu–Al2O3/W hybrid nanofluid. Gao et al.29 proposed a correlation that was fitted by the experimental specific heat capacities related to the hybrid nanofluid of GrapheneOxide–Al2O3/W with mass fractions of 0.05–0.15 wt% at a temperature range of 20–70 °C.

Since the specific heat capacity of nanofluids relies on various conditions and factors, it is often time-consuming, expensive, and difficult to quantify this property accurately via experimental procedures. To determine the specific heat capacity values of mono-nanofluids, robust machine-learning methods were employed in our recent study30, but the current work intends to provide novel empirical correlations. The literature review clearly indicates that the number of correlations developed so far by other researchers is limited. On the other hand, old correlations are only valid and suitable for certain nanofluids and only cover a restricted range of independent variables. As a result, there is still an opportunity for advancement in this subject, and it is conceivable to provide more comprehensive correlations with fewer constraints that can be applied to an extensive collection of nanofluids.

The current study’s primary aim is to provide novel and precise mathematical correlations for estimating the specific heat capacity of mono-nanofluids. It should be noted that the considered mono-nanofluids have been selected only based on the oxide-based or non-metallic nanoparticles dissolved in conventional base-fluids. For this purpose, an extensive database consisting of experimental data taken from the literature is utilized (the same one involved in our previous study30), and finally, four correlations are created using robust soft-computing procedures. To evaluate and compare the quality of the present correlations with prior ones suggested by others, various statistical and graphical error analyses are applied. The principal advantage of these new mathematical equations is that they have considerably mitigated the limitations of the earlier correlations. Therefore, they are applicable to many different oxide-based mono-nanofluids and a broad range of independent input variables.

Dataset specifications

The specific heat capacity of mono-nanofluids (CP,nf) can be defined as a function of average particle size (dnp), particle volume fraction (ϕv), temperature (T), nanoparticle specific heat capacity (CP,np), and base-fluid specific heat capacity (CP,bf). Accordingly, 2084 data-points related to the experimental values of CP,nf were collected from diverse sources (19 references) available in the literature17,19,21,22,23,25,29,31,32,33,34,35,36,37,38,39,40,41,42. Before generating the final flawless dataset, necessary corrective actions were carried out in the data pre-processing stage, such as data cleaning, data integration, and data reduction, to ensure the establishment of more accurate estimations. Table S1 (in the Supplementary File) lists the references used to extract the required data, the characteristics related to the various types of nanoparticles and base-fluids available in the final database, as well as the ranges of input (independent) and output (dependent) variables. Additionally, Table S2 (in the Supplementary File) provides the descriptive statistics for each of the variables included in the database. It should be mentioned that the information given in these two tables was also reported in our prior article30. Moreover, box-and-whisker plots and frequency distribution histograms related to the six variables available in the database were drawn in previous work30, which are avoided here.

Description of techniques

The modeling and prediction of complex or unknown systems’ behavior employing input–output data is a common use of system identification techniques. Because of this, there is now much more interest in soft-computing techniques, which deal with data processing in uncertain and inexplicit environments. Many scientific studies have discussed the application of evolutionary algorithms as efficient soft-computing tools for system identification43,44,45,46.

Due to the data-driven nature of all soft-computing strategies, having more data leads to a more reliable and comprehensive model. In this regard, the present study covers four popular techniques for creating distinct mathematical models to estimate the specific heat capacities of mono-nanofluids. These techniques include Generalized Reduced Gradient (GRG), Genetic Programming (GP), Gene Expression Programming (GEP), and Group Method of Data Handling (GMDH), which are all covered with further details in the following subsections.

The specific algorithms adopted in this study were essentially chosen based on their suitability and ability to address the research goal, their compatibility with the quantity and quality of data-points contained within the dataset, and their own unique strengths and weaknesses. The successful usage and efficient performance of these techniques have been proven across a variety of investigations conducted in numerous fields of engineering sciences47,48,49,50,51,52,53,54.

A: Generalized Reduced Gradient (GRG)

One of the typical strategies used to solve multi-variable problems is the generalized reduced gradient (GRG). In this technique, the decision variable is the vector of X(x1,x2,…,xn), and the constraints are the functions of g1,g2,…,gm. The objective function or any constraint function can be linear or non-linear. Additionally, it’s possible that the ranges can be undefined. If no constraints exist, the problem is called an unconstrained optimization problem. It should be noticed that the lower and upper limits of the variables do not act as additional constraints, but are applied separately to the given program55.

The GRG technique uses the first-order partial derivatives of each function gi with respect to the variables xi, which are calculated by approximation of the forward or central finite difference. The program is executed according to the simulator’s initial values. If the values provided by the simulator do not satisfy all constraints gi, the first step of optimization starts, and in this situation, the objective function equals the sum of deviations from the constraints plus a fraction of the problem’s objective function. This optimization ends with a message that indicates the practicality or impracticality of solving the problem. It should be noted that in some cases, the impractical response is due to the limitation of the program in local minima, which can be corrected by changing the initial value and re-executing. In the next step, the complete optimization cycle is performed, and the final report is obtained54,56.

B: Genetic Programming (GP)

A subset of the expanding family of evolutionary algorithms, namely genetic programming (GP), involves a population of randomly generated computer programs (Fig. 1). GP applies natural evolution-inspired search principles to a variety of diverse problems, particularly parameter optimization57. The concept of genetic programming clarifies a branch of artificial intelligence (AI) study that is concerned with the evolution of computer programming codes. The evolution process operates artificially in a way that is comparable to how living organisms evolve naturally, but most of its complicated details have been summarized57.

Figure 1
figure 1

A flowchart of the GP technique.

GP is an organized procedure for getting machines to solve a problem automatically, starting with a high-level statement that must be executed. This technique is considered a systematic and domain-independent algorithm that genetically reproduces a population of programs for problem-solving58,59. Besides, GP is an AI framework that imitates natural selection to identify the best prediction. This iterative procedure chooses the most suitable descendant to pass and regenerate at each new step of the algorithm60. GP has been successfully employed for a great number of various tasks, including data-mining61, pattern recognition62, and artificial neural network (ANN) synthesis63.

C: Gene Expression Programming (GEP)

In 1999, Ferreira first presented the concept of Gene Expression Programming (GEP) as a strong soft-computing technique64,65. It is an advanced and upgraded version of GP and employs two major computational components in its structural computations for regression tasks, namely chromosomes (genotype) and expression trees (phenotype). However, the GP structure utilizes nonlinear forms of responses, namely parse trees. Actually, the chromosomes detect the selection or initial predictions for a particular problem, in which case using an interpretation strategy will allow the expression trees to provide more accurate solutions to the issues66,67,68. The simple flowchart related to this technique is shown in Fig. 2.

Figure 2
figure 2

A flowchart of the GEP technique.

GEP, as a full-grown genotype/phenotype technique, develops computer programs encoded in fixed-length linear chromosomes. The structure of linear chromosomes makes possible the beneficial and unrestricted activity of crucial genetic operators like recombination, transposition, and mutation. Besides, GEP contributes a similar form of tree description and illustration to GP in order to retrace readily the stages accomplished by GP and discover effortlessly new boundaries established by exceeding the phenotype threshold66.

In recent decades, GEP, as a preferred and recognized evolutionary approach for automatically creating computer programs, has developed and advanced quickly, such that a variety of advanced GEPs have been suggested for real-world applications that are proliferating rapidly69.

D: Group Method of Data Handling (GMDH)

A novel computational method for managing complex nonlinear computer tasks, known as the group method of data handling (GMDH), was first developed by Ivakhnenko. Figure 3 illustrates a functional flowchart for the GMDH technique. This technique can describe an explicit relationship between a mathematical model’s inputs and its corresponding output70, 71. This technique is one of the most satisfactory groups of artificial neural networks (ANNs) and is also known as a polynomial neural network (PNN). The inner layers of the GMDH technique contain a variety of independent nodes. This technique is provided in polynomial form, where all nodes in every layer are joined in couples through a quadratic polynomial and generate new polynomial-formed nodes in the subsequent layer, in accordance with the self-organizing principle67,72.

Figure 3
figure 3

A flowchart of the GMDH technique.

The GMDH technique is based on employing several nodes from the middle layers. Every node of GMDH returns a value by applying a quadratic polynomial approach that incorporates the preceding node65,73. As previously pointed out, the nodes of the next layers are produced using the quadratic polynomial functions that combine the nodes already present in the prior layers. The following formula represents the procedure of the GMDH technique65,67,74:

$$y_{i} = c + \mathop \sum \limits_{i = 1}^{w} \mathop \sum \limits_{j = 1}^{w} ... \mathop {\sum \, }\limits_{k = 1}^{w} c_{ij \ldots k} x_{i}^{n} x_{j}^{n} ...x_{k}^{n} \quad { ; } \quad \quad n = 1,2,..., 2^{m}$$
(1)

in which xij…k and yi respectively depict input and output variables of the model, m and w respectively indicate the size of layers and the number of inputs, as well as c and cij…k stand for the polynomial factors.

By defining new nodal parameters (namely Pi) for the situation of two nodes connected using a quadratic polynomial approach, the below formula is presented65:

$$P_{i}^{GMDH} = a_{0} + a_{1} x_{i} + a_{2} x_{j} + a_{3} x_{i} x_{j} + a_{4} x_{i}^{2} + a_{5} x_{j}^{2}$$
(2)

The least-squares method (LSM) is then used to minimize the variance between model predictions and actual data, as shown below65:

$$\delta_{j}^{2} = \mathop \sum \limits_{i = 1}^{{ \, N_{t} }} \left( { y_{i} - P_{i}^{GMDH} } \right)^{2 }\quad { ; }\quad \quad j = 1, 2, \ldots , \left( {\begin{array}{*{20}c} w \\ 2 \\ \end{array} } \right)$$
(3)

where Nt and w are the size of the training subset and the number of variables, respectively.

Later, the following general matrix will be created after obtaining the constants of Eq. (2) 65:

$$Y = A^{T} X \quad \quad {\text{or}} \quad \quad A^{T} = YX^{T} \left( {XX^{T} } \right)^{ - 1}$$
(4)

where X = [x1,x2,…,xw], Y = [y1,y2,…,yw], and A = [a0,a1,…,a5].

The model constants are attained over the training step. According to the below criterion, the testing subset is utilized to identify the optimum combination of two independent variables70:

$$\delta_{j}^{2} = \mathop \sum \limits_{{i = 1 + N_{t} }}^{ \, N} \left( {y_{i} - P_{i}^{GMDH} } \right)^{2 } { < }\varepsilon \quad { ; }\quad \quad j = 1, 2, \ldots , \left( {\begin{array}{*{20}c} w \\ 2 \\ \end{array} } \right)$$
(5)

Mathematical models for calculation of nanofluids’ specific heat capacity

Old correlations presented in the previous studies

Two primary theoretical models have been established to approximate the specific heat capacity values of nanofluids. The first one was developed by Pak and Cho18 in 1998 based on the ideal gases’ mixing theory according to Eq. (6). Another one was made by Xuan and Roetzle75 in 2000 on the basis of the thermal equilibrium of nanoparticles and base-fluid according to Eq. (7).

$$C_{P,nf} = \phi_{v} C_{P,np} + \left( {1 - \phi_{v} } \right)C_{P,bf}$$
(6)
$$C_{P,nf} = \frac{{\phi_{v} \rho_{np} C_{P,np} + \left( {1 - \phi_{v} } \right)\rho_{bf} C_{P,bf} }}{{\phi_{v} \rho_{np} + \left( {1 - \phi_{v} } \right)\rho_{bf} }}$$
(7)

Besides these two models, several other empirical correlations have been generated by various researchers in recent studies. Since the full description of these studies was detailed in the literature review part of the present work, only the developed correlations and their validity conditions are reported here:

  • Vajjha and Das17:

    $$C_{P,nf} = A.\phi_{v} + B.\phi_{v} .T + C.\phi_{v} .T^{2} + D.\phi_{v} .T^{3}$$
    (8)

    where T is in Kelvin and the coefficients A, B, C, and D are quartic polynomial functions of ϕv. By applying Eq. (8), the specific heat capacity of Al2O3 nanoparticles dissolved in a 40:60 (wt%) water-ethylene glycol mixture can be calculated at volume concentrations of 2–10 vol% and a temperature range of 313–363 K.

  • Vajjha and Das19:

    $$\frac{{C_{P,nf} }}{{C_{P,bf} }} = \frac{{(A.T) + B.\left( {\frac{{C_{P,np} }}{{C_{P,bf} }}} \right)}}{{C + \phi_{v} }}$$
    (9)
  • Vajjha and Das20:

    $$\frac{{C_{P,nf} }}{{C_{P,bf} }} = \frac{{A^{\prime}.\left( {\frac{T}{{T_{^\circ } }}} \right) + B.\left( {\frac{{C_{P,np} }}{{C_{P,bf} }}} \right)}}{{C + \phi_{v} }}$$
    (10)

Equations (9) and (10) are applicable across temperatures of 315 < T(K) < 363 as well as volume fractions of 0 < ϕv(%) ≤ 0.07 for nanofluids containing ZnO and 0 < ϕv(%) ≤ 0.1 for nanofluids containing SiO2 and Al2O3, all dispersed in a mixture of water and ethylene glycol (40:60 wt%). Regarding the type of nanoparticles, Table 1 presents the constants of these two equations and the error values reported by the aforementioned authors.

Table 1 Characteristics of correlations (9) and (10) provided by Vajjha and Das19,20.
  • Sekhar and Sharma24:

    $$\frac{{C_{P,nf} }}{{C_{P,bf} }} = 0.8429\left( {1 + \frac{{T_{nf} }}{50}} \right)^{ - 0.3037} \left( {1 + \frac{{d_{np} }}{50}} \right)^{0.4167} \left( {1 + \frac{{\phi_{v} }}{100}} \right)^{2.272}$$
    (11)

    Equation (11) is valid for water-based nanofluids comprising CuO, SiO2, TiO2, and Al2O3 and in the ranges of 20 < Tnf (°C) < 50, 15 < dnp(nm) < 50, and 0.01 < ϕv(%) < 4.00. The error range of this correlation was reported between –8% and +10%.

  • Çolak et al.28:

    $$C_{P,nf} = \left( {A. \, T + B} \right)\left( {1 + C.\phi_{v}^{D} } \right)$$
    (12)

    where T is temperature (°C) and ϕv is volume fraction (vol%). This correlation, which has been developed for Cu–Al2O3/W hybrid nanofluid, has constants and error values shown in Table 2.

    Table 2 Characteristics of correlation (12) provided by Çolak et al.28.
  • Gao et al.29:

    $$C_{P,nf} = A + B.\phi_{m} + C.T + D.\phi_{m}^{2} + E.\phi_{m} .T + F.T^{2}$$
    (13)

Equation (13) has been generated based on experimental data of GrapheneOxide–Al2O3/W hybrid nanofluids, in which T and ϕm are temperature (°C) and mass fraction (wt%), respectively. Table 3 lists the constants and the error value related to this equation.

Table 3 Characteristics of correlation (13) provided by Gao et al.29.

New correlations developed in the present study

(a) The GRG-based correlation:

Using the solver tool available in the Data tab of Excel software, various mathematical relationships with different numbers of constants can be created through the implementation of the GRG technique. For this purpose, a correlation in the form of Eq. (14) was developed with the aim of minimizing the average absolute percent relative error (AAPRE) regarding input and output variables (for the total dataset). According to the average and mode values related to the nanoparticle size and temperature variables, available in the descriptive statistics presented in Table S2 (in the Supplementary File), the reference values for this equation were considered to be do = 50 nm and To = 300 K.

$$C_{P,nf}^{{\text{GRG}}} = C_{P,bf} .\left[ {a_{0} + a_{1 \, } .\left( {\frac{{d_{np} }}{{d_{^\circ } }}} \right)^{{a_{2} }} + a_{3 \, } .\left( {\frac{{\phi_{v} }}{100}} \right)^{{a_{4} }} + a_{5} .\left( {\frac{T}{{T_{^\circ } }}} \right)^{{a_{6} }} + a_{7 \, } .\left( {\frac{{C_{P,np} }}{{C_{P,bf} }}} \right)^{{a_{8} }} } \right]$$
(14)

where a0 = – 1.459532, a1 = 1.191867, a2 = – 0.044737, a3 = – 1.889018, a4 = 1.014473, a5 = 1.172350, a6 = 0.102400, a7 = 0.202981, a8 = 0.960041

The correlations obtained using meta-heuristic techniques are as follows.

(b) The GP-based correlation:

$$C_{P,nf}^{{\text{GP}}} = b_{0} + b_{1} .\left\{ {b_{2} {.}\phi_{v} - \left[ {b_{3} + \exp \left( {b_{4} {.}C_{P,bf} } \right)} \right].\log \left[ \begin{gathered} b_{5} + b_{6} {.}T + \left( {b_{7} .\frac{T}{{d_{np} }} + b_{8} .\frac{1}{{\phi_{v} }}} \right).C_{P,np} \hfill \\ - \exp \left( {b_{9} {.}C_{P,bf} } \right) \hfill \\ \end{gathered} \right]} \right\}$$
(15)

where b0 = – 2.124005, b1 = – 0.071654, b2 = 0.786437, b3 = 8.280478, b4 = 0.474517, b5 = 9.110290, b6 = 0.814494, b7 = 1.277965, b8 = 0.989456, b9 = 0.772331

(c) The GEP-based correlation:

$$C_{P,nf}^{{\text{GEP}}} = c_{0} - \left[ {\left( {c_{1} {.}\phi_{v} + c_{2} {.}C_{P,np} } \right) - (c_{3} + c_{4} {.}C_{P,np} ).\left( {c_{5} - c_{6} {.}\frac{{d_{np} {.}\phi_{v} }}{T}{.}C_{P,bf} } \right)} \right]{.}C_{P,bf} - \left( {c_{7} - c_{8} {.}T} \right){.}C_{P,bf}^{ \, 2}$$
(16)

where c0 = 0.327155, c1 = 2.615432 × 10–3, c2 = 2.191448 × 10–3, c3 = 0.0559709, c4 = 5.341911 × 10–3, c5 = 12.798509, c6 = 0.555788, c7 = 0.0123802, c8 = 1.3806004 × 10–4

(d) The GMDH-based correlation:

$$C_{P,nf}^{{\text{GMDH}}} = 0.045762 + 0.798159 \, d_{0} + 0.170415 \, d_{1} + 1.046540 \, d_{0} \, d_{1} - 0.614356 \, d_{0}^{ \, 2} - 0.426339 \, d_{1}^{ \, 2}$$
(17)

where

$$d_{0} = - 8.53143 + 5.2795 \, d_{2} - 0.804102 \, d_{2}^{ \, 2} + 0.0791396 \, d_{2} \, C_{P,bf} + 0.638003 \, C_{P,bf}$$
(17a)
$$d_{1} = - 0.645007 - 0.83654 \, d_{3} - 0.601428 \, d_{3}^{ \, 2} + 1.64147 \, d_{3} \, C_{P,bf} + 2.13504 \, C_{P,bf} - 1.07153 \, C_{P,bf}^{ \, 2}$$
(17b)
$$d_{2} = 4.02018 - 0.0248309 \, d_{np} - 2.45377 \times 10^{ - 3} \, d_{np} \, \phi_{v} + 2.32343 \times 10^{ - 4} \, d_{np}^{ \, 2} + 6.5419 \times 10^{ - 3} \, \phi_{v}^{ \, 2}$$
(17c)
$$d_{3} = 1.81744 + 1.19136 \, d_{4} - 1.44623 \, d_{5} - 0.0777568 \, d_{4} \, d_{5} + 0.290077 \, d_{5}^{ \, 2}$$
(17d)
$$d_{4} = 2.208 - 0.0779644 \, \phi_{v} - 0.372357 \, C_{P,bf} + 2.96343 \times 10^{ - 3} \, \phi_{v} \, C_{P,bf} + 1.60252 \times 10^{ - 3} \, \phi_{v}^{ \, 2} + 0.195263 \, C_{P,bf}^{ \, 2}$$
(17e)
$$d_{5} = - 8.27289 + 0.0301767 \, d_{np} + 0.0604513 \, T - 2.14838 \times 10^{ - 4} \, d_{np} \, T + 2.83985 \times 10^{ - 4} \, d_{np}^{ \, 2} - 6.68282 \times 10^{ - 5} \, T^{2}$$
(17f)

Results and discussion

Evaluation and comparison of the correlations

In the current research, four new correlations—Eqs. (14) to (17)—were attained with the aim of estimating the specific heat capacity of mono-nanofluids by executing the GRG, GP, GEP, and GMDH techniques. On the other hand, three previously proposed Eqs. (6), (11), and (12) were selected and implemented on the entire employed database. Then, in order to compare the estimated outcomes of all seven equations with their corresponding experimental values and to evaluate the performance and accuracy of those equations, the following statistical parameters for train and test subsets (including 1667 and 417 data-points, respectively) and the total database (with 2084 data-points) were computed and declared according to Tables 4 and 5.

Table 4 The statistical parameters of old correlations presented in the previous studies.
Table 5 The statistical parameters of new correlations developed in the current study.
  1. 1.

    Average Percent Relative Error (APRE) This parameter determines the relative deviation of estimated data-points from the associated experimental values and is computed as:

    $$APRE = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left[ {\frac{{\left( {C_{P,nf,i} } \right)_{exp.} - \left( {C_{P,nf,i} } \right)_{est.} }}{{\left( {C_{P,nf,i} } \right)_{exp.} }} \times 100} \right]}$$
    (18)
  2. 2.

    Average Absolute Percent Relative Error (AAPRE) This is calculated similarly to APRE with the exception that the errors’ absolute values are considered when determining the final values of errors:

    $$AAPRE = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {\frac{{\left( {C_{P,nf,i} } \right)_{exp.} - \left( {C_{P,nf,i} } \right)_{est.} }}{{\left( {C_{P,nf,i} } \right)_{exp.} }} \times 100} \right|}$$
    (19)
  3. 3.

    Root Mean Square Error (RMSE) This parameter indicates the level of data scattering around the zero point and is computed as:

    $$RMSE = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {\left[ {\left( {C_{P,nf,i} } \right)_{exp.} - \left( {C_{P,nf,i} } \right)_{est.} } \right]^{2} } }$$
    (20)
  4. 4.

    Standard Deviation (SD) This parameter represents the amount of data scattering from the average value and is calculated as:

    $$SD = \sqrt {\frac{1}{n - 1}\sum\limits_{i = 1}^{n} {\left[ {\frac{{\left( {C_{P,nf,i} } \right)_{exp.} - \left( {C_{P,nf,i} } \right)_{est.} }}{{\left( {C_{P,nf,i} } \right)_{exp.} }}} \right]^{2} } }$$
    (21)
  5. 5.

    Coefficient of Determination (R2) This parameter evaluates how well the experimental and estimated values match, as follows:

    $$R^{2} = 1 - \frac{{\sum\limits_{i = 1}^{n} {\left[ {\left( {C_{P,nf,i} } \right)_{exp.} - \left( {C_{P,nf,i} } \right)_{est.} } \right]^{2} } }}{{\sum\limits_{i = 1}^{n} {\left[ {\left( {C_{P,nf,i} } \right)_{exp.} - C_{P,nf,ave} } \right]^{2} } }} \quad { , }\quad \quad where\quad \, C_{P,nf,ave} = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {C_{P,nf,i} } \right)_{exp.} }$$
    (22)

As seen in Table 4, Eq. (6) can provide relatively good estimations of the specific heat capacity for mono-nanofluids available in the database. Such a consequence has also been mentioned in the research of Vajjha and Das17,20 and Murshed31, but it contradicts the findings of Zhou and Ni76 and Zhou et al.77. However, Eq. (11) has not shown a good performance due to the parameter-related limitations implied in Ref.24 and the broad range of the variables existing in the current work. In addition, a poor and unacceptable performance is observed from Eq. (12) because it was obtained based on Cu–Al2O3/W hybrid nanofluids, while the database used in the present study only comprises mono-nanofluids. This result is owing to the dissimilarity of stability levels and thermophysical properties found in mono and hybrid nanofluids caused by their structure-related differences.

If the results of four novel correlations developed in the current study are compared with the estimations of the other three equations, it can be understood that these four correlations have very good accuracy in estimating the specific heat capacity of mono-nanofluids and have achieved acceptable error rates (AAPRE < 3.2% and R2 > 0.95). This issue is also depicted in Fig. 4 as a graphical analysis of the AAPRE and R2 parameters. When carefully observed, the best performance is found for correlations obtained using the GMDH and GEP techniques.

Figure 4
figure 4

Graphical comparison of AAPRE and R2 values for the introduced correlations.

Applying graphical error analysis is a suitable and useful tool for checking the performance of predictive models, especially when multiple various models are to be compared side by side. Figure 5 displays percent relative errors related to the estimated values of Cp,nf obtained from the GRG, GP, GEP, and GMDH techniques for the train and test subsets versus the experimental values of Cp,nf. In this graph, the data-points are scattered along a line with zero error to shed light on the error pattern of estimation techniques. A large accumulation of data-points around this line indicates that a technique is highly accurate.

Figure 5
figure 5

Percent relative error distribution against experimental values of CP,nf for different correlations: (a) GRG, (b) GP, (c) GEP, and (d) GMDH.

As illustrated in Fig. 5, a large percentage of the experimental data-points are overestimated by all techniques. In other words, a majority of the estimated values are greater than the experimental values, and therefore, the relative errors are negative numbers [see Eq. (18)]. However, Fig. 5d demonstrates that the GMDH technique overestimates a relatively small fraction of data-points (low to medium values of Cp,nf) and performs a credible task in evenly estimating the other data-points. Although this technique would be suitable for estimating just medium- and large-range values of Cp,nf, it still has almost an error tendency and needs to be utilized carefully.

According to Fig. 5d, the relative errors related to the CP,nf values estimated by the GMDH technique are located near the line with zero error (within the error range of −24.6% to +7.5%). This technique has made more accurate and reliable estimates than those of the other techniques.

In order to undertake a more thorough analysis regarding the accuracy and precision of the suggested correlations, another visual evaluation was performed by utilizing a particular kind of scatter chart known as the cross-plot. In this method, the estimated data-points from a correlation are put against the experimental data and across a 45° line (known as a unit-slope line) passing the plot’s origin. Hence, the effectiveness of each correlation can be judged by how close the trend is to the 45° line. The cross-plots related to the developed correlations for CP,nf estimation applying the GRG, GP, GEP, and GMDH techniques are presented in Fig. 6.

Figure 6
figure 6

Cross-plots of the correlations developed for CP,nf estimation: (a) GRG, (b) GP, (c) GEP, and (d) GMDH.

According to Fig. 6, a fairly packed accumulation of data-points along the unit-slope line is observed, particularly for the GMDH technique, whose outputs are in respectable agreement with the experimental data. Furthermore, the GRG technique has higher data deviations from the unit-slope line than the other techniques.

In Fig. 7, the group-error distribution diagrams in terms of five input variables are exhibited for three old and four novel correlations. This method first involves classifying the input or independent variables into a certain number of ranges (or categories) according to the scope of their changes, after which the estimation error values for the target variable are determined and plotted in each range.

Figure 7
figure 7

AAPRE distribution in different ranges of input variables for the suggested correlations.

As depicted by Fig. 7a regarding the “average particle size” parameter, the lowest AAPRE values correspond to the range of 52–66 nm for the GRG, GP, and GEP techniques and the range of 10–24 nm for the GMDH technique. As shown in Fig. 7b regarding the “particle volume fraction” parameter, the lowest AAPRE values are achieved in the range of 4–6 vol% for the GRG and GEP techniques and the range of 8–10 vol% for the GP, GEP, and GMDH techniques. As can be seen in Fig. 7c concerning the “temperature” parameter, the lowest AAPRE values are attained in the range of 373.15–413.15 K for the GRG and GMDH techniques, the range of 293.15–333.15 K for the GP technique, and the range of 333.15–373.15 K for the GEP technique. As indicated in Fig. 7d regarding the “nanoparticle specific heat” parameter, the lowest AAPRE values are seen in the range of 0.536–0.692 kJ/kg.K (mainly Si3N4, TiN, CuO, and TiO2) for the GRG technique, the range of 0.692–0.848 kJ/kg.K (mainly Al2O3 and SiO2) for the GEP technique, and the range of 1.004–1.160 kJ/kg.K (mainly MgO) for the GP and GMDH techniques. As can be observed in Fig. 7e concerning the “base-fluid specific heat” parameter, the lowest AAPRE values are found for all four techniques of the GRG, GP, GEP, and GMDH in the range of 2.210–2.632 kJ/kg.K (mainly EthyleneGlycol). It is worth noting that with a growth in the heat capacity values of the base-fluids, the AAPRE values computed for the correlation belonging to Çolak et al.28 [i.e., Eq. (12)] have decreased.

Figure 8 illustrates a graphical analysis of cumulative frequency for evaluating the performance of the different correlations mentioned in the present study. According to this plot, the cumulative frequency of all data-points is depicted versus the percent relative errors to measure the number of data-points that the correlations can reliably estimate. From Fig. 8, it can be inferred that the GMDH technique surpasses the other techniques by successfully and effectively estimating more than 90% of total data-points with a relative error of −10% to +5%. Furthermore, the benefit and priority of the new correlations developed in this study are clearly visible compared to the other correlations.

Figure 8
figure 8

Cumulative frequency against percent relative error for suggested correlations.

Outliers detection and applicability domain of a technique

A popular and effective method known as the “leverage statistical approach” is applied to find probable outliers (or unexpectedly aberrant data) and designate the applicability domain of the developed correlations78,79,80. For this purpose, this method generates a graph known as the “Williams plot” by determining two parameters, the Hat Matrix (H) and the Standardized Residuals (SR), utilizing the Eqs. (23) and (24) 80,81,82:

$$H = X\left( {X^{T} X} \right)^{ - 1} X^{T}$$
(23)

where the letter T indicates the transpose operator. X is a Nt × Ni matrix, where Nt and Ni depict the number of all data-points and technique inputs, respectively.

$$SR_{i} = \frac{{e_{i} }}{{RMSE\sqrt {1 - H_{ii} } }}$$
(24)

where RMSE refers to the root mean square error of the used technique, ei denotes the discrepancy of the i-th estimated data-point from its associated experimental value, and the Hat Indices Hii represent the diagonal components of the H matrix (i.e., Hii = diag(H)).

Williams plot visually displays three important zones associated with the existence of valid data, out-of-leverage data, and suspected data (outliers) by revealing the relation between Standardized Residuals and Hat Indices. Valid data-points can be found in the ranges −3 ≤ SR ≤ +3 and 0 ≤ H ≤ H*. The leverage values equal to −3 and +3 for the parameter SR are called the cut-off metrics. The parameter H*, also referred to as the critical leverage or warning leverage, has the value 3(Ni + 1)/Nt. The data-points that belong to the ranges −3 ≤ SR ≤ +3 and H > H* are recognized as “good high leverage” points, which are outside the applicability domain of the technique by which a correlation has been generated. The rest of the data-points with SR < −3 or SR > +3 (independent of H values) are classified as suspected data or outliers. Due to their negative effect on the slope and intercept of the constructed regression line, they are considered a risk to reliable modeling and are also called “bad high leverage” points78,79,80,81,82,83.

Figure 9 exhibits the Williams plot for the best-proposed technique (i.e., GMDH) based on the total dataset employed in the current study. This graph shows that the GMDH technique detects 1.68% of the 2084 data-points (35 red triangular marks) as probable outliers. Furthermore, just 0.62% of the 2084 data-points (13 blue square marks) are found to be outside the applicability domain of the GMDH technique. As a result, due to the existence of a substantial portion of valid data-points (green circular marks) in the ranges −3 ≤ SR ≤ +3 and 0 ≤ H ≤ 0.0086, the accuracy and reliability of this technique are confirmed again.

Figure 9
figure 9

Williams plot related to the correlation developed by the GMDH technique.

Sensitivity analysis

Performing sensitivity analysis can include several purposes, such as detecting errors existing in the model’s structure, determining the grade of relation between input and output variables of a model, and refining a model by discarding inputs that have little or no effect on its output. The relevancy factor (r) is a parameter that is used to assess how much every input variable influences the output of a model. This factor’s value can vary from −1 to +1, and the greater its absolute value for an input variable, the higher its impact on the model’s output30.

One of the ways to determine the relevancy factors is to use the Correlation option in the Data Analysis tools available in the Data tab of Excel software. The relevancy factors were calculated for every one of the techniques proposed in the present research, and similar results were discovered. Figure 10 represents the relevancy factors derived for each input variable according to the estimations of all techniques. Obviously, the parameters of temperature (T) as well as specific heat capacities of nanoparticle (CP,np) and base-fluid (CP,bf) have a direct relationship with nanofluids’ specific heat capacity (CP,nf), whereas two parameters of average particle size (dnp) and volume fraction (ϕv) do not. Additionally, this figure indicates that the two parameters CP,bf and dnp (with r factors of about +0.94 and −0.42, respectively) have significant impacts on estimating the target variable CP,nf. This issue was also discovered in the studies of Jamei et al.84,85, who investigated the specific heat capacity of various metal-oxide-based and carbon-based nanofluids dispersed in different base-fluids through the application of powerful machine learning techniques.

Figure 10
figure 10

Relevancy factors between input and output variables.

Analyses of statistical trends

Based on the relevancy factors outlined in the prior section, the inverse influence of average particle size (dnp) and volume fraction (ϕv) on the mono-nanofluids’ specific heat capacity (CP,nf) was revealed. In addition, three other variables comprising temperature (T), nanoparticle specific heat capacity (CP,np), and base-fluid specific heat capacity (CP,bf) had a direct proportional influence on the target variable CP,nf. The graphical trend analysis shown in Fig. 11 makes this issue quite evident. Here, only some experimental data and their corresponding values estimated by the best technique (i.e., GMDH) are provided.

Figure 11
figure 11

Variations of mono-nanofluids’ specific heat capacity with any input parameters.

Figure 11a depicts the decreasing trend of mono-nanofluid heat capacity (CP,nf) with growing nanoparticle size (dnp). This impact was also found in the investigations of Zhang et al.86, Angayarkanni et al.87, Xiong et al.88, Wang et al.89, and Novotny et al.90. The underlying cause of this matter can be assigned to the fact that smaller nanoparticles may have greater surface-to-volume ratios and atomic thermal vibrational energies86,88. However, it was discovered that the nanoparticle size had a negligible effect, according to the study conducted by Żyła et al.38 on the specific heat capacity of nanofluids comprising AlN, TiN, and Si3N4 dispersed in ethylene glycol. Such a finding was also reported in Satti et al.’s25 research.

Based on the experimental results of Murshed31, Fig. 11b depicts that the values of mono-nanofluid heat capacity (CP,nf) reduce as the volume fraction of nanoparticles (ϕv) increases. Furthermore, Satti et al.25 found that for small volume fractions (varying from 0.5% to 1.5%), the specific heat values of their examined nanofluids changed relatively little. In another study related to Raud et al.91, the specific heat values of TiO2/W and Al2O3/W nanofluids dropped linearly with an increase in the mass fraction of the utilized nanoparticles. The investigations of Raja et al.92 also disclosed that the specific heat capacity of nanofluids containing copper, silver, and aluminum nanoparticles suspended in water or ethylene glycol reduced as the nanoparticles’ concentration increased. The reason is that the solid nanoparticles have lower specific heat capacities in comparison with the base-fluids. Therefore, the more nanoparticles are dissolved in a certain amount of base-fluid, the more the specific heat capacity of the resultant nanofluid will decrease17,93. Similar outcomes were also attained in this field by Maghrabie et al.94, Moldoveanu and Minea27, Verma et al.35, and Zhou and Ni76.

In Fig. 11c, it is demonstrated how the specific heat value of the mono-nanofluids (CP,nf) varies with temperature (T) in such a way that there is a direct relationship between these two variables. As the temperature of the nanofluid rises, its ability to absorb and store thermal energy (without undergoing a phase transition) and as a result, the specific heat values increase. This behavior has been clearly noted in numerous studies, including Gao et al.29, Wole-Osho et al.39, Raud et al.91, Popa et al.26, Angayarkanni et al.87, Elias et al.33, Barbés et al.21,22, Heyhat et al.95, and Vajjha and Das17,19, 20.

Figures 11d and e illustrate the trend of mono-nanofluids’ specific heat changes against the specific heat capacities of nanoparticles and base-fluids, respectively. The direct relationship of CP,nf variations with CP,np and CP,bf is presented for some experimental data. According to Fig. 11d, TiN-containing nanofluids have a greater specific heat capacity than those containing Si3N4 (taken from the experimental data of Żyła et al.38) due to the higher specific heat value of TiN nanoparticles, when considering a certain base-fluid (in this case, ethylene glycol) and keeping other parameters constant. Such an effect was also reported in the experimental study of Vijayakumar et al.37 regarding the difference in specific heat values of CuO/W and Al2O3/W nanofluids. As seen in Fig. 11e concerning the experimental data of Cabaleiro et al.23, since the specific heat value of the water-ethylene glycol mixture is greater than that of the ethylene glycol alone, the specific heat capacity of nanofluids comprising ZnO suspended in the W+EG mixture has been raised. Moreover, Akilu et al.36 came to a similar conclusion by evaluating the specific heat changes of SiO2 nanoparticles dissolved in three different base-fluids (ethylene glycol, glycerol, and ethylene glycol+glycerol). According to Fig. 11e, the base-fluid type has a considerable effect on the specific heat capacity of a nanofluid. As previously declared, the relevancy factor corresponding to the variable Cp,bf had the highest value (+0.94).

Conclusions

Four soft-computing techniques were implemented on a database including 2084 data-points taken from 19 experimental studies in order to estimate the specific heat capacity of diverse mono-nanofluids (CP,nf) as a function of five input variables (dnp, φv, T, CP,np, and CP,bf). Then, four distinct high-accuracy correlations were developed, and numerous statistical and graphical error analyses were also conducted. The performance of these novel correlations was compared to the outcomes of three earlier correlations published by other researchers. The main advantage of these novel empirical correlations is that they have significantly reduced the constraints of prior correlations. As a result, they are adaptable to various oxide-based mono-nanofluids for a broad range of independent variable values.

According to the outcomes of each innovative and robust correlation developed in this research, the utilized techniques can be rated based on their accuracy as GMDH > GEP > GP > GRG. The results revealed that the correlation achieved using the GMDH technique is the most effective and appropriate one (with AAPRE = 2.4163% and R2 = 0.9743). However, the GEP technique has also exhibited acceptable behavior that is comparable. By applying the leverage statistical approach for the GMDH technique and distinguishing valid data, out-of-leverage data, and suspected data (outliers), the reliability and accuracy of this technique were confirmed. To recognize the relation between input and output variables available in the entire dataset, a sensitivity analysis was performed and the relevancy factors were determined for the best technique (GMDH). It was found that average nanoparticle size and base-fluid specific heat capacity are significant and influencing factors in establishing the specific heat values of mono-nanofluids. Ultimately, the variation trend of the target variable (CP,nf) for each of the input variables was graphically evaluated using the experimental data obtained from the previous studies and the estimated data generated by the GMDH technique. The findings of this study agree with many experimental measurements and computer-based surveys in calculating the specific heat capacity of mono-nanofluids.