Introduction

Basic concepts of data clustering

Clustering is an unsupervised learning technique that separates a database into groups of similar objects by minimizing the similarity among objects in different groups and maximizing the similarity between entities in the same cluster1,2,3. Clustering has been a crucial tool for data analysis in many disciplines, including intrusion detection, data mining, bioinformatics, and machine learning systems. It is also used in various fields, including social network analysis, robotics, and networks. Hierarchical clustering, mixed clustering, learning network clustering, and partitional clustering can all be used to group data into clusters4,5,6. The main objective of clustering techniques is to make clusters internally homogeneous and mutually heterogeneous. Partitional clustering techniques have previously run into issues such as sensitivity to the initial center points, the local-optima trap, and lengthy run times. Clustering separates a dataset with \(d\) dimensions into \(k\) different groups; every division is known as a cluster \({C}_{i}\). Each cluster's members share several traits in common, although there is little overlap between clusters7,8,9. In this situation, clustering's objective is to identify the separate groups and allocate objects depending on how closely they resemble the appropriate groups. The absence of initial labels for observations is the primary distinction between clustering and classification: whereas classification techniques assign objects to predetermined classes, clustering groups objects without prior knowledge10,11,12,13.

Numerous research efforts on data clustering have been reported over the past decades, and various solutions to the clustering problem exist. These methods primarily use complex network approaches, \(K\)-means and its improved variants, metaheuristic algorithms, and other methods14,15,16,17,18. One of the most well-known of these methods is the \(K\)-means algorithm, which attempts to partition a complete dataset into \(k\) clusters by randomly selecting \(k\) data points as starting cluster centers. The \(K\)-means method, however, is sensitive to the choice of initial points and may fail to scale to large databases. Many experts have concentrated on swarm intelligence algorithms to address the drawbacks of the \(K\)-means technique, since such algorithms can search a complicated space in parallel and avoid a premature convergence trap3,15,19,20,21,22,23,24. Researchers have also focused on merging metaheuristic techniques with conventional clustering techniques to lessen such limitations.

Literature review

Metaheuristic algorithms are population-based and imitate the intelligent behaviour of socially organized creatures. Glowworm- and crow-search-based clustering have been proposed for data clustering. These techniques depict the clustering solutions as swarms of creatures and then use intensification and diversification strategies to cover the search space quickly. Although metaheuristic algorithms reduced the execution times of classic clustering techniques, they still have drawbacks25. The shortcomings of Particle Swarm Optimization (PSO) and its competitors can be summarized as a lack of memory mechanisms and loss of population diversity26. The PSO and its variants use a single optimal solution stored in the solution space to reposition the members of the swarm, which can cause them to become stuck in local minima. These shortcomings caused PSO and its variants to obtain solutions with low quality and convergence speed, which accounts for the birth of numerous other algorithms reported in the literature13,27,28.

The authors of Ref.29 suggested a genetic algorithm (GA) for clustering that exploits a region-based crossover mechanism to find the best preliminary centers for the \(k\)-means algorithm. The chromosomes encode the cluster centroids, and during the crossover, chromosome pairs exchange several centroids located in the same region of space. According to experimental research, the region-based crossover outperforms a random exchange of centroids. The authors of Ref.2 suggested a differential evolution (DE) algorithm integrating the \(k\)-means technique, in which the local search and initial population are produced using \(k\)-means and the population vectors encode the cluster centroids. In an additional effort to eliminate the redundancy of the centroid encoding, the authors of Ref.30 reported a technique that hybridized the \(k\)-means algorithm and the gravitational search algorithm. The \(k\)-means was used to improve the generation of the initial population; one individual was generated using \(k\)-means, and the remaining individuals were generated randomly. A data clustering approach based on PSO with a Gauss chaotic map was presented in Ref.31, in which sequences produced by the Gauss chaotic map replace the random elements affecting the cognitive and social components of the velocity update. A cooperative artificial bee colony (CABC) approach for data clustering was proposed in Ref.32, in which every bee contributes to creating the optimal solution: each bee's best solution contributes components to the construction of the overall best solution. The authors of Refs.33,34 used representative points, typically not centroids, to indicate potential solutions, and, as with centroids, a dataset partition was created by allocating data to the cluster closest to the representative point.

Many metaheuristic algorithms have recently been reported in addition to the above-discussed algorithms for numerical and real-world engineering design optimization problems, including data clustering. For instance, ant colony optimization35, firefly algorithm36,37, flower pollination algorithm38, grey wolf optimizer (GWO)39,40,41,42, Jaya algorithm43, teaching–learning-based optimization (TLBO) algorithm44, Rao algorithm45, political optimizer46, whale optimization algorithm (WOA)47, moth flame algorithm (MFO)48, multi-verse optimizer (MVO)49, salp swarm algorithm (SSA)50,51, spotted hyena optimizer52, butterfly optimization53, lion optimization54, fireworks algorithm55, cuckoo search algorithm56, bat algorithm57, Tabu search58, harmony search algorithm59, Newton–Raphson optimizer60, reptile search algorithm61, slime mould algorithm62,63, Harris hawks optimizer64, chimp optimizer65, artificial gorilla troop optimizer66, atom search algorithm67, marine predator algorithm68,69, sand cat swarm algorithm70, equilibrium optimizer71,72, Henry gas solubility algorithm (HGSA)73, resistance–capacitance algorithm74, arithmetic optimization algorithm75,78, quantum-based avian navigation optimizer76, multi-trial vector DE algorithm10,77, starling murmuration optimizer79, atomic orbit search (AOS)80, subtraction-average-based optimizer81, etc. are reported for solving optimization problems. In conclusion, these new algorithms and their improved variants based on different metaheuristic computing paradigms yield better results than earlier methods82,83,84. A comparative study showing recent efforts in using metaheuristic algorithms for data clustering is listed in Table 1.

Table 1 Summary of a few metaheuristic algorithms applied to data clustering.

The authors of Ref.88 proposed an improved version of the firefly algorithm by hybridizing its exploration method with a chaotic local search approach. The improved firefly algorithm was validated in practice for automatically choosing the optimal dropout rate for the regularization of neural networks. To best exploit the local and global characteristics collected from each of the handwritten phrase representations under consideration, a hierarchical feature selection framework based on a genetic algorithm has been developed in Ref.89. The authors of Ref.90 reviewed the PSO algorithm and its variants for medical disease detection. The overfitting problem was addressed in Ref.91 by using the sine-cosine algorithm to determine an appropriate value for the dropout regularization parameter. According to the literature review, swarm strategies are currently being utilized effectively in this domain, although their future application has not yet been fully explored. To increase classification accuracy, the fully connected layers of a standard convolutional neural network were substituted with the effective extreme gradient boosting classification algorithm, which classifies the characteristics obtained by the convolutional layers92. Furthermore, to support that research, a hybrid version of the arithmetic optimization method was constructed and used to optimize the extreme gradient boosting hyperparameters for COVID-19 chest X-ray pictures. To solve the problem of early convergence, the authors of Ref.93 introduced a novel variant known as the adaptive seagull optimization algorithm, whose performance is improved by increasing the seagulls' inclination towards exploratory behaviour. Quasi-random sequences are employed for the population initialization in place of a random distribution in order to increase diversity and convergence. The authors of Ref.94 proposed an enhanced PSO algorithm that uses pseudo-random sequences and opposing rank inertia weights instead of random distributions for initialization to improve convergence speed and population diversity. The authors also introduced a new population initialization approach that uses a quasi-random sequence to initialize the swarm and generates an opposing swarm using the opposition-based method. For fifteen UCI data sets, the suggested technique optimized the feed-forward neural network weights.

Research gaps

The GWO is one of the well-known metaheuristic algorithms95. This algorithm draws inspiration from the hunting behaviour of grey wolves and their hierarchical leadership model. The GWO has been implemented widely, and the results have been encouraging enough to warrant additional research, although investigators still need to address its issues of low precision and slow convergence speed96. As a result, studies have utilized various techniques to increase the optimizer's efficiency and address optimization issues. For instance, an improved GWO was suggested to adjust the parameters of a recurrent neural network, and a chaotic GWO was introduced to allow faster convergence. Researchers have also employed numerous other techniques to enhance GWO97,98,99.

The literature review has been extensively augmented to underscore the existing research gaps within the context of optimization algorithms applied to data clustering, with a specific focus on the limitations of current metaheuristic algorithms, including the traditional GWO. Despite GWO's proven effectiveness in various optimization tasks, its application to data clustering reveals critical shortcomings, primarily its struggle with premature convergence and its inability to maintain a balance between the exploration and exploitation phases. These limitations significantly affect the quality of clustering outcomes, especially in complex datasets with high dimensionality or noise. Moreover, while existing studies have explored numerous enhancements to GWO and other metaheuristic algorithms, there remains a distinct gap in the literature regarding the integration of these algorithms with classical clustering techniques, such as K-means, to address these specific challenges. This gap highlights the need for innovative approaches that can leverage the strengths of both metaheuristic optimization and traditional clustering methods to achieve superior clustering performance. The proposed \(K\)-means Clustering-based Grey Wolf Optimizer (KCGWO) aims to fill this gap by introducing a hybrid algorithm that combines the adaptive capabilities of GWO with the efficiency of K-means clustering. This combination is designed to enhance diversity, prevent premature convergence, and ensure a more effective balance between the exploration of new solutions and the exploitation of known good solutions. However, the literature review reveals that while there are various attempts to improve the clustering process through algorithmic enhancements, the specific approach of blending GWO with K-means, complemented by a dynamic weight factor to adjust exploration and exploitation, is notably absent. This research gap signifies an opportunity for the KCGWO to contribute significantly to the field, offering a novel solution that addresses both the limitations of traditional GWO in clustering tasks and the need for more effective hybrid algorithms. By clarifying these gaps and positioning the KCGWO within this context, the related works section establishes a strong foundation for the significance and novelty of the proposed research.

Need for the proposed algorithm

Beyond the choice of metaheuristic algorithm, the primary goal of the data mining process is to extract useful information from a large data set; the data can then be translated into a clear format for further usage. Clustering is a popular exploratory data analysis tool: objects are arranged so that each cluster contains mutually similar objects. As discussed earlier, various clustering methods have been created to group data. \(K\)-means is an example of a partitioning clustering algorithm because it operates based on the cluster centroid15,19. Numerous uses of \(K\)-means clustering have been documented: in addition to enhancing the reliability of wireless sensor networks, it has been used for image segmentation, and, as an unsupervised learning technique, it has frequently been utilized to categorize unlabelled data. The primary objective here is to propose another variant of GWO, called KCGWO, to solve complex optimization problems, including data clustering problems.

In this study, the KCGWO is proposed as an advanced solution to the inherent limitations of the GWO in addressing data clustering challenges. The GWO, while innovative in mimicking the social hierarchy and hunting tactics of grey wolves, exhibits deficiencies in exploration and exploitation, key factors for effective clustering. The proposed KCGWO method enhances GWO by incorporating the K-means algorithm and introducing a dynamic weight factor, aiming to improve the algorithm's performance significantly. The methodology of KCGWO unfolds in two pivotal enhancements over the traditional GWO. First, the integration of the K-means algorithm serves as an initial refinement step. Before the optimization process, K-means is applied to the dataset to establish a preliminary grouping of data points. This step ensures that the starting positions of the grey wolves (solutions) are closer to potential optimal solutions, thereby enhancing the exploration phase of GWO; the initial clustering helps guide the wolves towards promising areas of the search space from the onset. Second, a dynamic weight factor is introduced to adjust the influence of exploration and exploitation dynamically throughout the optimization process. This weight factor varies the wolves' movements, allowing for a more flexible search strategy that can adapt based on the current state of the search. It enables the algorithm to maintain a balance between exploring new areas and exploiting known promising regions, thus preventing premature convergence to suboptimal solutions. The performance of KCGWO was evaluated through extensive testing on numerical benchmarks and real-world datasets, demonstrating its superior capability to efficiently navigate the solution space and locate optimal cluster centers swiftly. This effectiveness is attributed to the synergistic combination of K-means for initial solution enhancement and the dynamic weight factor for maintaining an optimal balance between exploration and exploitation. Overall, KCGWO represents a significant advancement in solving data clustering problems, offering a robust and reliable method that overcomes the limitations of GWO. Its innovative approach to integrating K-means with a dynamic adjustment mechanism ensures high-quality solutions, making it a valuable tool for data analytics and clustering applications.

The primary contributions of this study are discussed as follows:

  • A new variant of GWO called KCGWO based on \(K\)-means clustering algorithm and weight factors in the position update is proposed.

  • Formulation of a fitness function for the data clustering problem in machine learning systems.

  • The performance of the KCGWO is validated on 10 numerical test functions and on data clustering problems over eight real-world data sets with different dimensions.

  • The performance comparison is made with other well-known algorithms based on statistical data analysis and Friedman's ranking test (FRT).

The paper is structured as follows. Section "Data clustering and problem statement" discusses data clustering concepts and the formulation of the fitness function for this problem. Section "Proposed K-means clustering-based grey wolf optimizer" comprehensively presents the formulation of the proposed KCGWO based on the \(K\)-means clustering algorithm, along with the basic concepts of GWO. The results are comprehensively discussed in Section "Results and discussions", and Sect. "Conclusions" concludes the paper and outlines future work.

Data clustering and problem statement

The basic objective of data mining techniques is to obtain features from huge volumes of data. Such techniques use data processing methods to find interesting patterns in large amounts of data. Clustering, classification, anomaly detection, deviation detection, synthesis, and regression are a few examples of data analysis techniques. Data clustering divides a set of information into smaller groups such that the similarities between the individuals in each group are high while those between the data in different groups are low. Distance metrics like the Euclidean distance, Chord distance, and Jaccard index are used to assess how similar members of a subset are to one another. In principle, clustering algorithms can be divided into two groups, partitional and hierarchical, depending on how clusters are created and maintained86. Hierarchical clustering produces a tree that depicts a sequence of clusters, with no background knowledge of the number of groups and no dependence on the initial state. Nevertheless, because the tree is static, an entity allocated to one cluster cannot be moved to another; this is the main drawback of hierarchical algorithms. Poor clustering of overlapping clusters can also result from the absence of prior information on the number of clusters. On the other hand, partitional clustering divides items into a predetermined number of clusters. Various partitional clustering techniques aim to increase the dissimilarity of members belonging to distinct clusters while attempting to reduce the difference between objects in each cluster27,100,101,102.

Typically, the Euclidean distance is used to measure similarity. In this work, the distance between any two objects (\({o}_{i}\) and \({o}_{j}\)) inside a cluster is also determined using the Euclidean distance measure, expressed as follows103:

$$D\left({o}_{i},{o}_{j}\right)=\parallel {o}_{i}-{o}_{j}\parallel =\sqrt{\sum_{m=1}^{d}{\left({o}_{im}-{o}_{jm}\right)}^{2}} ,$$
(1)

where \({o}_{i}\) and \({o}_{j}\) denote two distinct objects inside the cluster and \(d\) denotes the number of features of each entity. Based on this similarity metric, partitional clustering can be transformed into an optimization model, which can be stated as follows:

$$\underset{Z,W}{{\text{Minimize}}}: f\left(Z,W\right)=\sum_{k=1}^{K}\sum_{i=1}^{n}{w}_{ik}D\left({x}_{i},{z}_{k}\right) ,$$
(2)

Subject to:

$$\left\{\begin{array}{ccc}\sum_{k=1}^{K}{w}_{ik}=1,& i=\mathrm{1,2},\dots ,n;& \\ {w}_{ik}\in \left\{\mathrm{0,1}\right\},& \forall i\in \left\{\mathrm{1,2},\dots ,n\right\},k\in \left\{\mathrm{1,2},\dots ,K\right\}& \end{array}\right.$$
(3)

where \(n\) denotes the sample size, \(K\) denotes the number of clusters, and \({x}_{i}\) signifies the coordinates of the \(i\) th object in the current dataset. The term \({w}_{ik}\) indicates whether the \(i\) th object is assigned to the \(k\) th cluster, and \(D\left({x}_{i},{z}_{k}\right)\) indicates the distance between the \(i\) th object and the center of the \(k\) th cluster. The assignment matrix \(W\) is defined as follows:

$$W=\left\{{w}_{ik}|i=\mathrm{1,2},\dots ,n,k=\mathrm{1,2},\dots ,K\right\}$$
(4)

The partition criterion of a sample determines the value of \({w}_{ik}\) in Eq. (3). For a given sample set \(X=\left\{{x}_{1},{x}_{2}, \dots ,{x}_{n}\right\}\), the goal is to obtain an object partition that satisfies:

$$\left\{\begin{array}{ccc}\bigcup_{i=1}^{K}{C}_{i}=X;& & \\ {C}_{i}\cap {C}_{j}=\phi ,& \forall i,j\in \left\{\mathrm{1,2},\dots ,K\right\}\wedge i\ne j;& \\ {C}_{i}\ne \phi ,& \forall i\in \left\{\mathrm{1,2},\dots ,K\right\}& \end{array}\right.$$
(5)

where \({C}_{i}(i=\mathrm{1,2},\dots ,K)\) is the \(i\) th cluster's object set, and the following equation can be used to identify its members:

$$\left\{\begin{array}{ccc}{C}_{i}=\left\{{x}_{k}|\parallel {x}_{k}-{z}_{i}\parallel \le \parallel {x}_{k}-{z}_{p}\parallel ,{x}_{k}\in X\right\},& p\ne i,p=\mathrm{1,2},\dots ,K;& \\ {z}_{i}=\frac{1}{\left|{C}_{i}\right|}\sum_{{x}_{k}\in {C}_{i}}{x}_{k},& i=\mathrm{1,2},\dots ,K& \end{array}\right.$$
(6)

where \({z}_{i}\), frequently employed in the \(k\)-means clustering method, symbolizes the new center of cluster \(i\), and \(\parallel \cdot \parallel\) indicates the Euclidean distance between any two items in the subset.
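To make the model of Eqs. (1)-(6) concrete, the following minimal sketch (an illustration under stated assumptions, not the authors' implementation; the function names are hypothetical) computes the clustering objective of Eq. (2) with hard assignments as in Eq. (6):

```python
import numpy as np

def clustering_objective(X, Z):
    """Objective of Eq. (2): sum of Euclidean distances D(x_i, z_k)
    between each object and its nearest center. X is (n, d), Z is (K, d)."""
    dists = np.linalg.norm(X[:, None, :] - Z[None, :, :], axis=2)  # (n, K)
    labels = dists.argmin(axis=1)               # hard assignment w_ik = 1
    return dists[np.arange(len(X)), labels].sum(), labels

def update_centers(X, labels, K):
    """New center z_i as the mean of its members, as in Eq. (6);
    assumes every cluster is non-empty."""
    return np.vstack([X[labels == k].mean(axis=0) for k in range(K)])
```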

Proposed K-means clustering-based grey wolf optimizer

This section briefly reviews the concepts of the basic Grey Wolf Optimizer (GWO) and its mathematical model, and then comprehensively discusses the proposed K-means Clustering-based Grey Wolf Optimizer (KCGWO).

Grey wolf optimizer

The grey wolf optimization algorithm is a comparatively recent metaheuristic optimization algorithm, initially devised in Ref.95. GWO mimics the hunting actions of grey wolves in the wild, a cooperative approach they use to chase their prey. The framework of the GWO is quite distinct from other metaheuristic optimizers in that it uses three best individuals as the basis for the search procedure. These three individuals are an alpha wolf \(\alpha\) that serves as the pack leader, a beta wolf \(\beta\) that provides support to the leader, and a delta wolf \(\delta\) that follows the two leaders. The last kind of wolf is termed the omega wolf \(\omega\). These wolves have varying degrees of responsibility and can be arranged in a hierarchy, with \(\alpha\) being the highest level and the first solution, and \(\beta\), \(\delta\), and \(\omega\) representing the second, third, and remaining solutions, correspondingly. Thus, the three wolves mentioned above serve as guides for the omegas. All wolves employ three separate coefficients to implement the encircling process and attempt to encompass the prey once they have located it. The three leading wolves evaluate the potential location of prey during the iterative search. Based on Eqs. (7) and (8), the positions of the wolves are updated during the optimization procedure.

$$\overrightarrow{D}=\left|\overrightarrow{C}\cdot \overrightarrow{{X}_{P}}\left(t\right)-\overrightarrow{X}\left(t\right)\right| ,$$
(7)
$$\overrightarrow{X}\left(t+1\right)=\overrightarrow{{X}_{P}}\left(t\right)-\overrightarrow{A}\cdot \overrightarrow{D} ,$$
(8)

where \(t\) is the current iteration, \(\overrightarrow{C}\) and \(\overrightarrow{A}\) are coefficient vectors, \(\overrightarrow{{X}_{P}}\) signifies the prey's position, and \(\overrightarrow{X}\) signifies the wolf position. The vectors \(\overrightarrow{C}\) and \(\overrightarrow{A}\) are as follows:

$$\overrightarrow{A}=2\overrightarrow{a}\cdot \overrightarrow{{r}_{1}}-\overrightarrow{a} ,$$
(9)
$$\overrightarrow{C}=2\cdot \overrightarrow{{r}_{2}} ,$$
(10)

where \(\overrightarrow{{r}_{1}}\) and \(\overrightarrow{{r}_{2}}\) signify random vectors in the range \([0, 1]\), and the factor \(a\) falls linearly from 2 to 0 over the iterations. Using the above update equations, a wolf can change its position relative to the prey; by varying the random parameters \(\overrightarrow{A}\) and \(\overrightarrow{C}\), it can move to any location in the continuous space close to the prey. The GWO assumes that the prey's position is likely near the alpha, beta, and delta positions. During the search, the best, second-best, and third-best individuals found so far are recorded as alpha, beta, and delta, while the omega wolves change their positions in accordance with the alpha, beta, and delta positions.

$$\left.\begin{array}{c}{\overrightarrow{D}}_{\alpha }=\left|\overrightarrow{{C}_{1}}\cdot \overrightarrow{{X}_{\alpha }}-\overrightarrow{X}\right|\\ {\overrightarrow{D}}_{\beta }=\left|\overrightarrow{{C}_{2}}\cdot \overrightarrow{{X}_{\beta }}-\overrightarrow{X}\right|\\ {\overrightarrow{D}}_{\delta }=\left|\overrightarrow{{C}_{3}}\cdot \overrightarrow{{X}_{\delta }}-\overrightarrow{X}\right|\end{array}\right\} .$$
(11)

The position vectors of \(\alpha\), \(\beta\), and \(\delta\) are, respectively, \(\overrightarrow{{X}_{\alpha }}\), \(\overrightarrow{{X}_{\beta }}\), and \(\overrightarrow{{X}_{\delta }}\). The vectors \(\overrightarrow{{C}_{1}}\), \(\overrightarrow{{C}_{2}}\), and \(\overrightarrow{{C}_{3}}\) are produced randomly, and \(\overrightarrow{X}\) indicates the current position vector. Equation (11) calculates the distances between the position of the current individual and those of alpha, beta, and delta. The final position of the current individual is then determined as follows:

$$\left.\begin{array}{c}\overrightarrow{{X}_{1}}=\overrightarrow{{X}_{\alpha }}-\overrightarrow{{A}_{1}}\cdot \left(\overrightarrow{{D}_{\alpha }}\right)\\ \overrightarrow{{X}_{2}}=\overrightarrow{{X}_{\beta }}-\overrightarrow{{A}_{2}}\cdot \left(\overrightarrow{{D}_{\beta }}\right)\\ \overrightarrow{{X}_{3}}=\overrightarrow{{X}_{\delta }}-\overrightarrow{{A}_{3}}\cdot \left(\overrightarrow{{D}_{\delta }}\right)\end{array}\right\} ,$$
(12)
$$\overrightarrow{X}\left(t+1\right)=\frac{\overrightarrow{{X}_{1}}+\overrightarrow{{X}_{2}}+\overrightarrow{{X}_{3}}}{3} ,$$
(13)

where \(\overrightarrow{{A}_{1}}\), \(\overrightarrow{{A}_{2}}\), and \(\overrightarrow{{A}_{3}}\) denote randomly created vectors, and \(t\) signifies the current iteration. The variable \(a\) is the regulating factor that modifies the coefficient \(\overrightarrow{A}\). This tactic aids the population in deciding whether to pursue or flee from its prey: if \(|A|\) is greater than 1, the wolf searches new regions of the space, whereas if \(|A|\) is smaller than 1, the wolf pursues and attacks the prey. Once the prey is located, the grey wolves close in to prevent it from escaping and then attack. This behaviour is modelled by lowering the value of \(a\) from 2 to 0, which in turn reduces \(\overrightarrow{A}\) to the range [− 1, 1]. The pseudocode of the GWO is provided in Algorithm 1.
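As a minimal illustration of Eqs. (7)-(13) (a sketch, not the authors' code; the function name is hypothetical), one GWO position update can be written as:

```python
import numpy as np

def gwo_step(wolves, alpha, beta, delta, a):
    """One GWO position update following Eqs. (7)-(13).
    wolves: (N, dim) positions; alpha, beta, delta: (dim,) leaders."""
    new_pos = np.empty_like(wolves)
    for i, w in enumerate(wolves):
        X = []
        for leader in (alpha, beta, delta):
            r1, r2 = np.random.rand(w.size), np.random.rand(w.size)
            A = 2 * a * r1 - a                      # Eq. (9)
            C = 2 * r2                              # Eq. (10)
            D = np.abs(C * leader - w)              # Eq. (11)
            X.append(leader - A * D)                # Eq. (12)
        new_pos[i] = (X[0] + X[1] + X[2]) / 3.0     # Eq. (13)
    return new_pos

# The control factor decreases linearly over Max_it iterations:
# a = 2 - 2 * t / Max_it
```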

Algorithm 1
figure a

Pseudocode of grey wolf optimizer.

K-means clustering-based grey wolf optimizer

In addition to the significance of optimization techniques, data analysis is a key research area. Clustering has thus been utilized as one of the data exploration approaches to gain a general understanding of the data's structure. \(K\)-means is the most widely used unsupervised algorithm. Data points in each group resemble each other much more than those in other clusters. The method performs a sequence of operations to identify distinct subsets, as discussed below.

  • The number of clusters is the primary input to \(K\)-means; in data mining, the algorithm starts with an initial set of randomly chosen centroids, one for each cluster.

  • The following step involves determining the Euclidean distance from each centroid to every data point in the given data set, so as to assign each data point to its closest centroid.

  • If the \(K\) centroids shift during an iteration, the previous steps are repeated until the centroids no longer change (a minimal sketch of these steps is given below).
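The following sketch illustrates the steps listed above (an illustration assuming non-empty clusters, not the authors' implementation; the function name is hypothetical):

```python
import numpy as np

def kmeans(X, K, max_iter=100, rng=np.random.default_rng(0)):
    """Basic K-means: random initial centroids, nearest-centroid
    assignment, and centroid recomputation until convergence."""
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(max_iter):
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([X[labels == k].mean(axis=0)
                                  for k in range(K)])   # assumes no empty cluster
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```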

The algorithm attempts to minimize the squared error function or objective function presented in Eq. (14).

$$\sum_{j=1}^{K}\sum_{i=1}^{{n}_{j}}{\Vert {x}_{i}^{j}-{C}_{j}\Vert }^{2} ,$$
(14)

where \({x}_{i}^{j}\) signifies the \(i\) th data point of the \(j\) th cluster, \({n}_{j}\) denotes the number of points in that cluster, \({C}_{j}\) denotes the center of the \(j\) th cluster, and \(\Vert {x}_{i}^{j}-{C}_{j}\Vert\) represents the Euclidean distance between \({C}_{j}\) and \({x}_{i}^{j}\).

The initialization of the proposed KCGWO is similar to that of the original GWO. \(K\)-means is utilized to separate the grey wolf population into three groups, and the objective function value is then determined for each cluster of the population individually104. Whether the clustered population is used depends on a random number. If the random number is greater than 0.5, KCGWO uses the population clusters based on the fitness values of each cluster: all the clusters' fitness values are compared, and the population position is set to cluster position 1, position 2, or position 3 according to the conditions provided in the pseudocode. However, KCGWO operates on the actual population without clustering if the random value is less than or equal to 0.5. This feature could be utilized with other methods as well, but it needs to be evaluated to ensure it functions well; in this study, \(K\)-means is utilized to enhance the effectiveness of GWO. After selecting a particular population (with or without clustering), the proposed KCGWO computes the fitness of each member until it discovers the best fitness. Equations (7)–(12) determine the optimum search agents, and Eq. (13) is then used to update each position. However, Eq. (13) provides no weightage to the wolf hierarchy; therefore, weight factors are introduced into Eq. (13) to improve the solution quality105. The modified position update equation is provided in Eq. (15).

$$\overrightarrow{X}\left(t+1\right)=\frac{3\overrightarrow{{X}_{1}}+2\overrightarrow{{X}_{2}}+\overrightarrow{{X}_{3}}}{6} ,$$
(15)

The variables \(a\), \(\overrightarrow{A}\), and \(\overrightarrow{C}\) are then updated for the subsequent iteration, and the best fit of the iteration is chosen. Finally, the best fitness and position are returned. Figure 1 illustrates the flowchart of the proposed KCGWO algorithm, and Algorithm 2 depicts its pseudocode.
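The following sketch ties these pieces together (illustrative only; the cluster-selection rule by best mean fitness is an assumption about the pseudocode, and `kmeans` reuses the sketch given earlier):

```python
import numpy as np

def kcgwo_step(wolves, fitness, alpha, beta, delta, a, K=3):
    """One KCGWO update: with probability 0.5 the search proceeds on one
    of K=3 K-means clusters of the population, then the weighted position
    update of Eq. (15) is applied (alpha weighted 3x, beta 2x, delta 1x).
    fitness: array of objective values aligned with the rows of wolves."""
    if np.random.rand() > 0.5:
        _, labels = kmeans(wolves, K)      # sketch defined earlier
        # Assumption: keep the cluster with the best (lowest) mean fitness;
        # clusters are assumed non-empty, as in the kmeans sketch.
        best_k = min(range(K), key=lambda k: fitness[labels == k].mean())
        wolves = wolves[labels == best_k]
    new_pos = np.empty_like(wolves)
    for i, w in enumerate(wolves):
        X = []
        for leader in (alpha, beta, delta):
            r1, r2 = np.random.rand(w.size), np.random.rand(w.size)
            A, C = 2 * a * r1 - a, 2 * r2
            X.append(leader - A * np.abs(C * leader - w))
        new_pos[i] = (3 * X[0] + 2 * X[1] + X[2]) / 6.0   # Eq. (15)
    return new_pos
```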

Figure 1
figure 1

Flowchart of the proposed algorithm.

Application of the proposed KCGWO to data clustering

A crucial stage in every metaheuristic approach is solution encoding. Each solution (grey wolf) represents all the cluster centers. These solutions are first produced randomly. However, the best position at each iteration of the KCGWO serves as a guide for the remaining grey wolves.

Each solution is an array of size \(d\times k\), with \(d\) being the number of features of each data point and \(k\) the total number of clusters. Figure 2 displays a pack of grey wolves representing the solutions. The fitness function is the total intra-cluster distance, which must be minimized to discover the best cluster centers using KCGWO; it is preferred to reduce the sum of intra-cluster distances96. The cluster center is defined in Eq. (16), and Eq. (17) defines the distances between cluster members and their center.

$${y}_{j}=\frac{1}{{n}_{j}}\sum_{\forall {x}_{p}\in {C}_{j}}{x}_{p} ,$$
(16)
$${\text{Distance}}\left({x}_{p},{y}_{j}\right)=\sqrt{\sum_{i=1}^{a}{\left({x}_{pi}-{y}_{ji}\right)}^{2}}$$
(17)

where \({y}_{j}\) denotes the cluster center, \({x}_{p}\) denotes the position of the \(p\) th cluster member, \(a\) denotes the number of features of the dataset, \({n}_{j}\) denotes the number of members in cluster \(j\), and \({C}_{j}\) denotes the set of members of cluster \(j\).
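A minimal sketch of this encoding and fitness evaluation (illustrative; the function names are hypothetical) could look as follows:

```python
import numpy as np

def decode(wolf, k, d):
    """A wolf (solution) of length k*d encodes k cluster centers in d dimensions."""
    return wolf.reshape(k, d)

def intra_cluster_fitness(wolf, X, k):
    """Total intra-cluster distance (Eqs. (16)-(17)): each point contributes
    its Euclidean distance to the nearest encoded center."""
    centers = decode(wolf, k, X.shape[1])
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.min(axis=1).sum()   # fitness to be minimized by KCGWO
```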

Figure 2
figure 2

Illustration of solution encoding96.

Algorithm 2
figure b

Pseudocode of the proposed KCGWO.

Computational complexity

The computational complexity of the KCGWO is discussed as follows: (i) the initialization of the proposed algorithm necessitates \(O(N\times dim)\), where \(N\) denotes the number of search agents (i.e., the population size) and \(dim\) denotes the problem dimension; (ii) updating the control parameters of KCGWO necessitates \(O(N\times dim)\); (iii) the position update of the KCGWO necessitates \(O(N\times dim)\); and (iv) evaluating the fitness values of each population and cluster necessitates \(O(N\times dim\times n)\), where \(n\) denotes the number of clustered populations. Based on these considerations, the complexity of KCGWO per iteration is \(O(N\times dim\times n)\), and the total complexity of the proposed KCGWO algorithm is \(O(N\times dim\times n\times Max\_it)\), where \(Max\_it\) denotes the maximum number of iterations.

Results and discussions

The original GWO is improved by employing the K-means clustering concept along with the weight factor, and it has been tested using 10 benchmark numerical functions with both unimodal and multimodal features. In addition, the performance is also validated on data clustering problems. The performance of the proposed KCGWO is compared with seven other algorithms: MFO, SSA, MVO, AOS, PSO, JAYA, and the original GWO algorithm. The population size is 30, and the maximum number of iterations is 500 for all selected algorithms. All the algorithms are implemented using MATLAB software installed on a laptop with an i5 processor, a 4.44 GHz clock frequency, and 16 GB of memory. For a fair comparison, each algorithm is executed 30 times.

Results for numerical optimization functions

The details of the selected benchmark functions are recorded in Table 2. Functions F1-F4 have unimodal features with 30 dimensions, F5-F7 have multimodal features with 30 dimensions, and F8-F10 have multimodal features with very low dimensions. The purpose of selecting the listed benchmark functions is to analyze the exploration and exploitation behaviour of the developed KCGWO algorithm. Statistical measures, namely the minimum (Min), Mean, maximum (Max), and Standard Deviation (STD), of all designated algorithms are recorded in Table 3.

Table 2 10 benchmark test functions for validation.
Table 3 Results obtained for 10 benchmark functions by all selected algorithms.

Functions F01-F04 are classified as unimodal test scenarios with a single global best. Such test sets can be used to examine the general exploitation potential of the proposed KCGWO approach. The findings of the proposed KCGWO and the other approaches, along with their Min, Max, Mean, and STD, are shown in Table 3. The best outcomes in the associated tables are highlighted. The optimization techniques are then ordered based on their average values, and the average rank is calculated to determine the approaches' overall ranking. All tables include a summary of the statistical analysis, with the best results emphasized in bold face. An individual ranking is provided for each unimodal function to examine the performance of the proposed algorithm. The proposed algorithm stands first out of all selected algorithms for all four unimodal functions.

The results for F1-F4 show that the KCGWO can arrive at competitive solutions with a suitable exploitation capability. This appears to be due to the effectiveness with which the suggested K-means clustering concept and weight factors boost the GWO's tendencies for exploration and exploitation. As a result, these mechanisms make the algorithm more likely to produce smaller fitness values and higher stability, and they help explore new locations close to the recently discovered results. Accordingly, the new algorithmic changes have improved how GWO handles unimodal test cases. It is reasonable to assess the exploration potential using the multimodal functions (F5-F10). Table 3 shows that KCGWO can find highly competitive solutions for the F5-F10 test scenarios and produces optimal results for all test functions compared to the other approaches. According to the results, KCGWO outperforms all selected algorithms in multimodal instances. Additionally, statistical analyses show that, in 95% of evaluations, KCGWO outcomes are superior to those of the other approaches. Compared to GWO, the accuracy is increased based on the STD index.

In particular, when the objective problems (F5-F8) involve several local optima, KCGWO's outperformance demonstrates a sufficient explorative nature. This is due to the effectiveness with which the K-means clustering structure boosts the GWO's performance for exploration and exploitation. Lower stability index values can encourage wolves to make more exploratory jumps, a feature that is useful when KCGWO must investigate previously unexplored regions of the problem landscape. The weight factors have helped GWO achieve a delicate balance between its local and global search inclinations. According to the findings, the recommended K-means searching steps increase the GWO's exploration capability, and the KCGWO's update mechanism lessens the likelihood of entering local optima. The exploratory propensity of KCGWO is hence advantageous. The computational cost of the proposed algorithm is assessed by recording the RunTime (RT). The RT values for each function by all selected algorithms are recorded in Table 4, along with the average values. Based on the mean RT values, the original GWO has the lowest RT, and the RT of KCGWO is slightly greater than that of GWO owing to the introduction of the K-means clustering mechanism; the weight factor, by contrast, does not affect the proposed algorithm's computational cost.

Table 4 RT values of each test functions.

Figure 3 shows all selected algorithms' convergence characteristics for handling the F1-F10 functions. All selected algorithms perform consistently on the benchmarks and exhibited excellent convergence behaviour in their original publications. Figure 3 also offers a convergence timeline, pinpointing the stages at which KCGWO performs better than GWO. According to Fig. 3, KCGWO eventually converges to superior outcomes, and a large number of iterations allows KCGWO to approximate more precise solutions close to the optimum. Additionally, rapid convergence patterns can be seen when comparing the curves of KCGWO and its original version. This pattern demonstrates that KCGWO can emphasize more exploitation and local search in the latter stages. These plots suggest that the KCGWO can successfully increase all wolves' fitness and exploit improved results. To visualize the stability analysis, boxplots are also plotted and shown in Fig. 4, from which it is observed that the stability of the KCGWO is better than that of all selected algorithms.

Figure 3
figure 3

Convergence curve obtained by all algorithms.

Figure 4
figure 4

Boxplot analysis of all selected algorithms.

To further assess the performance of the proposed algorithm, the statistical non-parametric Friedman's Ranking Test (FRT) has been conducted, and the average FRT values of all algorithms are logged in Table 5. Based on this observation, the proposed algorithm attains the top of the table with an average FRT of 1.383, followed by GWO, AOS, MVO, SSA, MFO, JAYA, and PSO.
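For reference, average Friedman ranks and the accompanying test statistic can be computed as in the following sketch (illustrative; the `results` matrix is a hypothetical placeholder for per-function outcomes):

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# Rows: test functions; columns: algorithms (placeholder values here;
# in practice these would be the mean fitness per function and algorithm).
results = np.random.rand(10, 8)

# Average Friedman rank per algorithm (lower is better for minimization)
avg_ranks = rankdata(results, axis=1).mean(axis=0)

# Friedman test statistic and p-value across the eight algorithms
stat, p_value = friedmanchisquare(*results.T)
```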

Table 5 FRT values of all algorithms for each test functions.

These statistics indicate that the K-means clustering approach and the modified position update equation based on the weight factors can enhance the search functionality of GWO. Compared with existing approaches, the suggested KCGWO is more effective and exhibits superior convergence characteristics.

Results for data clustering problems

The suggested clustering approach was thoroughly assessed using eight datasets. A few of the datasets are synthetic, and the others are drawn from real-world benchmark data. Table 6 summarizes the traits of these datasets106, recording the features (dimensions), the total number of samples, and the number of clusters in each dataset, as well as the type of each problem. The datasets were selected based on their type and number of samples.

Table 6 Details of the selected dataset106.

The performance of the proposed KCGWO for clustering is initially compared with the standalone K-means clustering algorithm and the Gaussian Mixture Model (GMM). The non-linear, unsupervised t-distributed Stochastic Neighbor Embedding (t-SNE) is typically employed for data analysis and high-dimensional data visualization. For all the selected datasets, the t-SNE plots obtained by KCGWO, GMM, and K-means are shown in Figs. 5, 6, 7, 8, 9, 10, 11 and 12. Figure 5a displays the emission data distribution obtained by the KCGWO between various dimensions and shows how well the high-dimensional data are distributed in 2 dimensions. Figure 5b and c show the t-SNE plots obtained by the GMM and K-means algorithms, and the center of each cluster found by the K-means algorithm is also demonstrated in Fig. 5c.
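Such plots can be produced as in the following sketch using scikit-learn (an illustration; `X` and `labels` are hypothetical placeholders for a dataset and the cluster assignments produced by KCGWO, GMM, or K-means):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Placeholder data: n samples with d features and their cluster labels
X = np.random.rand(200, 8)
labels = np.random.randint(0, 2, size=200)

# Embed the high-dimensional data into 2 dimensions for visualization
embedded = TSNE(n_components=2, random_state=0).fit_transform(X)
plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, s=10)
plt.xlabel("t-SNE dimension 1")
plt.ylabel("t-SNE dimension 2")
plt.show()
```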

Figure 5
figure 5

T-SNE plots of emission data; (a) KCGWO, (b) GMM, (c) K-means.

Figure 6
figure 6

T-SNE plots of HTRU2 data; (a) KCGWO, (b) GMM, (c) K-means.

Figure 7
figure 7

T-SNE plots of Wine data; (a) KCGWO, (b) GMM, (c) K-means.

Figure 8
figure 8

T-SNE plots of Breast cancer data; (a) KCGWO, (b) GMM, (c) K-means.

Figure 9
figure 9

T-SNE plots of Sonar data; (a) KCGWO, (b) GMM, (c) K-means.

Figure 10
figure 10

T-SNE plots of WDBC data; (a) KCGWO, (b) GMM, (c) K-means.

Figure 11
figure 11

T-SNE plots of iris data; (a) KCGWO, (b) GMM, (c) K-means.

Figure 12
figure 12

T-SNE plots of 2022 Ukraine-Russia war data; (a) KCGWO, (b) GMM, (c) K-means.

Figure 6a displays the HTRU2 data distribution obtained by the KCGWO between various dimensions. It also shows how well the high-dimensional data are distributed in 2-dimensions. Figure 6b and c show the t-SNE plots obtained by the GMM and K-means algorithms. The cluster center of each cluster found by the K-means algorithm is also demonstrated in Fig. 6c. Figure 7a displays the Wine data distribution obtained by the KCGWO between various dimensions. It also shows how well the high-dimensional data are distributed in 2-dimensions. Figures 7b and c show the t-SNE plots obtained by the GMM and K-means algorithms. The cluster center of each cluster found by the K-means algorithm is also demonstrated in Fig. 7c.

Figure 8a displays the Breast cancer data distribution obtained by the KCGWO between various dimensions. It also shows how well the high-dimensional data are distributed in 2-dimensions. Figure 8b and c show the t-SNE plots obtained by the GMM and K-means algorithms. The cluster center of each cluster found by the K-means algorithm is also demonstrated in Fig. 8c. Figure 9a displays the Sonar data distribution obtained by the KCGWO between various dimensions. It also shows how well the high-dimensional data are distributed in 2-dimensions. Figures 9b and c show the t-SNE plots obtained by the GMM and K-means algorithms. The cluster center of each cluster found by the K-means algorithm is also demonstrated in Fig. 9c.

Figure 10a displays the WDBC data distribution obtained by the KCGWO between various dimensions. It also shows how well the high-dimensional data are distributed in 2-dimensions. Figure 10b and c show the t-SNE plots obtained by the GMM and K-means algorithms. The cluster center of each cluster found by the K-means algorithm is also demonstrated in Fig. 10c. Figure 11a displays the Iris data distribution obtained by the KCGWO between various dimensions. It also shows how well the high-dimensional data are distributed in 2-dimensions. Figure 11b and c show the t-SNE plots obtained by the GMM and K-means algorithms. The cluster center of each cluster found by the K-means algorithm is also demonstrated in Fig. 11c.

Figure 12a displays the 2022 Ukraine-Russia war data distribution obtained by the KCGWO between various dimensions. It also shows how well the high-dimensional data are distributed in 2-dimensions. According to Figs. 5, 6, 7, 8, 9, 10, 11 and 12, in data with convex-shaped clusters, KCGWO has been capable of recognizing clusters and discriminating overlap among clusters quite effectively. This demonstrates how clearly defined the differences between the clusters are. According to Figs. 5, 6, 7, 8, 9, 10, 11 and 12, KCGWO could cluster most of the data points accurately despite the high density and large scatter of sample points in the dataset. This shows that KCGWO is resistant to high data volume and dispersion. Additionally, it has been demonstrated that KCGWO performs effectively when dealing with circular clusters in difficult datasets. KCGWO successfully identifies the majority of the curved regions in these datasets. Due to the utilization of the Euclidean distance measure for clustering, the proposed KCGWO has not completely distinguished all of the clusters in the data.

To further demonstrate the performance of the proposed KCGWO, two additional metrics, the Mean Absolute Error (MAE) and Mean Squared Error (MSE), are recorded in Table 7, which lists the average MAE and MSE values obtained by KCGWO with respect to GMM and K-means. Based on the average values, the performance of KCGWO with respect to K-means is better than with respect to GMM, and comparing GMM with K-means shows that GMM performs better than the K-means clustering algorithm. The results show that KCGWO produced more accurate results than GMM and K-means. This enhancement can be attributed to the population distribution by K-means and the weight factors, which avoid early convergence and strike a compromise between global and local searches. Moreover, the proposed KCGWO is significantly more effective on data with significant overlap and difficulty. As a result, KCGWO outperformed GMM and K-means and improved the ability to identify non-linear clusters.

Table 7 MAE and MSE values obtained by KCGWO, GMM, and K-means.

Further, for a fair comparison, the performance of the proposed algorithm is also compared with other metaheuristic algorithms, namely GWO, MFO, MVO, and SSA, in terms of the statistical measures Min, Mean, Max, and STD. For all algorithms, the population size is chosen as twice the number of clusters, and the iteration count is 500. Table 8 records all the statistical measures for all selected algorithms and datasets. It is noticed from Table 8 that the KCGWO is able to attain the best Min values for all datasets; the proposed algorithm converges to the global optima and finds the best solution. Except for the WDBC dataset, the proposed algorithm's maximum values are also better. The Mean and STD values obtained by the proposed algorithm are better than those of any other algorithm for all selected datasets, which means that the reliability of KCGWO is superior for all selected datasets. For each dataset, a ranking is provided based on the Min values, and the average rank values are also logged in Table 8. Based on the mean rank values, KCGWO stands first, followed by SSA, GWO, MFO, and MVO.

Table 8 Statistical results obtained for clustering problem by all selected algorithms.

When the optimal value is unknown, as is typically the case in data clustering applications, the convergence rate can be described by the following ratio of sequential errors:

$$\mathrm{Convergence \,\,Rate} \left(CR\right)=\frac{\left|{f}_{i+1}-{f}_{i}\right|}{\left|{f}_{i}-{f}_{i-1}\right|} ,$$
(18)

where \({f}_{i}\) denotes the fitness value at the current iteration, \({f}_{i+1}\) the fitness value at the next iteration, and \({f}_{i-1}\) the fitness value at the previous iteration. The logarithmic CR plot measures the dynamic fitness change over the iterations. The logarithmic convergence curves are illustrated in Fig. 13 to visualize the effect on the various datasets. Compared with the other configurations, such as GWO, MFO, MVO, and SSA, using K-means clusters with weight factors in GWO has produced good convergence that avoids the local-optimum trap, with the lowest MAE and MSE values occurring at iteration 500. The mechanism adopted in the GWO algorithm maintained a reasonable balance between exploration and exploitation and produced suitable population patterns for both. In addition to the convergence curves, a boxplot analysis is also performed to prove the reliability of the selected algorithms. All the algorithms are executed 30 times, and the boxplots based on the recorded values are illustrated in Fig. 14.
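The following small sketch (illustrative; the function name is hypothetical) computes the CR of Eq. (18) from a recorded fitness history:

```python
import numpy as np

def convergence_rate(history):
    """Ratio of sequential errors, Eq. (18).
    history: 1-D array of the best fitness value per iteration."""
    f = np.asarray(history, dtype=float)
    num = np.abs(f[2:] - f[1:-1])          # |f_{i+1} - f_i|
    den = np.abs(f[1:-1] - f[:-2])         # |f_i - f_{i-1}|
    return num / np.where(den == 0, np.nan, den)   # guard zero denominators
```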

Figure 13
figure 13figure 13

Convergence curves obtained by all algorithms: (a) Emission, (b) HTRU2, (c) Wine, (d) Breast cancer, (e) Sonar, (f) WDBC, (g) Iris, (h) 2022 Ukraine-Russia war.

Figure 14
figure 14figure 14

Boxplots obtained by all algorithms; (a) Emission, (b) HTRU2, (c) Wine, (d) Breast cancer, (e) Sonar, (f) WDBC, (g) Iris, (h) 2022 Ukraine-Russia war.

From Fig. 14, it is clearly evident that the reliability of the KCGWO is superior to that of all the selected algorithms. The computational time required by an algorithm to find the overall optimal solution is known as the time to the best solution, and the RT of an algorithm is the sum of all computations performed until its stopping criterion is met. The RT values recorded for the selected algorithms are given in Table 9.

Table 9 RT values of all algorithms for the clustering problem.

As with the numerical optimization problems, the average RT values are provided in Table 9. Based on the mean RT values, the original GWO has the lowest RT, and the RT of KCGWO is slightly greater than that of GWO owing to the introduction of the K-means clustering mechanism; the weight factor, by contrast, does not affect the proposed algorithm's computational cost. It is clear from the preceding comparisons and discussions that the improvement of GWO performance with K-means clustering and weight factors has accomplished its objectives and enhanced the original GWO algorithm. The new adjustments enabled KCGWO to outperform the original and the other selected algorithms, presenting KCGWO as a global optimizer and an efficient data clustering technique that can be applied in industrial applications.

Discussions

While the KCGWO introduces significant improvements to the conventional GWO, enhancing its applicability to data clustering tasks, it is not without its limitations. These constraints, inherent to the methodology and application context, warrant consideration for future research and practical implementation.

KCGWO's performance is partly contingent on the initial clustering obtained from the K-means algorithm. This dependence means that the quality of KCGWO's outcomes can be affected by the initial positioning of centroids in K-means, which is sensitive to the chosen initial points. If the K-means algorithm converges to a local optimum during its initialization phase, KCGWO may start from a less advantageous position, potentially impacting the overall optimization process.

The introduction of a dynamic weight factor in KCGWO, while beneficial for balancing exploration and exploitation, adds complexity in terms of parameter tuning. The performance of KCGWO can be sensitive to the settings of this weight factor alongside other algorithm parameters. Finding the optimal configuration requires extensive experimentation and can be computationally demanding, especially for large-scale problems or datasets with high dimensionality.

Although KCGWO is designed to explore and exploit the solution space efficiently, the computational overhead introduced by the integration of K-means and the dynamic weight adjustment mechanism can increase the algorithm's computational complexity. This may limit the scalability of KCGWO to very large datasets or real-time clustering applications where computational resources or time are constrained.

While empirical tests have demonstrated KCGWO's effectiveness on various datasets, its ability to generalize across all types of data distributions remains a concern. The algorithm's performance on datasets with complex structures, high dimensionality, or noise could vary, and its robustness in these scenarios has not been fully explored. Moreover, the K-means component of KCGWO may not be inherently robust against noise and outliers, as K-means tends to be influenced by these factors. Consequently, KCGWO's performance could be degraded in datasets where noise and outliers are prevalent, affecting the quality of the clustering outcomes.

Addressing these limitations presents paths for future work, including the development of strategies to reduce dependence on initial clustering quality, adaptive parameter tuning mechanisms to mitigate sensitivity issues, and enhancements to computational efficiency. Additionally, further research could explore the incorporation of noise and outlier handling techniques to improve the robustness of KCGWO across diverse and challenging data environments.

Conclusions

This study advances data clustering and optimization through the development of an innovative approach, integrating the GWO with K-Means clustering, further augmented by a dynamic weight factor mechanism. This integration not only contributes to the theoretical framework of swarm intelligence methods but also demonstrates practical applicability in enhancing data clustering outcomes. The theoretical implications of this research are underscored by the systematic incorporation of a traditional clustering algorithm with a contemporary optimization technique, enriching the metaheuristic algorithm landscape. This methodology offers a new perspective on achieving a balance between exploration and exploitation in swarm-based algorithms, a pivotal factor in their efficiency and effectiveness for complex problem-solving. From a practical perspective, the introduction of the KCGWO represents a significant advancement towards more accurate and efficient data clustering solutions. By ingeniously adjusting swarm movements based on initial positions and integrating weight factors, the method exhibits enhanced diversity and an improved ability to escape local optima. These features are essential for applications demanding precise data segmentation, such as image recognition, market segmentation, and biological data analysis.

The contributions of this research extend beyond theoretical enhancement, offering tangible benefits to sectors reliant on data analytics. The improved exploration and exploitation dynamics of KCGWO result in faster convergence rates and superior clustering outcomes, rendering it an invaluable asset for processing large datasets with intricate structures. This is particularly pertinent in the Big Data context, where rapid and accurate clustering of large data sets can significantly influence decision-making processes and resource management.

In summary, the KCGWO algorithm marks a notable academic contribution to the discourse on optimization algorithms and facilitates its application across various practical scenarios. Its adaptability and efficiency herald new possibilities for addressing data-clustering challenges in diverse fields, signalling a new era of optimization solutions that are robust and responsive to the dynamic requirements of data analysis.