q-rung orthopair fuzzy 2-tuple linguistic clustering algorithm and its applications to clustering analysis

The q-rung orthopair fuzzy 2-tuple linguistic set (q-ROPFLS), which combines numeric and linguistic data, has a wide range of applications in handling uncertain information. This article investigates a q-ROPFL correlation coefficient based on the proposed information energy and covariance formulas. Because different q-ROPFL elements may carry different criteria weights, a weighted correlation coefficient is also developed. Several desirable properties of the presented correlation coefficients are discussed and proven. In addition, theoretical results are provided, including the concepts of composition matrix, correlation matrix, and equivalent correlation matrix via the proposed correlation coefficients. A clustering algorithm is then developed for data expressed in q-ROPFL form with unknown weight information, and it is explained through an illustrative example. Finally, a detailed parameter analysis and a comparative study with existing approaches are performed to demonstrate the effectiveness of the proposed algorithm.

VIKOR method 14, and TODIM method 15, are extended to Pythagorean 2TL information based on this concept. However, the assignment of membership and non-membership grades in P2TLNs is subject to certain constraints. P2TLNs require that $\mu^2 + \nu^2 \le 1$, yet in several instances the assessment information given by DMs in the form of P2TLNs cannot satisfy this requirement. For instance, if the membership and non-membership grades are given as $\langle 0.7, 0.8 \rangle$, P2TLNs cannot process them because $0.7^2 + 0.8^2 = 1.13 > 1$. Wei et al. 16 introduced q-ROPFLSs based on q-rung orthopair fuzzy sets 17, in which the sum of the qth powers of the membership and non-membership grades must not exceed 1, i.e., $\mu^q + \nu^q \le 1$. When $q = 2$, a q-ROPFLN reduces to a P2TLN. The q-ROPFLN is hence a more general and adaptable form of information representation: its expression domain is considerably larger, which helps prevent information loss. Wei et al. 16 presented various q-ROPFL Heronian mean operators as well as their weighted versions. Ju et al. 18 investigated Muirhead mean operators with q-ROPFLNs for MCGDM problems. Recently, Li et al. 19 devised a q-ROPFL PROMETHEE II model for MCGDM with unknown weight information.
In recent decades, the correlation coefficient, which is key to studying the relationship between two parameters or variables, has attracted considerable attention. The correlation coefficient developed by Karl Pearson 20 has been used in many statistical studies, including data analysis and classification, pattern recognition, clustering, medical diagnosis, and decision-making. However, conventional correlation is unsuitable for handling data of a fuzzy character. To address this issue, several authors have extended statistical correlation to fuzzy correlation [21][22][23]. In 24, Gerstenkorn and Mańko developed the notion of the intuitionistic fuzzy correlation coefficient. Hong and Hwang 25 analyzed the correlation measure and correlation coefficient of IFSs in probability spaces. Zeng and Li 26 presented a correlation coefficient of IFSs that is analogous to the cosine of the intersection angle in finite sets and probability spaces. Another study 27 demonstrated the applicability of IFS correlation coefficients to pattern recognition problems. Chen et al. 28 developed correlation coefficients for hesitant fuzzy sets and applied them to clustering analysis. The authors in 29 explored multilevel analysis methodologies and applications. Garg 30 developed a novel correlation coefficient for Pythagorean fuzzy sets and used it for decision making. Park et al. 31 proposed a correlation coefficient for interval-valued IFSs and demonstrated its applicability to MCGDM problems. Nguyen 32 devised a similarity/dissimilarity measure for IFSs with applications in pattern recognition, whereas Du 33 introduced the correlation and correlation coefficients of q-ROFSs. 
Recently, Li et al. 34 studied two ζ-correlation coefficients in the q-rung orthopair fuzzy setting and presented a clustering analysis example to demonstrate the superiority of their approach.
The advancement of the theory and practical uses of correlation coefficients motivated us to investigate these concepts further. Accordingly, this article explores correlation coefficients and a clustering technique for q-ROPFLSs. Unlike the aforesaid fuzzy sets, a q-ROPFLS is composed of a linguistic 2-tuple and a q-ROFS. The linguistic 2-tuple is a model that prevents information loss during computations with discrete linguistic values. Intuitionistic 2TL sets and Pythagorean 2TL sets also contain a linguistic 2-tuple, but they impose stricter limits on the choice of membership and non-membership grades than q-ROPFLSs do. In intuitionistic 2TL sets, we cannot assign 0.5 and 0.6 as membership and non-membership grades because their sum exceeds 1. Similarly, in Pythagorean 2TL sets, we cannot select 0.7 and 0.8 because the sum of their squares exceeds 1. In a q-ROFS, however, the admissible range is much broader: since $\mu^q + \nu^q$ decreases as q grows, any pair of grades in $[0, 1)$ can be accommodated by choosing q sufficiently large. Thus, the structure of q-ROPFLS is more flexible than other existing frameworks.
The following factors motivated this study. (I) The q-ROPFLS is a useful tool for conveying assessment information in MCGDM problems. Its key advantages are: (i) information distortion during linguistic information processing can be reduced; (ii) information loss can be effectively avoided by incorporating the parameter q, which allows a significantly larger range for the membership and non-membership grades of a linguistic evaluation; and (iii) problems involving two-dimensional information can be handled effectively in real-world applications. Although several techniques in the literature use information energy to derive correlation coefficient formulas, such a methodology still needs to be developed for q-ROPFLSs. (II) The significance of criteria in decision analysis is of the utmost relevance for making rational choices.
Typically, these weights are not available in advance. However, most available clustering methods only consider known weights and disregard the case of unknown weights. To obtain more precise results, it is necessary to develop a clustering model based on unknown criteria weights.
Based on the aforementioned motivations, the novel aspects of this study are as follows: (I) Information energy, covariance, correlation coefficient, and their corresponding weighted forms for q-ROPFLSs are formulated, and the required properties of the proposed formulations are verified. (II) Based on the developed theory, the conventional clustering algorithm is extended to q-ROPFLSs with unknown criteria weight information. (III) A case concerning the classification of CIM software demonstrates the application of the proposed algorithm. Then, a case concerning the clustering of construction materials is studied to compare the proposed method with prevailing methods.
Definition 1 8,35 Let ϑ be the result of an aggregation of the indices of a set of labels assessed in a linguistic term set S, i.e., the outcome of a symbolic aggregation operation, with $\vartheta \in [1, \ell]$ and ℓ the cardinality of S (for example, $S = \{s_1 = \text{extremely poor}, s_2 = \text{very poor}, s_3 = \text{poor}, s_4 = \text{medium}, s_5 = \text{good}, s_6 = \text{very good}, s_7 = \text{extremely good}\}$, so that $\ell = 7$). If $r = \mathrm{round}(\vartheta)$ and $\kappa = \vartheta - r$ are two numbers such that $r \in [1, \ell]$ and $\kappa \in [-0.5, 0.5)$, then κ is called a symbolic translation.
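Definition 2 8,35 Let $S = \{s_\theta \mid \theta = 1, 2, \ldots, \ell\}$ be a linguistic term set and $\vartheta \in [1, \ell]$ be a numerical value representing the outcome of a linguistic symbolic aggregation. Then the function Δ used to obtain the 2-tuple linguistic information equivalent to ϑ is defined as
$$\Delta : [1, \ell] \to S \times [-0.5, 0.5), \qquad \Delta(\vartheta) = (s_r, \kappa), \quad r = \mathrm{round}(\vartheta), \quad \kappa = \vartheta - r,$$
where round(·) is the usual rounding function, $s_r$ is the label whose index is closest to ϑ, and κ is the symbolic translation value.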
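The translation in Definitions 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, assuming a label $s_r$ is represented by its integer index r. The function names `to_two_tuple` and `from_two_tuple` are hypothetical, and the inverse $\Delta^{-1}(s_r, \kappa) = r + \kappa$ is the standard inverse of the 2-tuple model rather than something defined in this excerpt. Rounding is done half-up so that κ always falls in [−0.5, 0.5), which Python's built-in `round` (it rounds halves to even) would not guarantee.

```python
import math


def to_two_tuple(theta: float, ell: int) -> tuple[int, float]:
    """Delta: map theta in [1, ell] to (index r of label s_r, symbolic translation kappa)."""
    if not 1 <= theta <= ell:
        raise ValueError("theta must lie in [1, ell]")
    r = math.floor(theta + 0.5)  # round half up, so kappa stays in [-0.5, 0.5)
    kappa = theta - r
    return r, kappa


def from_two_tuple(r: int, kappa: float) -> float:
    """Standard inverse Delta^{-1}(s_r, kappa) = r + kappa (assumed, not given in this excerpt)."""
    return r + kappa


# With the seven-label LTS of Definition 1 (ell = 7):
print(to_two_tuple(4.73, 7))  # label s_5 ("good") with kappa close to -0.27
```

Because Δ and Δ⁻¹ are exact inverses on [1, ℓ], converting a numeric aggregation result to a 2-tuple and back loses no information, which is the property the text relies on.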

Definition 9
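Let Z be a fixed set and $S = \{s_\theta \mid \theta = 1, 2, \ldots, \ell\}$ be a linguistic term set. For two q-ROPFLSs F and G on Z, whose elements $t_i$ are described by a linguistic 2-tuple together with a membership grade $\mu(t_i)$ and a non-membership grade $\nu(t_i)$, the covariance of F and G is given by the following formula: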

Theorem 1
The covariance of two q-ROPFLSs F and G satisfies the following properties:
Proof 1. From Eq. (10), the covariance of F with F is given as
2. By using Eq. (10), the covariance of F and G is given as
Definition 10 Let Z be a fixed set and $S = \{s_\theta \mid \theta = 1, 2, \ldots, \ell\}$ be a linguistic term set. For q-ROPFLSs F and G on Z, the correlation coefficient of F and G is given by the following formula:
The correlation coefficient between q-ROPFLSs F and G satisfies the following properties:
Proof 1. It is obvious, so we omit the proof. 2.
Weighted covariance and correlation coefficient of q-ROPFLSs. A weighted correlation measure incorporates weights assigned to the subjects into the calculation of the correlation between two variables. The weights may either be readily available or chosen by the researcher to meet a specific need. For instance, if the number of observations per subject varies, it is customary to use these numbers as weights when calculating the correlation between the two variables. Moreover, disagreements on different elements may differ in importance and may therefore be associated with different weights. Consequently, when calculating the correlation coefficient between q-ROPFLSs, we take these weights into account. In this subsection, we construct a weighted correlation coefficient within the q-ROPFLS framework.
Suppose that the weight associated with each element $t_i$ is $\varpi_i$, where $\varpi_i \in [0, 1]$ (i = 1, 2, ..., n) and $\sum_{i=1}^{n} \varpi_i = 1$. Then:
1. The weighted information energy of F is given by:
2. The weighted covariance of F and G is given by:
3. The weighted correlation coefficient of F and G is given by:
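The explicit energy and covariance formulas are not reproduced in this excerpt. The following Python sketch therefore only illustrates the general pattern of an information-energy-based weighted correlation coefficient, $\rho_\varpi(F, G) = C_\varpi(F, G) / \sqrt{E_\varpi(F)\, E_\varpi(G)}$. The per-element feature map (the 2-tuple value normalized by ℓ, multiplying $\mu^q$ and $\nu^q$) is a hypothetical choice for illustration, not the article's Eq. (14). Each element is assumed to be a tuple (r, κ, µ, ν). With non-negative features and weights, the Cauchy–Schwarz inequality keeps this sketch's result in [0, 1], and $\rho_\varpi(F, F) = 1$.

```python
import math

Element = tuple[int, float, float, float]  # (r, kappa, mu, nu): hypothetical layout


def features(e: Element, ell: int, q: int) -> tuple[float, float]:
    # HYPOTHETICAL feature map: scale mu^q and nu^q by the normalised 2-tuple value.
    r, kappa, mu, nu = e
    beta = (r + kappa) / ell
    return beta * mu ** q, beta * nu ** q


def weighted_energy(F, w, ell, q):
    # E_w(F) = sum_i w_i * ||phi(F_i)||^2
    return sum(wi * sum(x * x for x in features(e, ell, q)) for wi, e in zip(w, F))


def weighted_covariance(F, G, w, ell, q):
    # C_w(F, G) = sum_i w_i * <phi(F_i), phi(G_i)>
    total = 0.0
    for wi, a, b in zip(w, F, G):
        total += wi * sum(x * y for x, y in zip(features(a, ell, q), features(b, ell, q)))
    return total


def weighted_correlation(F, G, w, ell, q):
    denom = math.sqrt(weighted_energy(F, w, ell, q) * weighted_energy(G, w, ell, q))
    if denom == 0:
        raise ValueError("correlation undefined when an information energy is zero")
    return weighted_covariance(F, G, w, ell, q) / denom


# Hypothetical toy data on three elements, with weights summing to 1.
F = [(5, 0.2, 0.7, 0.8), (3, -0.1, 0.6, 0.5), (6, 0.0, 0.9, 0.2)]
G = [(4, 0.3, 0.5, 0.6), (6, 0.1, 0.8, 0.3), (2, -0.4, 0.4, 0.9)]
w = [0.3, 0.3, 0.4]
print(weighted_correlation(F, G, w, ell=7, q=3))
```

Setting every weight to 1/n gives an unweighted variant of the same sketch, since the ratio is unchanged when all weights are scaled by a common factor.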

Theorem 4
The weighted correlation coefficient between q-ROPFLSs F and G satisfies the following properties: Proof 1. It is obvious, so we omit the proof. 2. 3. The inequality $0 \le \rho_{\varpi}(F, G)$ is obvious. Therefore, we need only prove $\rho_{\varpi}(F, G) \le 1$.

Clustering algorithm under q-ROPFL environment
Based on the q-rung orthopair fuzzy clustering method 36 and the correlation coefficient formulas designed above for q-ROPFLSs, we construct an approach for clustering in the q-ROPFL environment. First, the following concepts are introduced.
Theoretical development. Definition 11 Let $F_j$ (j = 1, 2, ..., m) be m q-ROPFLSs. Then $C = (\rho_{ij})_{m \times m}$ is called a correlation matrix, where $\rho_{ij} = \rho(F_i, F_j)$ denotes the correlation coefficient of $F_i$ and $F_j$, which satisfies the following characteristics: (1) $0 \le \rho_{ij} \le 1$; (2) $\rho_{ii} = 1$; (3) $\rho_{ij} = \rho_{ji}$, for all i, j = 1, 2, ..., m. Theorem 5 Let $C = (\rho_{ij})_{m \times m}$ be a correlation matrix; then the composition matrix $C^2 = C \circ C$ is also a correlation matrix.
Thus, we complete the proof of Theorem 5. Theorem 6 Let $C = (\rho_{ij})_{m \times m}$ be a correlation matrix. Then, for any non-negative integer k, the composition matrix $C^{2^{k+1}} = C^{2^k} \circ C^{2^k}$ is also a correlation matrix.
Proof We prove this by mathematical induction. If k = 0, then $C^2 = C \circ C$; thus, by Theorem 5, $C^2$ is a correlation matrix.
then C is called an equivalent correlation matrix.
Lemma 1 37 Let $C = (\rho_{ij})_{m \times m}$ be a correlation matrix. Then, after a finite number of compositions, there exists a positive integer k such that $C^{2^k} = C^{2^{k+1}}$, and $C^{2^k}$ is also an equivalent correlation matrix.

Theorem 7 An equivalent correlation matrix can be derived from the correlation matrix C after a finite number of compositions.
Proof If $C = (\rho_{ij})_{m \times m}$ is an equivalent correlation matrix, the theorem obviously holds. If not, then by Lemma 1 there exists some positive integer k such that $C^{2^k} = C^{2^{k+1}}$. Since $(C^{2^k})^2 = C^{2^{k+1}} = C^{2^k}$, we have $(C^{2^k})^2 \subseteq C^{2^k}$, i.e., $C^{2^k}$ is an equivalent correlation matrix.
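Theorem 7 suggests a simple procedure: square the correlation matrix repeatedly until it stops changing. The Python sketch below assumes the max-min composition commonly used in correlation-based fuzzy clustering, $(C \circ C)_{ij} = \max_k \min(\rho_{ik}, \rho_{kj})$. The composition definition itself is not reproduced in this excerpt, and the function names and toy matrix are hypothetical. Because $\rho_{ii} = 1$, each squaring can only increase entries, so the loop stops exactly when $C^2 = C$, i.e., when $C^2 \subseteq C$ holds.

```python
Matrix = list[list[float]]


def max_min_compose(A: Matrix, B: Matrix) -> Matrix:
    """(A o B)_ij = max_k min(A_ik, B_kj)."""
    n = len(A)
    return [[max(min(A[i][k], B[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def equivalent_correlation_matrix(C: Matrix) -> tuple[Matrix, int]:
    """Square C repeatedly (C, C^2, C^4, ...) until C^{2^(k+1)} == C^{2^k}.

    Returns the fixed point and the number of compositions performed.
    """
    steps = 0
    while True:
        C2 = max_min_compose(C, C)
        steps += 1
        if C2 == C:  # max/min only copy existing entries, so exact equality is safe
            return C, steps
        C = C2


C = [[1.0, 0.8, 0.3],
     [0.8, 1.0, 0.5],
     [0.3, 0.5, 1.0]]
E, steps = equivalent_correlation_matrix(C)
print(E)      # [[1.0, 0.8, 0.5], [0.8, 1.0, 0.5], [0.5, 0.5, 1.0]]
print(steps)  # 2
```

Here the first squaring raises $\rho_{13}$ from 0.3 to 0.5 via the path through the second alternative, and the second squaring confirms the fixed point.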
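Definition 14 Let $C = (\rho_{ij})_{m \times m}$ be a correlation matrix. Then $C_\zeta = ({}_\zeta\rho_{ij})_{m \times m}$ is called the ζ-cutting matrix of C, where
$${}_\zeta\rho_{ij} = \begin{cases} 0, & \rho_{ij} < \zeta, \\ 1, & \rho_{ij} \ge \zeta, \end{cases}$$
and $\zeta \in [0, 1]$ is the confidence level.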
..., $C_n\}$ be the criteria set for each alternative and $\varpi = (\varpi_1, \varpi_2, \ldots, \varpi_n)$ be the weight vector of the criteria set C. The weight vector ϖ depicts the importance of the different criteria in the decision-making process, where $\sum_{j=1}^{n} \varpi_j = 1$ and $\varpi_j \in [0, 1]$. The invited 'p' The main steps of the proposed model are as follows: Step 1: Collect the q-ROPFL experimental data set provided by each $D_k$ (k = 1, 2, ..., p) using the LTS S, which contains information about the alternatives described by their relevant characteristics/criteria. The assessment information of $D_k$ can be described in the form of a decision matrix as where $\partial^k_{ij}$ is the ij-th q-ROPFLN provided by the k-th DM, expressing the degree to which alternative $O_i$ satisfies criterion $C_j$.
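Step 3 of the algorithm obtains the unknown criteria weights from the maximizing deviation model of Ref. 19, whose formulas are not reproduced in this excerpt. As a rough Python sketch of the generic maximizing deviation idea, criterion $C_j$ receives a weight proportional to the total pairwise deviation among the alternatives' ratings on $C_j$, normalized so the weights sum to 1. The distance `q_ropfl_distance` below is a hypothetical Hamming-type choice on tuples (r, κ, µ, ν), not the measure used in Ref. 19, and the ratings are made-up toy data.

```python
def q_ropfl_distance(a, b, ell=7, q=3):
    # HYPOTHETICAL distance between two q-ROPFLNs given as (r, kappa, mu, nu).
    (r1, k1, mu1, nu1), (r2, k2, mu2, nu2) = a, b
    b1, b2 = (r1 + k1) / ell, (r2 + k2) / ell
    return 0.5 * (abs(b1 * mu1 ** q - b2 * mu2 ** q) + abs(b1 * nu1 ** q - b2 * nu2 ** q))


def maximizing_deviation_weights(matrix, distance):
    """matrix[i][j] is the rating of alternative i on criterion j."""
    m, n = len(matrix), len(matrix[0])
    # Total deviation of criterion j over all ordered pairs of alternatives.
    deviations = [
        sum(distance(matrix[i][j], matrix[k][j]) for i in range(m) for k in range(m))
        for j in range(n)
    ]
    total = sum(deviations)
    if total == 0:  # no criterion discriminates; fall back to equal weights
        return [1.0 / n] * n
    return [d / total for d in deviations]


# Hypothetical collective ratings: 3 alternatives x 2 criteria.
ratings = [
    [(5, 0.0, 0.7, 0.4), (4, 0.2, 0.6, 0.5)],
    [(3, -0.3, 0.5, 0.6), (4, 0.1, 0.6, 0.4)],
    [(6, 0.1, 0.9, 0.2), (5, 0.0, 0.5, 0.5)],
]
print(maximizing_deviation_weights(ratings, q_ropfl_distance))
```

The design choice behind the idea is that a criterion on which all alternatives look alike contributes little to discrimination, so it receives a small weight.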
Step 4: According to Eq. (14), compute the weighted correlation coefficient between $F_i$ and $F_j$, and then build the correlation matrix $C = (\rho_{ij})_{m \times m}$, where $\rho_{ij} = \rho(F_i, F_j)$, i, j = 1, 2, ..., m.
Step 5: Check whether $C = (\rho_{ij})_{m \times m}$ is an equivalent correlation matrix, i.e., whether $C^2 \subseteq C$. If it is not, construct the equivalent correlation matrix $C^{2^k}$: Step 6: To categorize the q-ROPFLSs $F_i$ (i = 1, 2, ..., m), create a ζ-cutting matrix $C_\zeta = ({}_\zeta\rho_{ij})_{m \times m}$ according to Definition 14 with a confidence level ζ. If all components of the i-th row (column) of $C_\zeta$ match those of the j-th row (column), then $F_i$ and $F_j$ are of the same type. In this way, the m q-ROPFLSs $F_i$ (i = 1, 2, ..., m) are categorized. Figure 2 provides a flowchart of the method for easier comprehension.
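Step 6 can be sketched in Python as follows, assuming the 0/1 cut of Definition 14 (entries at or above ζ become 1, the rest 0) is applied to an equivalent correlation matrix, so that rows with identical patterns form the classes. The function names are hypothetical and the matrix is a toy example, not data from this article.

```python
Matrix = list[list[float]]


def zeta_cut(C: Matrix, zeta: float) -> list[list[int]]:
    """zeta-cutting matrix: 1 where rho_ij >= zeta, else 0."""
    return [[1 if x >= zeta else 0 for x in row] for row in C]


def classes_at(C: Matrix, zeta: float) -> list[list[int]]:
    """Group alternatives whose rows in the zeta-cutting matrix are identical."""
    groups: dict[tuple[int, ...], list[int]] = {}
    for i, row in enumerate(zeta_cut(C, zeta)):
        groups.setdefault(tuple(row), []).append(i + 1)  # 1-based, as in F_1..F_m
    return list(groups.values())


# Toy equivalent correlation matrix (already satisfies C o C == C).
E = [[1.0, 0.8, 0.5],
     [0.8, 1.0, 0.5],
     [0.5, 0.5, 1.0]]
for zeta in (0.4, 0.6, 0.9):
    print(zeta, classes_at(E, zeta))
# 0.4 [[1, 2, 3]]
# 0.6 [[1, 2], [3]]
# 0.9 [[1], [2], [3]]
```

Raising ζ can only split classes, never merge them, which is why the results at increasing confidence levels form a nested hierarchy.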

Applications and analysis
This section provides some examples to explain the applicability and validity of the proposed clustering algorithm.
Illustrative example. Software assessment and categorization is an increasingly essential issue in all areas of human activity. Industrial production, service delivery, and corporate administration all rely strongly on software, which is becoming more sophisticated and costly 38. A CASE tool to support software development in a CIM context must be chosen among those available on the market. CIM software is often in charge of production planning, control, and monitoring 39. We cluster seven kinds of CIM software $O_i$ (i = 1, 2, ..., 7) available on the market to better assess them based on four criteria: $C_1$: functionality, $C_2$: usability, $C_3$: portability, and $C_4$: maturity. Since the specialists who perform such an evaluation differ in background, knowledge, abilities, experience, personality, and so on, their evaluation information may differ. The experts are provided with the LTS The assessment information is represented by q-ROPFLSs and listed in Tables 1, 2 and 3 to clearly show the differences in the experts' judgments. We now follow the steps of the established algorithm: Step 1: The q-ROPFL decision matrices $M^k_{7 \times 4}$ (k = 1, 2, 3) are shown in Tables 1-3.
Step 3: Based on the maximizing deviation model 19, the criteria weights are determined as:
Step 5: Find the equivalent correlation matrix:
Table 1. q-ROPFL data provided by $D_1$.
Table 2. q-ROPFL data provided by $D_2$.
Table 3. q-ROPFL data provided by $D_3$.
How to choose the optimal value of ζ? The proposed method categorizes the q-ROPFLSs under the given confidence levels by using the ζ-cutting matrix of the corresponding correlation matrix. Given that confidence
Parameter analysis. This section discusses the clustering of alternatives when the value of the parameter q varies. For this purpose, several values of q are used in the algorithm, as demonstrated below. Case 1: Clustering analysis using q = 3: The clustering method for q = 3 has already been executed in the preceding section. Thus, we proceed to the subsequent cases.
Using the obtained weight vector ϖ = (0.2353, 0.2727, 0.1912, 0.3008) (for q = 3) and following Steps 4-6 of the clustering algorithm outlined in the preceding section, the case-by-case computations are performed as follows.
Moreover, in Cases 1 and 2 (the two-class scenario), $O_2$ forms a class separate from the other software products, whereas Cases 3-6 place $O_5$ in a distinct class. Next, in the six-class scenario, the results of Cases 4 and 6 do not match those of Cases 1, 2, 3, and 5. The remaining scenarios also show significant variations. Therefore, the classification produced by the proposed method is highly sensitive to the parameter q.
How to choose the optimal value of q? A reasonable value of q should be determined from the evaluation values provided by the DMs: one may pick the smallest integer q satisfying $\mu^q + \nu^q \le 1$. For instance, if an evaluation value provided by a DM is $\langle 0.8, 0.7 \rangle$, we may set q = 3, since $0.8^2 + 0.7^2 = 1.13 > 1$ and $0.8^3 + 0.7^3 = 0.855 < 1$, so q = 3 is the smallest such integer. If the DMs need to express more complex assessments, q can simply be increased to enlarge the information representation space of q-ROPFLSs.
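The rule for choosing q can be checked mechanically. A minimal Python sketch (function names hypothetical) returns the smallest integer q with $\mu^q + \nu^q \le 1$ for one evaluation, and the smallest q that works for every evaluation in a data set. Since $\mu^q + \nu^q$ is non-increasing in q for grades in [0, 1], the latter is simply the maximum of the per-evaluation values.

```python
def smallest_q(mu: float, nu: float, q_max: int = 1000) -> int:
    """Smallest integer q >= 1 with mu**q + nu**q <= 1."""
    if not (0.0 <= mu <= 1.0 and 0.0 <= nu <= 1.0):
        raise ValueError("grades must lie in [0, 1]")
    for q in range(1, q_max + 1):
        if mu ** q + nu ** q <= 1.0:
            return q
    raise ValueError("no q up to q_max satisfies the constraint")


def smallest_q_for_all(pairs) -> int:
    """Smallest q that admits every (mu, nu) pair in the data set."""
    return max(smallest_q(mu, nu) for mu, nu in pairs)


print(smallest_q(0.8, 0.7))  # 3, matching the example in the text
print(smallest_q(0.5, 0.6))  # 2
```

A pair such as ⟨1.0, 0.5⟩ is rejected for every q, because $1 + 0.5^q > 1$ always holds.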
Criteria weight analysis. This part performs a sensitivity analysis by varying the criteria weights to verify the robustness of the proposed approach. To do so, we swap the weights of any two criteria while keeping the weights of the other criteria unchanged. With four criteria, there are six possible cases: Case 1: Interchanging $C_1$ and $C_4$: Using Eq. (14), the resulting correlation matrix is obtained as: Step 5: Find the equivalent correlation matrix: Step 6: Using Eq. (19), generate a ζ-cutting matrix $C_\zeta = ({}_\zeta\rho_{ij})_{7 \times 7}$, from which all plausible classes of the software products $O_i$ (i = 1, 2, ..., 7) can be derived: Step 4: Using Eq. (14), the resulting correlation matrix is obtained as: Step 5: Find the equivalent correlation matrix: Step 6: Using Eq. (19), generate a ζ-cutting matrix $C_\zeta = ({}_\zeta\rho_{ij})_{7 \times 7}$, from which all plausible classes of the software products $O_i$ (i = 1, 2, ..., 7) can be derived: Similarly, the proposed approach can be applied to the other cases. In Cases 3, 4, 5, and 6, all software products are of the same type for $0 < \zeta \le 0.6121$, $0 < \zeta \le 0.6145$, $0 < \zeta \le 0.6301$, and $0 < \zeta \le 0.6031$, respectively. These upper bounds are quite close to each other, indicating that the proposed algorithm is stable with respect to fluctuations in the criteria weights.
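The six weight-swap cases can be enumerated mechanically. The Python sketch below (function name hypothetical) generates every vector obtained by interchanging two criteria weights of the q = 3 weight vector reported above. Its enumeration order need not match the case numbering used in the text, where Case 1 swaps $C_1$ and $C_4$. Each resulting vector would then be fed into Steps 4-6.

```python
from itertools import combinations


def swapped_weight_vectors(weights):
    """Yield ((a, b), w) where w is `weights` with criteria C_a and C_b interchanged."""
    for i, j in combinations(range(len(weights)), 2):
        w = list(weights)
        w[i], w[j] = w[j], w[i]
        yield (i + 1, j + 1), w


base = [0.2353, 0.2727, 0.1912, 0.3008]  # criteria weights for q = 3 from the text
cases = list(swapped_weight_vectors(base))
for (a, b), w in cases:
    print(f"C{a} <-> C{b}: {w}")
```

With n criteria this yields n(n − 1)/2 cases, which gives the six cases for the four criteria used here.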
Comparative illustration. This section compares the proposed clustering algorithm with previous methods, including intuitionistic fuzzy and q-rung orthopair fuzzy clustering algorithms 36,40. materials has four criteria $C_1$, $C_2$, $C_3$, and $C_4$ with the weight vector ϖ = (0.2, 0.2, 0.3, 0.3). The DMs give their views based on the LTS S (as taken previously). The data collected for the four materials by three professionals according to each criterion are recorded in Tables 5, 6 and 7. Here, we cluster the materials based on the appropriate degree of confidence. Note that in a clustering method, a broad range of confidence levels results in the formation of viable clusters. We first apply the proposed algorithm for various values of the parameter ζ as follows: Step 1: The q-ROPFL decision matrices $M^k_{4 \times 4}$ (k = 1, 2, 3) are depicted in Tables 5-7. Step 2: Employ the q-ROPFLA operator Eq. (6) (taking q = 1) to aggregate all the matrices into a collective decision matrix (see Table 8).
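Step 2 fuses the three experts' matrices with the q-ROPFLA operator of Eq. (6), which is not reproduced in this excerpt. As an assumption-laden sketch only, the Python code below uses a common form of q-rung orthopair fuzzy 2-tuple linguistic weighted averaging. The 2-tuple parts are averaged through $\Delta(\sum_k w_k \Delta^{-1}(s_k, \kappa_k))$, membership grades are combined as $(1 - \prod_k (1 - \mu_k^q)^{w_k})^{1/q}$, and non-membership grades as $\prod_k \nu_k^{w_k}$. Whether this matches Eq. (6) exactly should be checked against the article. Equal expert weights mirror the equal DM weights assumed in this study, and the function name and sample ratings are hypothetical.

```python
import math


def q_ropfl_weighted_average(elements, weights, q):
    """ASSUMED q-ROPFL weighted average of (r, kappa, mu, nu) tuples."""
    # Linguistic part: weighted mean of Delta^{-1} values, mapped back with Delta.
    theta = sum(w * (r + k) for w, (r, k, _, _) in zip(weights, elements))
    r = math.floor(theta + 0.5)  # nearest label index, kappa in [-0.5, 0.5)
    kappa = theta - r
    # Fuzzy part: q-rung orthopair weighted averaging of the grades.
    mu = (1 - math.prod((1 - m ** q) ** w for w, (_, _, m, _) in zip(weights, elements))) ** (1 / q)
    nu = math.prod(n ** w for w, (_, _, _, n) in zip(weights, elements))
    return r, kappa, mu, nu


# Hypothetical ratings of one material on one criterion by three experts.
ratings = [(5, 0.0, 0.6, 0.3), (4, 0.2, 0.5, 0.4), (6, -0.2, 0.7, 0.2)]
print(q_ropfl_weighted_average(ratings, [1 / 3] * 3, q=1))
```

With q = 1, as in Step 2, the grade formulas reduce to the familiar intuitionistic fuzzy weighted average, and the output still satisfies $\mu^q + \nu^q \le 1$ whenever every input does.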
Step 3: Since the weight vector is given by the DMs, we omit this step.
Step 4: The weighted correlation matrix of the q-ROPFLSs $O_i$ (i = 1, 2, ..., 4) with respect to their given weights is computed as follows: Step 5: Determine the equivalent correlation matrix: Step 6: Using Eq. (19), generate a ζ-cutting matrix $C_\zeta = ({}_\zeta\rho_{ij})_{4 \times 4}$, from which all plausible classes of the materials $O_i$ (i = 1, 2, ..., 4) can be derived: The data of Example 2 cannot be modeled with the intuitionistic and q-rung orthopair fuzzy correlations 36,40. To make Table 8 usable with the methods of 40 and 36, we remove the 2-tuple linguistic data from Table 8. The resulting data reduce to a q-rung orthopair fuzzy setting (q = 1) and are displayed in Table 9.
The stepwise computations are performed by adhering to the method outlined in Ref. 36 .
Step 1: The weighted correlation matrix of the q-ROFSs $O_i$ (i = 1, 2, ..., 4) with respect to their given weights is computed as follows: Step 2: Construct the equivalent correlation matrix:
Table 6. q-ROPFL data provided by $D_2$.
Table 7. q-ROPFL data provided by $D_3$.
Table 8. Aggregated q-rung orthopair fuzzy 2-tuple linguistic decision matrix.
Table 5. q-ROPFL data provided by $D_1$.
Next, the stepwise computations are carried out following the procedures of Ref. 40 (fixing α = 0.5).
Step 1: The correlation matrix of the IFSs $O_i$ (i = 1, 2, ..., 4) is determined as follows: Step 2: Construct the equivalent correlation matrix: Step 3: Generate a ζ-cutting matrix $C_\zeta = ({}_\zeta\rho_{ij})_{4 \times 4}$, from which all plausible classes of the materials $O_i$ (i = 1, 2, ..., 4) can be derived: Analyzing the results of the approach of Bashir et al. 36, we find that when the number of classes is two, alternatives $O_2$ and $O_3$ are clustered into a single class, whereas according to the results of the proposed method and the method of 40, $O_2$ is clustered into the class of $O_1$ and $O_4$. Further, the proposed methodology starts classification at ζ = 0.9215, whereas the methods of 36 and 40 do not start classification until ζ = 0.9648 and ζ = 0.9975, respectively. Our clustering results therefore converge faster than those of 36,40 and can more accurately depict the distinction between groups. Note that the existing techniques 36,40 can only process numeric data, and their inability to handle linguistic arguments results in significant information loss. In contrast, the proposed algorithm handles a linguistic preference structure with symbolic translation parameters and can solve MCGDM problems with completely unknown weight information.
Table 9. Aggregated q-rung orthopair fuzzy decision matrix.
The method of 40 uses a two-parametric correlation coefficient, which expands the range of confidence levels; by modifying these two parameters, the equivalent matrix can be produced with fewer iterations. Despite this advantage, the model fails in most complicated problems because of the strict constraints on its characteristic functions. For example, it cannot handle the information $\langle 0.5, 0.6 \rangle$, whereas the proposed approach provides an adjustable parameter q that readily accommodates such data. In addition, the approach of Bashir et al. 36 relies on known weights, and the method of 40 uses a correlation coefficient rather than a weighted correlation coefficient, thus ignoring the criteria weights. Ignoring the weights may result in incorrect clusters.
The main advantages of the proposed clustering algorithm over previous ones are as follows: (i) The proposed method suits a linguistic preference structure with symbolic translation parameters of linguistic arguments, which is highly effective in dealing with ambiguity in MCGDM problems. With q-ROPFLSs, data collection is easier for the DMs and information loss is avoided. (ii) Unlike the prevailing techniques 36,40, the proposed algorithm determines the weights of the evaluation criteria. (iii) The presented method can solve group decision making problems in the q-ROPFLS context, whereas the existing methods 36,40 work only with an individual decision matrix and cannot model multi-expert problems. In addition, as the above comparison shows, the proposed clustering model converges faster than 36,40.
The presented structure also has limitations: (i) In practice, DMs have different knowledge, proficiency and experience, so their importance may not be equal. This study does not consider such differences (relative weights of the DMs) and assigns equal weight to each DM. (ii) The formulated correlation coefficients are information energy-based measures whose values lie in [0, 1]; hence, they cannot indicate negative correlation between two variables.

Concluding remarks
In this article, a q-ROPFL clustering algorithm for classification problems in decision making was presented. To this end, we explored the information energy and covariance of q-ROPFLSs and then presented a correlation coefficient. We also defined the weighted covariance and weighted correlation coefficient of q-ROPFLSs and established desirable properties of the proposed information energy and correlation coefficients. Furthermore, theoretical results, including the notions of composition matrix, correlation matrix, and equivalent correlation matrix via the proposed correlation coefficients, were given, and an algorithm for clustering q-ROPFLSs was proposed. The presented method works well with symbolic translation parameters of linguistic arguments in a linguistic preference structure. Unlike conventional decision-making techniques, the suggested clustering algorithm uses q-ROPFLSs, which help prevent information loss. A practical problem concerning the classification of CIM software was addressed, and a detailed sensitivity analysis was performed. It was observed that the parameters q and ζ affect the clustering of alternatives. Finally, a comparative example concerning the clustering of construction materials was provided, and the results of the proposed algorithm were found to converge more rapidly, which supports the practicality and superiority of the developed approach. Further work is still required on the proposed clustering algorithm. First, to broaden its range of application, weights should be assigned to different DMs 19. Second, the introduced concepts can be explored for other extensions of fuzzy set theory, which may yield interesting structures, results, and applications. In addition, extending the range of the proposed correlation coefficients to the interval [−1, 1] 42 is a worthwhile research topic.