Introduction

Statistical process control (SPC) is a statistical methodology for monitoring and controlling the variation of a process to ensure that it produces products that meet customer requirements. A control chart, which is part of SPC, is one of the tools often used to monitor the company’s quality of products and services1. Based on the number of monitored quality characteristics, the control charts are divided into two types: univariate and multivariate control charts. The univariate control charts monitor only one quality characteristic, while the multivariate control charts are applied to monitor more than one quality characteristic.

In the current Industry 4.0 era, a process is rarely monitored through only one type of quality characteristic. Variable control charts are used to monitor variable characteristics (measured on a numerical scale, such as height or weight), whereas attribute control charts are employed to monitor categorical or attribute data (such as color or hardness)2. Monitoring mixed quality characteristics in the manufacturing process is important3. In the past, however, mixed quality characteristics were commonly monitored with separate charts. This is inefficient because two statistics and two sets of control limits must be calculated. Moreover, the administrator will have difficulty interpreting the monitoring result if the two procedures yield different conclusions. Therefore, a unified approach to monitoring mixed characteristics is needed.

Ahsan et al.4 proposed a new monitoring procedure based on the PCA Mix algorithm to overcome this issue. This work was also extended to detecting outliers under various levels of contamination5. In this method, the T2 statistic is used to form the control chart. Because its distribution is unknown, the control limit of the PCA Mix chart is estimated using kernel density estimation, a non-parametric method for estimating the density of an unknown distribution6. However, in that work, the outlier-detection performance of the PCA Mix chart was evaluated for only one attribute characteristic. Additionally, its effectiveness in identifying a change in the process was assessed with the variable and attribute characteristics considered together. As a result, there is no guidance on the type of shift for which this chart performs best.

For these reasons, this work evaluates in detail the performance of the PCA Mix chart in detecting outliers and shifts in the process. Like the PCA Mix chart proposed by Ahsan et al.4, the proposed chart employs kernel density estimation (KDE) to calculate the control limit. For outlier detection, the proposed chart is evaluated with more than one attribute characteristic. For process monitoring, it is evaluated under different kinds of shift and different levels of correlation. This work also shows how the proposed chart is used to monitor actual data and how its performance compares with that of other charts.

The remainder of this work is structured as follows: Sect. “Related works” reviews related research. The PCA Mix method and the charting procedures of the proposed chart are described in Sects. “PCA mix” and “Charting procedures”. Sections “Performance in detecting outlier” and “Performance evaluation in monitoring process shift” present the performance assessments for identifying outliers and process shifts. Furthermore, Sect. “Application in the real cases” illustrates how the proposed chart is used to monitor actual datasets. Section “Conclusions” summarizes the conclusions.

Related works

Recent advancements in control charts are discussed in this section, which covers three categories of control charts: multivariate variable charts, attribute charts, and mixed charts. For multivariate variable charts, the main emphasis is on three types: Hotelling’s T2, multivariate EWMA, and multivariate CUSUM. Their recent developments are summarized in Table 1. Table 2 lists the most recent work on attribute charts. The table shows that current research has mostly concentrated on attribute charts for fuzzy, Poisson, and multinomial data.

Table 1 Multivariate variable chart’s most recent advancement.
Table 2 Attribute chart’s most recent advancement.

Additionally, Table 3 displays the most recent developments in mixed control charts. Only a few works have examined the joint monitoring of variable and attribute characteristics, so additional advancement in this field is required. To improve process monitoring, this research aims to build and evaluate the performance of a mixed-type chart, particularly the PCA Mix control chart.

Table 3 Mixed chart’s most recent advancement.

PCA mix

Multivariate data analysis is a statistical approach for examining data that include two or more quality characteristics. These characteristics may be either variable (interval- or ratio-scaled) or attribute (categorical). Principal component analysis (PCA) is a statistical technique used to reduce the dimension of continuous data, known as variable characteristics in statistical process control (SPC). Multiple correspondence analysis (MCA), an extension of correspondence analysis (CA), examines the relationships among several correlated categorical variables, known as attribute characteristics in SPC. When the observations are categorical, MCA may be thought of as an extension of the PCA approach35. The PCA Mix method is thus a combination of PCA and MCA that can handle different types of quality characteristics together.

In this study, the PCA Mix technique is implemented following the approach of Chavent et al.36. Let the \(n \times p\) matrix \({\mathbf{X}}_{1}\) and the \(n \times q\) matrix \({\mathbf{X}}_{2}\) contain the variable and attribute characteristics, respectively, where n is the number of observations, p is the number of variable characteristics, and q is the number of attribute characteristics. An indicator matrix \({\mathbf{G}}\) with dimensions \(n \times m\) provides the binary coding of the levels of each attribute, where m is the total number of levels over all attributes. The real-valued \(n \times (p + m)\) matrix \({\mathbf{Z}} = [{\mathbf{Z}}_{1} ,{\mathbf{Z}}_{2} ]\) is then formed, where \({\mathbf{Z}}_{1}\) and \({\mathbf{Z}}_{2}\) are the centred versions of \({\mathbf{X}}_{1}\) and \({\mathbf{G}}\), respectively. The weighted matrix \({\tilde{\mathbf{Z}}}\) is calculated as

$$ {\tilde{\mathbf{Z}}} = {\mathbf{N}}^{1/2} {\mathbf{Z}}{\mathbf{M}}^{1/2} , $$
(1)

where \({\mathbf{N}} = \frac{1}{n}{\mathbf{I}}_{n}\) is the diagonal matrix of row weights of Z and \({\mathbf{M}} = {\text{diag}}\left( {1, \ldots ,1,\frac{n}{{n_{1} }}, \ldots ,\frac{n}{{n_{m} }}} \right)\) is the diagonal matrix of column weights of Z: the first p columns are weighted by 1, and the last m columns are weighted by \(\frac{n}{{n_{s} }}\) for \(s = 1,2, \ldots ,m\), with \(n_{s}\) the number of observations in level s. The next step is to decompose \({\tilde{\mathbf{Z}}}\) using the Generalized Singular Value Decomposition (GSVD) of Chavent et al.36 as

$$ {\tilde{\mathbf{Z}}} = {\mathbf{U\Lambda V}}^{T} , $$
(2)

where \({{\varvec{\Lambda}}} = {\text{diag}}(\sqrt {\lambda_{1} } ,\sqrt {\lambda_{2} } , \ldots ,\sqrt {\lambda_{r} } )\), \(\lambda_{1} ,\lambda_{2} , \ldots ,\lambda_{r}\) are the non-zero eigenvalues of \({\tilde{\mathbf{Z}}}^{T} {\tilde{\mathbf{Z}}}\) (the squared singular values of \({\tilde{\mathbf{Z}}}\)), and r denotes the rank of \({\tilde{\mathbf{Z}}}.\) The \(n \times r\) matrix \({\mathbf{U}}\) and the \((p + m) \times r\) matrix \({\mathbf{V}}\) contain the corresponding left and right singular vectors of \({\tilde{\mathbf{Z}}}\). The principal component scores of PCA Mix are then calculated as

$$ {\mathbf{Y}}^{mix} = \,{\mathbf{ZMV}}. $$
(3)

where \({\mathbf{Y}}^{mix}\) has size \(n \times r.\)
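Equations (1) to (3) can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the implementation used in this study: the function name `pca_mix_scores` is hypothetical, an ordinary SVD of \({\tilde{\mathbf{Z}}}\) stands in for the GSVD, and the loadings are rescaled as \({\mathbf{V}} = {\mathbf{M}}^{-1/2}{\tilde{\mathbf{V}}}\) (with \({\tilde{\mathbf{V}}}\) the right singular vectors of \({\tilde{\mathbf{Z}}}\)). This rescaling is assumed from the usual GSVD construction and is not spelled out in the text.

```python
import numpy as np

def pca_mix_scores(X1, X2):
    """Return (Y_mix, eigenvalues) for variable data X1 (n x p) and attribute data X2 (n x q)."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2)
    n, p = X1.shape

    # Indicator matrix G: one binary column per level of each attribute.
    blocks = []
    for j in range(X2.shape[1]):
        levels = np.unique(X2[:, j])
        blocks.append((X2[:, [j]] == levels).astype(float))
    G = np.hstack(blocks)
    n_s = G.sum(axis=0)  # number of observations in each level

    # Z = [Z1, Z2]: centred variable block and centred indicator block.
    Z = np.hstack([X1 - X1.mean(axis=0), G - G.mean(axis=0)])

    # Diagonal of M: 1 for the p variable columns, n / n_s for the m level columns.
    m_diag = np.concatenate([np.ones(p), n / n_s])

    # Eq. (1): Z_tilde = N^{1/2} Z M^{1/2}, with N = (1/n) I_n.
    Z_tilde = Z * np.sqrt(m_diag) / np.sqrt(n)

    # Eq. (2): SVD of Z_tilde, keeping the r numerically non-zero singular values.
    _, s, Vt = np.linalg.svd(Z_tilde, full_matrices=False)
    r = int(np.sum(s > 1e-10 * s[0]))

    # Assumption: rescale the right singular vectors, V = M^{-1/2} V_tilde.
    V = Vt[:r].T / np.sqrt(m_diag)[:, None]

    # Eq. (3): Y_mix = Z M V, an n x r matrix of scores.
    Y_mix = (Z * m_diag) @ V
    return Y_mix, s[:r] ** 2
```

Under these assumptions, the v-th column of the returned scores has zero mean and variance \(\lambda_{v}\) (with divisor n), which is why dividing squared scores by the eigenvalues is a natural scaling.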

Charting procedures

The steps for building a multivariate control chart based on PCA Mix are covered in this section and shown in Fig. 1. There are three basic phases in the process. First, the PCs are calculated from the mixed characteristics using PCA Mix. Second, the T2 statistics are computed using the selected principal components. Finally, the control limit of the proposed chart is estimated using KDE.

Figure 1 PCA mix control chart procedures.

PCA mix control chart’s procedures

Step 1 Input the variable data X1 and the attribute data X2

Step 2 Calculate the principal component scores (PCs) mix, denoted as \({\mathbf{Y}}^{mix}\), using the PCA Mix method from X1 and X2

Step 3 Take the first l components and calculate \(\tilde{T}_{i}^{2} = \sum\limits_{v = 1}^{l} {\frac{{(y_{i,v}^{mix} - \tilde{\mu }_{v} )^{2} }}{{\lambda_{mix,v} }}} ,\) where \(\lambda_{mix,v}\) is the eigenvalue of the v-th PC

Step 4 Calculate the empirical density of the \(\tilde{T}_{i}^{2}\) statistics, \(\hat{f}_{h} (\tilde{T}^{2} ) = \frac{1}{{n\widehat{h}}}\sum\limits_{i = 1}^{n} {k\left( {\frac{{\tilde{T}^{2} - \tilde{T}_{i}^{2} }}{{\widehat{h}}}} \right)}\), where \(k( \cdot )\) is the kernel function and \(\widehat{h}\) is the optimum bandwidth calculated using the Botev, Grotowski, and Kroese algorithm37

Step 5 Calculate the distribution function of the \(\tilde{T}_{i}^{2}\) statistics, \(\widehat{F}_{h} (\widetilde{t}) = \int\limits_{0}^{{\widetilde{t}}} {\hat{f}_{h} (\tilde{T}^{2} )d} \tilde{T}^{2}\)

Step 6 Calculate the KDE control limit \(\widetilde{CL} = \widehat{F}_{h}^{ - 1} (1 - \alpha )\) from the in-control data

Step 7 Plot the \(\tilde{T}_{i}^{2}\) along with KDE control limit \(\widetilde{CL}\) to form the PCA Mix Control Chart
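Steps 3 to 6 can be sketched in NumPy as follows. This is a hedged illustration, not the code used in this study. The function names are hypothetical, \(\tilde{\mu }_{v}\) is taken to be the sample mean of the v-th PC, the kernel is assumed Gaussian, and Silverman's rule-of-thumb bandwidth is used as a simple stand-in for the Botev, Grotowski, and Kroese algorithm. The distribution function is obtained by trapezoidal integration on a grid starting at 0.

```python
import numpy as np

def t2_statistics(Y_mix, eigenvalues, l):
    """Step 3: T2_i = sum over the first l PCs of (y_iv - mean_v)^2 / lambda_v."""
    Y = np.asarray(Y_mix, dtype=float)[:, :l]
    lam = np.asarray(eigenvalues, dtype=float)[:l]
    centred = Y - Y.mean(axis=0)  # assumption: mu_v is the sample mean of PC v
    return np.sum(centred ** 2 / lam, axis=1)

def gaussian_kde_density(samples, grid, h):
    """Step 4: f_hat(t) = 1/(n h) * sum_i k((t - T2_i) / h), Gaussian kernel k."""
    u = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (samples.size * h * np.sqrt(2.0 * np.pi))

def kde_control_limit(t2, alpha=0.00273, grid_size=2048):
    """Steps 5-6: integrate the KDE to a CDF and invert it at 1 - alpha."""
    t2 = np.asarray(t2, dtype=float)
    n = t2.size
    # Silverman's rule of thumb (stand-in for the Botev et al. bandwidth).
    q75, q25 = np.percentile(t2, [75, 25])
    h = 0.9 * min(t2.std(ddof=1), (q75 - q25) / 1.34) * n ** (-0.2)
    grid = np.linspace(0.0, t2.max() + 5.0 * h, grid_size)
    density = gaussian_kde_density(t2, grid, h)
    # Cumulative trapezoidal integral of the density from 0.
    areas = 0.5 * (density[1:] + density[:-1]) * np.diff(grid)
    cdf = np.concatenate(([0.0], np.cumsum(areas)))
    cdf /= cdf[-1]  # renormalise: kernel mass below 0 or beyond the grid is ignored
    return float(np.interp(1.0 - alpha, cdf, grid))
```

An observation would then be flagged as out-of-control when its T2 value exceeds the returned limit. With a different bandwidth selector the limit will differ somewhat, so values from this sketch should not be expected to match the tables reported here.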

Performance in detecting outlier

The effectiveness of the proposed chart in identifying outliers mixed with in-control data is demonstrated in this section. Simulation studies involving various scenarios are carried out to evaluate its performance. For the simulations, the variable characteristics are assumed to follow the multivariate normal distribution \({\mathbf{X}}_{1} \sim N_{p} ({\mathbf{0,I}})\), while the attribute characteristics are generated from the multinomial distribution with three categories \({\mathbf{X}}_{2} \sim M(\theta_{1} ,\theta_{2} ,\theta_{3} )\). Similar to Ahsan et al.5, three types of attribute proportions are considered: the almost balanced proportion (\(\theta_{1} ,\theta_{2} = 0.3{\text{ and }}\theta_{3} = 0.4\)), the imbalanced proportion (\(\theta_{1} ,\theta_{2} = 0.1{\text{ and }}\theta_{3} = 0.8\)), and the extremely imbalanced proportion (\(\theta_{1} ,\theta_{2} = 0.05{\text{ and }}\theta_{3} = 0.9\)).
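The in-control data described above could be generated along the following lines. This is a minimal sketch: the function name `generate_in_control` is hypothetical, the categories are coded 0, 1, 2, and the q attribute characteristics are assumed to be drawn independently of each other and of the variable characteristics, which the text does not state explicitly. The generation of outliers is not covered here.

```python
import numpy as np

def generate_in_control(n, p, q, theta, rng):
    """Draw n in-control observations with p variable and q attribute characteristics."""
    # Variable characteristics: X1 ~ N_p(0, I).
    X1 = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)
    # Attribute characteristics: one category per cell, with probabilities theta.
    # Assumption: the q attributes are independent draws.
    X2 = rng.choice(len(theta), size=(n, q), p=theta)
    return X1, X2

# Example: imbalanced proportions with two attribute characteristics.
rng = np.random.default_rng(42)
X1, X2 = generate_in_control(n=1000, p=5, q=2, theta=[0.1, 0.1, 0.8], rng=rng)
```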

For a detailed assessment, the number of attribute characteristics is set to 2, 3, and 5, while 5 variable characteristics are used with n = 1000 observations. The proportion of outliers mixed with the clean data is set to 5, 10, 20, 30, 40, and 50 percent of the total observations. The accuracy of the proposed chart is assessed using the confusion matrix, which categorizes the results into four groups: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). TP and TN denote observations correctly identified as outliers and as non-outliers, respectively, whereas FP and FN denote observations wrongly identified as outliers and as non-outliers, respectively. The accuracy measure employed is the hit rate (HR), computed using Eq. (4).

$$ {\text{HR}} = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}}}. $$
(4)

The error rate in a confusion matrix can be subdivided into the false positive rate (FPR) and the false negative rate (FNR). The FPR is the proportion of in-control observations wrongly labeled as positive (outliers), whereas the FNR is the proportion of actual outliers incorrectly classed as negative. The FNR and FPR are computed using Eqs. (5) and (6), respectively:

$$ {\text{FNR}} = \frac{{{\text{FN}}}}{{{\text{TP}} + {\text{FN}}}}, $$
(5)
$$ {\text{FPR}} = \frac{{{\text{FP}}}}{{{\text{TN}} + {\text{FP}}}}. $$
(6)

The detailed algorithm for simulation studies can be found in Ahsan et al.5.
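Equations (4) to (6) translate directly into a few lines of NumPy. The sketch below assumes two Boolean vectors, one marking the true outliers and one marking the observations flagged by the chart; the function name `detection_rates` is hypothetical.

```python
import numpy as np

def detection_rates(is_outlier, flagged):
    """Return HR, FNR and FPR (Eqs. 4-6) from true labels and chart signals."""
    is_outlier = np.asarray(is_outlier, dtype=bool)
    flagged = np.asarray(flagged, dtype=bool)
    tp = int(np.sum(is_outlier & flagged))    # outliers that were flagged
    tn = int(np.sum(~is_outlier & ~flagged))  # clean points left unflagged
    fp = int(np.sum(~is_outlier & flagged))   # clean points flagged (false alarms)
    fn = int(np.sum(is_outlier & ~flagged))   # outliers that were missed
    hr = (tp + tn) / (tp + tn + fp + fn)
    fnr = fn / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (tn + fp) if (tn + fp) else float("nan")
    return {"HR": hr, "FNR": fnr, "FPR": fpr}
```

A high FPR corresponds to many in-control observations being declared outliers, while a high FNR corresponds to actual outliers going undetected.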

Two attribute characteristics

Table 4 shows the performance of the proposed chart in detecting outliers for two attribute characteristics with \(\theta_{1} ,\theta_{2} = 0.3{\text{ and }}\theta_{3} = 0.4.\) In general, the proposed chart maintains a stable performance when no more than 30 percent of outliers are added to the clean data. In this case, misdetection occurs because a large number of in-control observations are declared outliers (high FP rate). The performance of the proposed chart in detecting outliers for two attribute characteristics with imbalanced proportions is reported in Table 5. Unlike the previous case (two attributes with balanced proportions), the misdetections are caused by the inability of the control chart to capture the actual outliers, as seen from the high FN rate. Furthermore, Table 6 presents the performance of the proposed chart in detecting outliers for the extremely imbalanced proportion (\(\theta_{1} ,\theta_{2} = 0.05{\text{ and }}\theta_{3} = 0.9\)). Under this condition, the high FN rate causes a low level of accuracy in the proposed chart. In general, using l = 2 components produces better results in this case.

Table 4 Performance of the proposed chart in identifying outliers for two attribute characteristics with \(\theta_{1} ,\theta_{2} = 0.3{\text{ and }}\theta_{3} = 0.4\).
Table 5 Performance of the proposed chart in identifying outliers for two attribute characteristics with \(\theta_{1} ,\theta_{2} = 0.1{\text{ and }}\theta_{3} = 0.8\)
Table 6 Performance of the proposed chart in identifying outliers for two attribute characteristics with \(\theta_{1} ,\theta_{2} = 0.05{\text{ and }}\theta_{3} = 0.9\)

Three attribute characteristics

The performance of the proposed chart in detecting outliers for three balanced attribute characteristics (\(\theta_{1} ,\theta_{2} = 0.3{\text{ and }}\theta_{3} = 0.4\)) is presented in Table 7. As in the case of two attribute characteristics, misdetection happens because of frequent false alarms, reflected in the high FP rate. Tables 8 and 9 show the performance for three attribute characteristics with imbalanced and extremely imbalanced proportions, respectively. In these two cases, misdetection happens because actual outliers fail to be detected, reflected in the high FN rate. The results also show that using fewer principal components produces better results. The performance degrades when the data contain more than 30 percent outliers. Also, the more imbalanced the proportions of the attribute characteristics, the higher the accuracy level produced.

Table 7 Performance of the proposed chart in identifying outliers for three attribute characteristics with \(\theta_{1} ,\theta_{2} = 0.3{\text{ and }}\theta_{3} = 0.4\).
Table 8 Performance of the proposed chart in identifying outliers for three attribute characteristics with \(\theta_{1} ,\theta_{2} = 0.1{\text{ and }}\theta_{3} = 0.8\).
Table 9 Performance of the proposed chart in identifying outliers for three attribute characteristics with \(\theta_{1} ,\theta_{2} = 0.05{\text{ and }}\theta_{3} = 0.9\).

Five attribute characteristics

Table 10 shows the outlier monitoring results for five attribute characteristics with \(\theta_{1} ,\theta_{2} = 0.3{\text{ and }}\theta_{3} = 0.4.\) According to the simulation results, misdetection in this case occurs because a large number of in-control observations are declared outliers (see the FP rate). The performances of the proposed chart for \(\theta_{1} ,\theta_{2} = 0.1{\text{ and }}\theta_{3} = 0.8\) and for \(\theta_{1} ,\theta_{2} = 0.05{\text{ and }}\theta_{3} = 0.9\) are reported in Tables 11 and 12, respectively. As in the two previous cases, the failure to detect the actual outliers reduces the accuracy of the proposed chart. In general, using fewer principal components leads to higher accuracy. The chart remains at its peak performance when less than 40 percent of outliers are mixed in. Moreover, the more imbalanced the proportions of the attribute characteristics monitored by the proposed chart, the higher the hit rate or accuracy produced.

Table 10 Performance of the proposed chart in identifying outliers for five attribute characteristics with \(\theta_{1} ,\theta_{2} = 0.3{\text{ and }}\theta_{3} = 0.4\).
Table 11 Performance of the proposed chart in identifying outliers for five attribute characteristics with \(\theta_{1} ,\theta_{2} = 0.1{\text{ and }}\theta_{3} = 0.8\).
Table 12 Performance of the proposed chart in identifying outliers for five attribute characteristics with \(\theta_{1} ,\theta_{2} = 0.05{\text{ and }}\theta_{3} = 0.9\).

Based on the simulation results on the performance of the proposed chart in detecting outliers, the findings can be summarized as follows:

  1. In general, the proposed chart performs well only when the data contain no more than 30 percent outliers.

  2. When used to monitor attribute characteristics with balanced proportions, the chart’s performance decreases due to high false alarms (swamping effects).

  3. When used to monitor attribute characteristics with imbalanced and extremely imbalanced proportions, the chart’s performance decreases due to high false negatives (masking effects).

  4. The proposed chart is suitable for monitoring outliers in attribute data with imbalanced and extremely imbalanced proportions.

Performance evaluation in monitoring process shift

This section evaluates the effectiveness of the proposed chart in detecting process shifts. As in the preceding section, attribute characteristics are generated from a multinomial distribution with three different types of proportions, and variable characteristics are generated from a multivariate normal distribution. Here, the performance of the proposed chart is assessed for several types of shift: a shift in only the variable characteristics, a shift in only the attribute characteristics, and a shift in both. Different levels of correlation are also tested to see how well the proposed chart performs. Using the same approach as Ahsan et al.4, the ARL1 is estimated by shifting the variable characteristics by \({{\varvec{\upmu}}}_{shift} = {{\varvec{\upmu}}} + {{\varvec{\updelta}}}_{\mu } ,\) where \({{\varvec{\updelta}}}_{\mu } = {\mathbf{0.1}}\), and shifting the attribute characteristics by \({{\varvec{\uptheta}}}_{shift} = [\theta_{1} - \delta_{\theta } ;\;\theta_{2} - \delta_{\theta } ;\;\theta_{3} + 2\delta_{\theta } ],\) where \(\delta_{\theta } = 0.0025\).
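As a worked illustration of the attribute shift, consider a single shift increment applied to the almost balanced proportions \(\theta_{1} ,\theta_{2} = 0.3{\text{ and }}\theta_{3} = 0.4\) used in the previous section. Treating one increment as the unit of shift is an assumption made here for illustration:

$$ {{\varvec{\uptheta}}}_{shift} = [0.3 - 0.0025;\;0.3 - 0.0025;\;0.4 + 2(0.0025)] = [0.2975;\;0.2975;\;0.405]. $$

The shifted proportions still sum to one, because the amount removed from the first two categories is added to the third.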

Shift in variable characteristics

The proposed chart’s performance is shown in Tables 13, 14 and 15 for the balanced, imbalanced, and extremely imbalanced proportions of attribute data, respectively. In general, using the KDE control limit, the proposed chart produces an ARL0 of around 370 for the false alarm rate \(\alpha = 0.00273\). For a shift in only the variable characteristics, the proposed chart captures the change in the process, producing a lower ARL1 as the shift becomes larger. In this case, better performance is achieved when the attribute characteristics have balanced proportions.
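The target value of the in-control ARL follows from the false alarm rate. Assuming that successive in-control statistics are independent and that each exceeds the control limit with probability \(\alpha\), the run length is geometrically distributed, so that

$$ {\text{ARL}}_{0} = \frac{1}{\alpha } = \frac{1}{0.00273} \approx 366.3, $$

which is the benchmark of roughly 370 against which the simulated ARL0 values can be compared.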

Table 13 ARLs for \(\theta_{1} ,\theta_{2} = 0.3{\text{ and }}\theta_{3} = 0.4\) with shift in the variable characteristics for p = 5.
Table 14 ARLs for \(\theta_{1} ,\theta_{2} = 0.1{\text{ and }}\theta_{3} = 0.8\) with shift in the variable characteristics for p = 5.
Table 15 ARLs for \(\theta_{1} ,\theta_{2} = 0.05{\text{ and }}\theta_{3} = 0.9\) with shift in the variable characteristics for p = 5.

Shift in attribute characteristics

The performance of the proposed chart under a shift in the attribute characteristics is presented in Tables 16, 17 and 18 for the balanced, imbalanced, and extremely imbalanced proportions, respectively. In this case, using the KDE control limit, the performance of the proposed chart in the in-control state is stable (the ARL0 value is around 370 in all scenarios with \(\alpha = 0.00273\)). Although the proposed chart can capture process shifts that occur in the attribute characteristics, the ARL1 obtained does not drop as sharply as when detecting a shift in the variable characteristics. Also, the proposed chart performs better than the existing chart, particularly when dealing with highly imbalanced data.

Table 16 ARLs for \(\theta_{1} ,\theta_{2} = 0.3{\text{ and }}\theta_{3} = 0.4\) with shift in the attribute characteristics for p = 5.
Table 17 ARLs for \(\theta_{1} ,\theta_{2} = 0.1{\text{ and }}\theta_{3} = 0.8\) with shift in the attribute characteristics for p = 5.
Table 18 ARLs for \(\theta_{1} ,\theta_{2} = 0.05{\text{ and }}\theta_{3} = 0.9\) with shift in the attribute characteristics for p = 5.

Shift in variable and attribute characteristics

This subsection presents the performance of the proposed chart in detecting a shift in both the variable and attribute characteristics. Table 19 reports the performance of the proposed chart for the balanced proportions of attribute characteristics, while its performance for the imbalanced and extremely imbalanced proportions is presented in Tables 20 and 21. The results are similar to those obtained when the proposed chart monitors shifts in only the variable characteristics. The main difference in performance lies in the size of the shift. For small shifts, the proposed chart performs better when the shift occurs in only the variable characteristics. For large shifts, on the other hand, it performs better when the shift occurs in both the variable and attribute characteristics.

Table 19 ARLs for \(\theta_{1} ,\theta_{2} = 0.3{\text{ and }}\theta_{3} = 0.4\) with shift in the variable and attribute characteristics for p = 5.
Table 20 ARLs for \(\theta_{1} ,\theta_{2} = 0.1{\text{ and }}\theta_{3} = 0.8\) with shift in the variable and attribute characteristics for p = 5.
Table 21 ARLs for \(\theta_{1} ,\theta_{2} = 0.05{\text{ and }}\theta_{3} = 0.9\) with shift in the variable and attribute characteristics for p = 5.

Different correlation

This subsection presents the performance of the proposed chart for several correlation coefficients. The variable characteristics are generated with four levels of correlation (0.3, 0.5, 0.7, and 0.9), and the chart is evaluated using the KDE control limit. In this case, the process is shifted in both the variable and attribute characteristics. The number of variable characteristics p is 5, and the number of principal components used l is 4. The proposed chart is also evaluated for the three types of attribute proportions described in the previous section.

Table 22 shows the performance of the proposed chart in monitoring the balanced proportion of attribute characteristics (\(\theta_{1} ,\theta_{2} = 0.3{\text{ and }}\theta_{3} = 0.4\)) with several levels of correlation. For the in-control condition, the proposed chart produces an ARL0 of about 370 in all scenarios. When the process is shifted, the proposed chart detects the shift by producing a smaller ARL1. In this case, better performance is achieved when the proposed chart monitors a process with a smaller correlation coefficient.

Table 22 ARLs of the proposed chart with p = 5, l = 4, \(\theta_{1} ,\theta_{2} = 0.3{\text{ and }}\theta_{3} = 0.4\) for various correlation.

Tables 23 and 24 report the performance of the proposed chart in monitoring attribute characteristics with imbalanced and extremely imbalanced proportions. According to the tables, the proposed chart produces the appropriate ARL0 for the in-control condition (around 370 for \(\alpha = 0.00273\)). As in the previous result, a smaller correlation coefficient produces better performance, as seen from the ARL1 value for each scenario. In addition, the proposed chart reaches its peak performance when it monitors attribute characteristics with balanced proportions.

Table 23 ARLs of the proposed chart with p = 5, l = 4, \(\theta_{1} ,\theta_{2} = 0.1{\text{ and }}\theta_{3} = 0.8\) for various correlation.
Table 24 ARLs of the proposed chart with p = 5, l = 4, \(\theta_{1} ,\theta_{2} = 0.05{\text{ and }}\theta_{3} = 0.9\) for various correlation.

Based on the simulation results on the performance of the proposed chart in monitoring process shifts, the findings can be summarized as follows:

  1. The proposed chart is suitable for monitoring processes with shifts in variable characteristics and attribute characteristics with balanced proportions.

  2. The proposed chart is suitable for variable characteristics with low correlation and attribute characteristics with balanced proportions.

Application in the real cases

Machine failure dataset

This subsection describes how the proposed chart is applied to a real-world scenario. The proposed chart is used to monitor the machine failure dataset (attached as an Excel file), which has also been used in Ref.4. The dataset contains 8784 samples, with 16 variable characteristics and 4 attribute characteristics, one of which labels the observations. In this study, 8 of the 16 variable characteristics and 2 of the 3 remaining attribute characteristics are chosen based on their mean deviation from the mean of the in-control process. The first attribute characteristic has eight categories and the second has four, both with balanced proportions.

Table 25 shows the performance of the proposed chart in monitoring the Machine Failure dataset. According to the table, the multivariate charts based on PCA Mix surpass the conventional T2 chart. The PCA Mix chart with the F-distribution control limit has a slightly better hit rate. However, the proposed chart performs better than the other charts in detecting the actual out-of-control observations. This happens because the two attribute characteristics, which have balanced proportions, increase the accuracy of the proposed method.

Table 25 Proposed chart performance in monitoring the machine failure dataset.

NSL-KDD dataset

In this section, the proposed chart is used to monitor the well-known NSL-KDD dataset (available at https://www.kaggle.com/datasets/hassan06/nslkdd), which is regarded as a typical benchmark for assessing intrusion detection38. Table 26 details the performance of the proposed chart on the NSL-KDD dataset. Based on the findings, the proposed chart performs better than the other charts. The proposed chart, which uses the KDE control limit, yields the highest hit rate and the lowest false positive rate.

Table 26 Proposed chart performance in monitoring the NSL-KDD dataset.

Conclusions

This paper presents a detailed performance evaluation of the PCA Mix control chart in monitoring mixed variable and attribute quality characteristics. Through simulation studies covering several cases, the evaluation shows the ability of the PCA Mix chart to detect outliers and shifts in the process. The proposed chart maintains a stable performance when no more than 30 percent of outliers are mixed in. When the proposed chart monitors more than one attribute characteristic with balanced proportions, most misdetection beyond 30 percent outliers is due to false alarms. On the other hand, when monitoring attribute characteristics with imbalanced proportions, the proposed chart fails to detect actual outliers once the data contain more than 30 percent outliers. The performance of the proposed chart is also evaluated in detecting a shift in the process. For small shifts, the proposed chart performs best when the shift occurs in only the variable characteristics. For large shifts, it performs better when the shift occurs in both the variable and attribute characteristics. The proposed chart also performs better when the correlation coefficient is smaller. In addition, the proposed chart is applied to two datasets, and its performance is compared with that of conventional methods. The monitoring results show that, compared to the other charts, the proposed chart achieves higher detection accuracy by detecting more actual out-of-control observations with a low false alarm rate.

For future research, the proposed chart can be extended by using robust estimators for both the mean vector and the covariance matrix. The bootstrap resampling method can be used to estimate the control limit of the proposed chart. The Squared Prediction Error (SPE) or Q statistic can be employed as an alternative to Hotelling’s T2 statistic in monitoring the mixed characteristics. Also, the effect of autocorrelation in the metric data is an interesting issue that needs to be explored.

Ethics approval

This work does not involve experiments on animals and humans.