Minimalist module analysis for fault detection and localization

Traditional multivariate statistical-based process monitoring (MSPM) methods are effective data-driven approaches for monitoring large-scale industrial processes, but they have a shortcoming in handling the redundant correlations between process variables. To address this shortcoming, this study proposes a new MSPM method called minimalist module analysis (MMA). MMA divides the process data into several different minimalist modules and one independent module. All variables in a minimalist module are strongly correlated and no redundant variables exist; therefore, the feature components extracted from one minimalist module are not disturbed by noise from the other modules. This study also proposes new monitoring indices and a fault localization strategy for MMA, and simulation tests demonstrate that MMA achieves superior performance in fault detection and localization.

Multivariate statistical-based process monitoring (MSPM) methods 1-4, e.g., principal component analysis (PCA) 5,6, partial least squares (PLS) 7,8, and canonical correlation analysis (CCA) 9,10, are effective data-driven approaches for monitoring large-scale industrial processes. The main idea of MSPM is to analyze the correlations between process variables and extract feature components for the construction of statistical indices.
MSPM has been a research hotspot for many years, and a large number of relevant studies are published each year. In recent years, studies have focused on improving existing methods to deal with process characteristics such as nonlinear, non-Gaussian, and dynamic features. For example, Ge et al. 11 combined a multivariate linear Gaussian state-space model with MSPM to handle dynamic process features; Du et al. 12 proposed a Gaussian distribution transformation (GDT)-based monitoring method to handle non-Gaussian features; and Lou et al. 13 combined artificial neural networks with PCA and proposed a new neural component analysis for handling nonlinear features. Meanwhile, Zhou et al. 14 proposed a nonlinear key performance indicator (KPI) strategy for the PLS algorithm.
Because MSPM can compress high-dimensional data into two or three statistical indices, it is a convenient tool for detecting abnormal conditions across a whole process. To address the fault localization problem, the contribution plot method 15,16 was proposed for MSPM; it calculates the contribution of each variable in the original data set and selects the variables with high contributions as fault sources. Most studies on MSPM use the contribution plot as a basic algorithmic tool 17,18, and a few studies have proposed improved contribution schemes for MSPM variants that cannot use the traditional contribution plot directly (examples include kernel PCA 19 and robust PCA 20 ).
However, according to actual simulation test results, MSPM is insensitive to certain faults, and the contribution plot method may mistakenly diagnose normal variables as fault sources. The reason for this phenomenon is that traditional MSPM methods are based on the correlations between all process variables, and some correlations can be deduced from others, which means that these correlations are redundant. As such, the feature components extracted by traditional MSPM methods contain information from many process variables and hence are also disturbed by noise from these variables; therefore, traditional MSPM methods are insensitive to specific faults. In addition, the redundant correlations may mislead the contribution plot method, which results in incorrect localization of faults.
To handle these problems, multiblock MSPM methods, such as consensus PCA (CPCA) 21, multiblock PLS (MBPLS) 18, and hierarchical PLS (HPLS) 22, have been proposed to reduce the number of variables per model and improve the interpretability of multivariate models. The main idea of multiblock MSPM methods is to divide the process variables into several blocks and combine the monitoring results of the blocks. However, block division remains an open problem in both academic and engineering fields. Though Slama gave the general guideline that "blocks should correspond as closely as possible to distinct units of the process where all the variables within a block or process unit may be highly coupled, but where there is minimal coupling among variables in different blocks" 18, applying this guideline requires detailed prior knowledge of the process, and it does not prescribe a concrete division algorithm.

Methods
Principal component analysis (PCA). PCA decomposes the data matrix X ∈ R^{n×s} (where n is the number of samples and s is the number of variables) into a k-dimensional subspace as follows:

X = T P^T + E,

where T ∈ R^{n×k} is the score matrix, whose columns are orthogonal; P ∈ R^{s×k} is the loading matrix, which is orthonormal; and E ∈ R^{n×s} is the residual matrix. To obtain the loading matrix P, one first calculates the covariance matrix:

Σ = X^T X / (n − 1).

Then, Σ can be presented by singular value decomposition (SVD) as follows:

Σ = P₀ Λ P₀^T, Λ = diag(λ₁, λ₂, . . . , λ_s) (λ₁ ≥ λ₂ ≥ · · · ≥ λ_s ≥ 0).

Matrix P consists of the columns of P₀ associated with the k largest eigenvalues, and k is determined by the cumulative percent variance (CPV) 27 criterion:

CPV(k) = (Σ_{i=1}^{k} λ_i / Σ_{i=1}^{s} λ_i) × 100%,

where the smallest k for which CPV(k) exceeds a parameter ε (usually set to 85%) is taken as the number of principal components (PCs). Then, two statistics are constructed to monitor a new process data sample x ∈ R^{1×s}:

T² = x P Λ_k^{−1} P^T x^T, SPE = ‖x (I − P P^T)‖²,

where Λ_k = diag(λ₁, . . . , λ_k). The thresholds for the two indices, δ_{T²} and δ_{SPE}, can be found in reference 28.
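The fitting and monitoring steps above can be sketched in a few lines of numpy; this is an illustrative implementation, not the authors' code, and it assumes the training data are already normalized to zero mean and unit variance.

```python
# Sketch of PCA model fitting and the T^2/SPE indices described above;
# function names are illustrative.
import numpy as np

def fit_pca(X, cpv_target=0.85):
    """Return loading matrix P (s x k) and the k largest eigenvalues."""
    n, s = X.shape
    sigma = X.T @ X / (n - 1)              # covariance matrix of X
    eigvals, P0 = np.linalg.eigh(sigma)    # eigendecomposition of symmetric sigma
    order = np.argsort(eigvals)[::-1]      # sort eigenvalues in descending order
    eigvals, P0 = eigvals[order], P0[:, order]
    cpv = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cpv, cpv_target)) + 1   # smallest k with CPV >= target
    return P0[:, :k], eigvals[:k]

def t2_spe(x, P, lam):
    """T^2 and SPE statistics for one normalized sample x (length s)."""
    t = x @ P                              # score vector
    T2 = float(np.sum(t ** 2 / lam))       # T^2 = t diag(1/lambda) t^T
    r = x - t @ P.T                        # residual x (I - P P^T)
    SPE = float(r @ r)
    return T2, SPE
```

In practice the thresholds δ_{T²} and δ_{SPE} would be computed from the training data as in reference 28; they are omitted from this sketch.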
Contribution plot. The contributions to SPE are calculated as follows:

ConSPE_j = (x_j − x̂_j)²,

where x_j and x̂_j are the jth elements of x and its reconstruction x̂ = x P P^T, respectively. The contributions to T² are calculated as follows:

ConT²_j = x_j Σ_{i=1}^{k} (t_i / λ_i) P_{j,i}, t_i = x P_i,

where P_i is the ith column of P, and P_{j,i} is the element in the jth row and ith column. The role of the contribution plot in fault isolation is to indicate which variables are related to the fault rather than to reveal its actual cause. In general, variables with higher contributions have a closer relationship with the fault source. The thresholds of ConSPE_j and ConT²_j can be obtained by kernel density estimation 29.
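A minimal sketch of these two contribution vectors, assuming the PCA model (P, λ) from the previous subsection; a useful sanity check is that the contributions decompose their parent indices exactly, i.e., Σ_j ConSPE_j = SPE and Σ_j ConT²_j = T².

```python
# Sketch of SPE and T^2 contributions for one normalized sample x,
# given a fitted PCA model (loadings P, eigenvalues lam); index j runs
# over the s original variables.
import numpy as np

def contribution_plots(x, P, lam):
    t = x @ P                        # scores t_i = x P_i
    x_hat = t @ P.T                  # reconstructed sample
    con_spe = (x - x_hat) ** 2       # ConSPE_j = (x_j - x_hat_j)^2
    con_t2 = x * (P @ (t / lam))     # ConT2_j = x_j * sum_i (t_i / lam_i) P_{j,i}
    return con_spe, con_t2
```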

Drawback of PCA and contribution plot method. Theorem The redundant variables introduce extra noise into the principal components (PCs).
Proof Assume X₁ ∈ R^{n×s} contains the variables belonging to a minimalist module, which can be full-rank decomposed as

X₁ = T₀ P₀^T,

where T₀ ∈ R^{n×s} and P₀ ∈ R^{s×s}. Matrix X₂ ∈ R^{n×s′} contains the redundant variables, which can be presented as a linear combination of X₁ as follows:

X₂ = X₁ R + W,

where R ∈ R^{s×s′} is the linear transformation matrix, and W ∈ R^{n×s′} is the noise belonging to X₂. In this paper, we assume that each measurement variable contains independent sensor noise, and hence rank(W) = s′. The combined data can then be written as

[X₁ X₂] = T₀ P₀^T [I R] + [0 W].

Part T₀ P₀^T [I R] can be full-rank singular value decomposed as

T₀ P₀^T [I R] = T₁ P₁^T,

where T₁ ∈ R^{n×(s+s′)}, rank(T₁) = rank(T₀), and P₁ ∈ R^{(s+s′)×(s+s′)}. Hence, one obtains

[X₁ X₂] = T₁ P₁^T + [0 W] = (T₁ + W P₁″) P₁^T,

where P₁″ = [0 I] P₁ ∈ R^{s′×(s+s′)}. Because T₁ + W P₁″ is non-orthogonal in most situations, we introduce another orthonormal matrix Q ∈ R^{(s+s′)×(s+s′)} that makes (T₁ + W P₁″) Q orthogonal. It should be noted that when T₁ + W P₁″ is orthogonal, then Q = I. PCA picks the k largest components as PCs, which we denote as T_k ∈ R^{n×k}. Then,

T_k = (T₁ + W P₁″) Q_k,

where Q_k ∈ R^{(s+s′)×k} consists of the corresponding k columns of Q. Taking Φ = P₁″ Q_k ∈ R^{s′×k}, and because P₁″ and Q_k are parts of the orthonormal matrices P₁ and Q, one obtains Φ ≠ 0 (rank(Φ) ≠ 0) except in the exceptionally rare situation that all columns of Q_k belong to the null space of P₁″. As rank(W) + rank(Φ) > s′, one obtains W P₁″ Q_k ≠ 0. As such, T_k is influenced by W, and the redundant variables X₂ introduce extra noise W into the principal components (PCs). This finishes the proof.

Based on the Theorem, one finds that PCA is not well suited to process data with redundant variables.
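The Theorem can be illustrated numerically. The toy dimensions and noise scale below are our own illustrative choices: we build a full-rank module X₁, append redundant variables X₂ = X₁R + W, and measure how much of the leading PC scores lies outside the column space of X₁ (that part can only come from the noise W).

```python
# Numerical illustration of the Theorem: appending redundant variables
# X2 = X1 R + W leaks the sensor noise W into the leading principal
# components of the augmented data [X1 X2].
import numpy as np

rng = np.random.default_rng(7)
n, s, s2 = 500, 3, 4
X1 = rng.standard_normal((n, s)) @ rng.standard_normal((s, s))  # full-rank module data
R = rng.standard_normal((s, s2))
W = 0.1 * rng.standard_normal((n, s2))        # independent sensor noise, rank(W) = s2
X2 = X1 @ R + W                               # redundant variables

def leading_scores(X, k):
    """Scores on the k leading right singular vectors of centered X."""
    Xc = X - X.mean(0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

Tk_aug = leading_scores(np.hstack([X1, X2]), s)   # PCs of [X1 X2]
Tk_x1 = leading_scores(X1, s)                     # PCs of X1 alone

def noise_leakage(Tk, X1):
    """Fraction of PC-score energy outside the column space of X1."""
    Xc = X1 - X1.mean(0)
    proj = Xc @ np.linalg.lstsq(Xc, Tk, rcond=None)[0]
    return np.linalg.norm(Tk - proj) / np.linalg.norm(Tk)
```

The leakage of Tk_x1 is numerically zero (the PCs of X₁ are exact linear combinations of X₁'s columns), whereas the leakage of Tk_aug is strictly positive: the redundant block injects W into T_k, exactly as the Theorem states.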
As for the contribution plot method, according to Eqs. (6) and (7), it is based on the difference between x and its reconstruction x̂. As shown in Fig. 2, when a fault occurs in a specific variable x_j, (a) according to the equation t = x P, the relevant principal components become faulty; and (b) according to the equation x̂ = t P^T, most reconstructed variables then become faulty. As such, in practical engineering applications, it is hard to locate the fault source with the contribution plot method because too many variables' contribution indices alarm the fault.

Section summary. In sum, to eliminate the noise disturbance introduced by redundant variables, and to improve the fault localization ability, we develop a new monitoring algorithm based on the minimalist module and propose a corresponding fault localization strategy in "Minimalist module analysis (MMA)" section.

Minimalist module analysis (MMA)
The structure of this section is outlined in Fig. 3.

Minimalist module division.
Traditional PCA approaches focus on the k largest eigenvalues of Σ, and the important information contained in the residual part is not used. When the residual eigenvalues are very small (e.g., 0.05), one obtains λ_j ≈ 0 for j = k + 1, k + 2, . . . , s. Taking P_r as the columns of P₀ associated with the s − k smallest eigenvalues, one obtains

X P_r ≈ 0.    (17)

We assume X = [x₁ x₂ x₃] and P_r ∈ R^{3×2} (the numerical loading values are omitted here). Each column of Eq. (17) then gives an approximate linear relation among x₁, x₂, and x₃. Through a column transformation of Eq. (17), i.e., multiplying P_r on the right by a nonsingular matrix, one obtains a transformed matrix P̃_r in which some elements are 0; the corresponding relation, Eq. (19), then describes the relationship between x₂ and x₃ without involving x₁. In Eq. (19), the variable set {x₂, x₃} is a minimalist module.
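The residual-subspace idea can be demonstrated on a toy model of our own choosing (x₃ = x₁ + x₂ + noise, x₄ = x₂ + noise): the eigenvectors of the two smallest eigenvalues satisfy X P_r ≈ 0, and combining the two columns so that the x₁ coordinate cancels isolates the relation that does not involve x₁, i.e., the minimalist module {x₂, x₄}.

```python
# Toy illustration (model chosen for this sketch) of the residual-subspace
# idea: eigenvectors of the s - k smallest eigenvalues encode near-exact
# relations X p ≈ 0, and column transformations of P_r localize each
# relation to a minimal variable set.
import numpy as np

rng = np.random.default_rng(3)
n = 2000
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
x3 = x1 + x2 + 0.01 * rng.standard_normal(n)     # relation 1: x1 + x2 - x3 ~ 0
x4 = x2 + 0.01 * rng.standard_normal(n)          # relation 2: x2 - x4 ~ 0
X = np.column_stack([x1, x2, x3, x4])
X = (X - X.mean(0)) / X.std(0)                   # zero mean, unit variance

w, V = np.linalg.eigh(X.T @ X / (n - 1))         # eigenvalues in ascending order
P_r = V[:, :2]                                   # loadings of the 2 smallest eigenvalues

# Any column transformation of P_r keeps X @ (P_r @ G) ~ 0; combining the
# two columns so that the x1 entry cancels isolates the relation that does
# not involve x1: the minimalist module {x2, x4}.
p1, p2 = P_r[:, 0], P_r[:, 1]
c = p2[0] * p1 - p1[0] * p2                      # zero out the x1 entry
c = c / np.linalg.norm(c)
```

The paper's method finds such transformations automatically (via PSO); this sketch only verifies the underlying algebra on a case where the answer is known.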
The flow of minimalist module division is as follows:
(a) Find a transformation matrix Ŵ ∈ R^{(s−k)×(s−k)} that maximizes the number of 0 elements in P̃_r = P_r Ŵ. This paper addresses this optimization problem using the particle swarm optimization (PSO) 30 algorithm, as described below.
Step 1 Set num = 1.
Step 2 Take the num-th column of P_r as ψ₁ and the remaining s − k − 1 columns as Ψ₂. Solve the following optimization problem by PSO:

max_{Ŵ_num} ‖ψ₁ − Ψ₂ Ŵ_num‖_β,

where ‖ψ₁ − Ψ₂ Ŵ_num‖_β denotes the number of elements of ψ₁ − Ψ₂ Ŵ_num in the interval [−β, β] (β is close to 0, such as 0.01).
Step 3 If num = s − k, the transformation matrix Ŵ is complete; else, set num = num + 1 and go to Step 2.
(b) Calculate P̃_r = P_r Ŵ, adjust each column of P̃_r to unit variance, and set all elements in the interval [−β, β] to 0.
(c) Take the variables corresponding to the non-zero elements in the ith (i = 1, 2, . . . , s − k) column of P̃_r as the ith minimalist module (MMi).

Remark The form of the minimalist module is not unique; e.g., through another transformation of Eq. (17), one can obtain a relation involving only x₁ and x₂, and hence the variable set {x₁, x₂} is also a minimalist module. As such, the result of PSO may differ between runs.
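Step 2 can be sketched with a minimal PSO. The swarm hyperparameters below (swarm size, inertia 0.7, acceleration 1.5, search range) are illustrative defaults of this sketch, not values from the paper.

```python
# Minimal particle swarm optimization (PSO) sketch for Step 2: find weights
# w so that psi1 - Psi2 @ w has as many elements as possible inside
# [-beta, beta]. Swarm hyperparameters are illustrative defaults.
import numpy as np

def pso_step2(psi1, Psi2, beta=0.01, n_particles=30, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    dim = Psi2.shape[1]

    def neg_count(w):                  # objective: minimize -#{|residual| <= beta}
        return -int(np.sum(np.abs(psi1 - Psi2 @ w) <= beta))

    pos = rng.uniform(-2.0, 2.0, (n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest, pbest_val = pos.copy(), np.array([neg_count(w) for w in pos])
    g = pbest_val.argmin()
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]

    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([neg_count(w) for w in pos])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        if vals.min() < gbest_val:
            g = vals.argmin()
            gbest, gbest_val = pos[g].copy(), vals[g]
    return gbest, -gbest_val           # best weights and near-zero count
```

Because the count objective is piecewise constant, results can vary between runs, which matches the Remark above that the PSO outcome is not unique.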
Independent module. Each variable in a minimalist module is strongly correlated with the other variables in that module. As such, some variables, such as x₈ and x₉ in Fig. 3, are not included in any minimalist module; these variables belong to the independent module.

Monitoring indices construction.
Each minimalist module can be monitored by the PCA algorithm independently. We assume that X_i ∈ R^{n×s_i} is the data belonging to MMi. Then rank(X_i) = s_i − 1 because each minimalist module represents exactly one independent correlation, and hence the number of PCs for each minimalist module is fixed as s_i − 1. The module-wise indices T²_Mi and SPE_Mi are normalized by their thresholds, capped at γ, and summed into two plant-wide indices:

T²_M = Σ_{i=1}^{s−k} min(T²_{Mi}/δ_{T²,Mi}, γ), SPE_M = Σ_{i=1}^{s−k} min(SPE_{Mi}/δ_{SPE,Mi}, γ),

where γ is a positive value (e.g., √(s − k)). As such, when some minimalist module detects a fault, these two indices become much larger than their normal values. The threshold for both indices is s − k.
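A sketch of this module-wise monitoring, under the assumed combined-index form (threshold-normalized module indices capped at γ and summed); the normalization and cap here are assumptions of this sketch.

```python
# Sketch of module-wise monitoring: each minimalist module keeps s_i - 1
# PCs, and the capped, normalized module indices are summed into one
# plant-wide index (assumed form).
import numpy as np

def fit_module_pca(Xi):
    """PCA model of one minimalist module, keeping s_i - 1 PCs."""
    n, si = Xi.shape
    w, V = np.linalg.eigh(Xi.T @ Xi / (n - 1))
    return V[:, ::-1][:, :si - 1], w[::-1][:si - 1]   # loadings P, eigenvalues

def module_spe(x, P):
    """SPE of one sample restricted to a module's variables."""
    r = x - (x @ P) @ P.T
    return float(r @ r)

def combined_index(values, thresholds, gamma):
    """Sum of threshold-normalized module indices, each capped at gamma."""
    return sum(min(v / d, gamma) for v, d in zip(values, thresholds))
```

Under normal operation each normalized term stays below 1, so the combined index stays below the number of modules s − k, consistent with that threshold.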
As for the variables in the independent module, they can be monitored by the T² index, which is denoted as T²_I. The fault localization strategy follows directly from the module structure. For example, in the mathematical model in Fig. 3, one obtains SPE_M1 = (x₁ + x₂ − x₃)² ≈ 0 under normal conditions; when the correlation between x₁, x₂, and x₃ changes, SPE_M1 grows and alarms the fault, localizing it to the variables of MM1. When a fault occurs in variables not belonging to any minimalist module, such as x₈ and x₉, it can only be handled through the detection result of the independent module, i.e., the contribution ConT²_j.

Simulation study of MMA
This section studies the performance of MMA through simulation tests and compares it with PCA and mutual information-based multiblock PCA (MI-MBPCA) 31. MI-MBPCA employs mutual information to divide the blocks automatically, and hence it does not need prior process knowledge for block division. In the test model, combining the two residual relations eliminates x₃:

(x₁P₁,₁ + x₂P₂,₁ + x₃P₃,₁)P₃,₂ − (x₁P₁,₂ + x₂P₂,₂ + x₃P₃,₂)P₃,₁ = x₁(P₁,₁P₃,₂ − P₁,₂P₃,₁) + x₂(P₂,₁P₃,₂ − P₂,₂P₃,₁) ≈ 0.

After data normalization, the training data are adjusted to zero mean and unit variance, and the normalized data are processed by MMA to obtain the matrix P̃_r. In this study, all control limits are based on a probability of 99%, and the best result is marked in bold.
As shown in Table 1, the performance of MMA is better than that of PCA and MI-MBPCA for all five faults. Because MMA divides the whole process data into several minimalist modules and an independent module, the noise in each variable does not disturb unrelated modules, and MMA is therefore more robust to process noise than PCA. For MI-MBPCA, because each variable belongs to only one block, the remaining blocks may lose key information, and the block models may therefore be biased. One interesting finding in Table 1 is that MMA successfully detects faults 3 and 4 while PCA fails. The reason for this phenomenon is that PCA monitors the complex correlations between all variables together, whereas MMA monitors each strong correlation (one minimalist module) independently; therefore, MMA is very sensitive to changes in specific correlations.
The fault localization results for faults 3 and 5 are shown in Figs. 4 and 5, respectively. In Fig. 4, for PCA, ConSPE 3, ConSPE 5, and ConSPE 6 all alarm the fault, so the fault source cannot be located. For MI-MBPCA, because x₆ is influenced by x₅, both variables alarm the fault, and the fault source cannot be located either.

Fault detection in the Tennessee Eastman process
The Tennessee Eastman (TE) process 32 simulation is the most widely used simulation model for testing MSPM methods; it is outlined in Fig. 6. The TE process uses 12 manipulated variables, 22 continuous process measurements, and 19 composition measurements sampled less frequently to simulate a classical chemical process. Because the 19 composition measurements are difficult to measure in real time and one manipulated variable, i.e., the agitation speed, is not actually manipulated, this study monitors only the other 22 measurements and 11 manipulated variables, as listed in Table 2. The twenty-one programmed faults of the TE process are listed in Table 3. In this study, 960 normal samples are adopted as training data to construct the monitoring models. Each testing data set contains 960 samples, and the fault occurs at the 161st sample.
In this section, we compare MMA with PCA, MI-MBPCA, deep principal component analysis (DePCA) 34, and kernel dynamic PCA (KDPCA) 35; the latter two methods are improved versions of PCA. The detection results of the five methods are listed in Table 4. The false alarm rate is calculated as the number of alarmed samples among the first 160 (normal) samples divided by 160, and the detection rate as the number of alarmed samples among the 800 faulty samples (samples 161-960) divided by 800. In this study, all control limits are based on a probability of 99%, and the best result is marked in bold.
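The two rate definitions can be sketched directly; function and argument names here are illustrative.

```python
# Sketch of the rate definitions used here: with the fault introduced at
# sample 161 (1-indexed), the false alarm rate is computed over the first
# 160 samples and the detection rate over the remaining 800 samples.
import numpy as np

def alarm_rates(index_values, threshold, fault_start=160):
    alarms = np.asarray(index_values) > threshold
    false_alarm_rate = float(alarms[:fault_start].mean())   # samples 1..160
    detection_rate = float(alarms[fault_start:].mean())     # samples 161..960
    return false_alarm_rate, detection_rate
```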
As shown in Table 4, MMA, MI-MBPCA, and PCA achieve similar false alarm rates, and their values are much lower than those of the two improved PCA methods, whose false alarm rates exceed 10%. For fault detection rates, MMA achieves the best results for 17 of the 21 faults; for the remaining 4 faults, MMA's detection rates fall below those of DePCA only because DePCA sacrifices its false alarm rate. An eye-catching result is obtained for fault 5: the detection rates of the compared methods are generally below 50%, whereas MMA achieves a 100.0% detection rate, which indicates the superiority of MMA. In addition, the performance of MMA on faults 10, 16, 19, and 20 is much better than that of the other four methods.
As the papers that proposed DePCA and KDPCA did not describe the construction of their contribution plots, we only compare the fault localization ability of PCA, MI-MBPCA, and MMA. The matrix P̃_r of MMA is shown in Table 5. Figure 7 shows the fault localization results for fault 4. According to Table 3, fault 4 is a step change in the inlet temperature of the reactor cooling water. As depicted in Fig. 6, the reactor temperature (variable 9 in Table 2) changes, and hence the reactor cooling water flow (variable 32 in Table 2) also changes to compensate for the temperature disturbance.

Conclusions
In this study, a new MSPM method called MMA was proposed to overcome the shortcoming of traditional MSPM methods in handling redundant correlations among process variables. The superiority of MMA was verified by several simulation tests. MMA achieved much better detection performance for five different types of faults in a mathematical model test, two of which could not be detected by PCA or MI-MBPCA. MMA also outperformed the other improved MSPM algorithms for 17 of the 21 faults in the Tennessee Eastman process.