Introduction

Anomaly detection in hyperspectral data has drawn much attention recently in many fields [1-3]. Anomaly detection aims to find "abnormal" pixels in an image where both the targets and their associated background are unknown. Many anomaly detection methods have been proposed, among which the RX detector (RXD) [4-6] is the most typical. It has been applied successfully to both multispectral and hyperspectral images. In fact, the expression of RXD is equivalent to the Mahalanobis distance. Many anomaly detection operators are derived from RXD, such as modified RX (MRX), normalized RX (NRX), weighted RX, causal RX [7] and the adaptive causal anomaly detector (ACAD) [8]. The low probability detector (LPD) [9] is another frequently used anomaly detector. LPD decides whether a pixel is abnormal according to the relationship between that pixel and the unity vector multiplied by the inverse of the sample auto-correlation matrix. The uniform target detector (UTD) is an evolved version of LPD in which the origin of the image is shifted to the mean vector. Kwon [10] proposed the dual window-based eigen separation transform (DWEST) anomaly detector. DWEST involves two local windows, an inner and an outer window, which are designed to maximize the separation between anomalies and background. The inner window is used to detect the anomalies present in it, while the outer window is used to model the background of the anomalies. By moving these two windows across the image, the local mean and covariance matrix of each window, and their differences, can be calculated. Anomalies are then extracted by projecting the differential mean between the two windows onto the eigenvector associated with the largest positive eigenvalue of the differential covariance matrix.
Similar to DWEST, the nested spatial window-based target detector (NSWTD) is presented in [11]. NSWTD involves three nested local windows: inner, middle and outer. The first two windows are used to extract the smallest and largest anomalies respectively, while the outer window is used to model the local background. The other key difference from DWEST and the RX-based algorithms is that NSWTD uses the orthogonal projection divergence (OPD), instead of an eigenvector projection or the sample covariance matrix, as its measure. Based on a nonparametric model, the combined F-test anomaly detector (CFT) is presented by Rosario [12]. Its main assumption is that the data sets examined by a common statistical test asymptotically follow Fisher's F distribution. Other anomaly detection methods can be found in [13-15].

The anomaly detectors mentioned above all take a statistical perspective. The statistics used include first-order statistics (e.g., the mean vector) and second-order statistics (e.g., the covariance matrix). First- and second-order statistics alone are not sufficient, because for most real images the scatter of pixels in feature space does not follow a normal distribution.

In this study, the third-order statistical tensor is introduced to extract anomalies. Many approaches in the existing literature already use high-order statistics for anomaly detection [16, 17]. However, all of them are direct applications of blind signal separation (BSS) methods (e.g., FastICA), which generally iterate step by step toward the optimal solution and are therefore apt to be trapped in local optima. To address this convergence issue, Geng [18] introduced the concept of the coskewness tensor to hyperspectral data analysis and proposed a target detection method based on higher-order singular value decomposition (HOSVD). Nevertheless, both the BSS-based techniques and HOSVD belong to the domain of feature extraction: they aim to extract not only the anomalies but also the other independent components in the image.

In this paper, combining the concept of third-order statistical tensor and the idea of RXD, we present a new anomaly detection method termed coskewness tensor detector (COSD). The proposed method can directly get the distribution of the anomaly of a hyperspectral image without any iteration, which can therefore avoid problems of the BSS-based methods.

Results

Although there are many anomaly detection algorithms based on the 2nd-order statistic, they are all generally derived from RXD. Similarly, we can also derive corresponding algorithms based on COSD. Therefore, the experiments in this study only focus on comparing the performances between COSD, FastICA and RXD. In order to facilitate the comparison between COSD and FastICA, we used skewness as a measure of non-Gaussianity in FastICA. In addition, since FastICA can produce a lot of independent components, we just select the one with the greatest skewness as the detection result.

Evaluation with simulated data

A simulated two-band image of 50 × 50 pixels was used first in this experiment. The simulated data consist of two parts: the abnormal targets (8 × 8 pixels) and their background. The background pixels follow a Gaussian distribution. The target is located in the upper left corner of the image, and its pixels are randomly scattered outside the background cloud in the feature space of the image (see Fig. 1).
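A scene of this kind can be sketched as follows. Only the image size, the 8 × 8 target block and the Gaussian background come from the text; the background mean and covariance, the target spectrum and its spread are hypothetical values chosen for illustration.

```python
import numpy as np

def simulate_image(rows=50, cols=50, target=8, seed=0):
    """Two-band image: Gaussian background plus a target block in the upper-left corner."""
    rng = np.random.default_rng(seed)
    # Hypothetical background statistics (not given in the text).
    mean = np.array([100.0, 120.0])
    cov = np.array([[25.0, 15.0], [15.0, 25.0]])
    image = rng.multivariate_normal(mean, cov, size=(rows, cols))
    # Hypothetical target spectra, scattered around a point far from the background cloud.
    image[:target, :target] = rng.normal(
        loc=[140.0, 95.0], scale=3.0, size=(target, target, 2)
    )
    # Ground-truth mask marking the target pixels.
    truth = np.zeros((rows, cols), dtype=bool)
    truth[:target, :target] = True
    return image, truth

image, truth = simulate_image()
pixels = image.reshape(-1, 2)  # one row per pixel, one column per band
```

Flattening the image to an N × L array of pixel vectors is a convenient layout for the detectors discussed in Methods.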

Figure 1. The scatter plot of the simulated data.

The results of RXD, FastICA and COSD are compared in Fig. 2. Visually, COSD performs better than RXD. Fig. 3 shows the receiver operating characteristic (ROC) curves of detection rate versus false alarm rate for the three algorithms (see reference [15] for the definitions of detection rate and false alarm rate). Clearly, the detection performance of COSD is comparable to that of FastICA, and both are better than that of RXD.

Figure 2. Anomaly detection results: (a) RXD; (b) FastICA; (c) COSD.

Figure 3. ROC curves.
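An ROC curve of this kind can be computed from a detector's score image and a ground-truth mask, as in the minimal sketch below. It assumes the common definitions (detection rate = detected target pixels / all target pixels, false alarm rate = false alarms / all background pixels), which may differ in detail from those of reference [15]; the function name and the handling of tied scores are illustrative choices.

```python
import numpy as np

def roc_curve(scores, truth):
    """Return (false_alarm_rate, detection_rate) as the threshold sweeps from high to low."""
    scores = np.asarray(scores, dtype=float).ravel()
    truth = np.asarray(truth, dtype=bool).ravel()
    # Sort pixels by decreasing score; each prefix is the detection set for one threshold.
    order = np.argsort(-scores, kind="stable")
    hits = truth[order]
    detected = np.cumsum(hits)        # target pixels above the threshold
    false_alarms = np.cumsum(~hits)   # background pixels above the threshold
    detection_rate = np.concatenate(([0.0], detected / truth.sum()))
    false_alarm_rate = np.concatenate(([0.0], false_alarms / (~truth).sum()))
    return false_alarm_rate, detection_rate

# Toy example: two target pixels scored above two background pixels.
far, dr = roc_curve([0.9, 0.8, 0.1, 0.2], [True, True, False, False])
# Area under the curve by the trapezoidal rule.
auc = float(np.sum(np.diff(far) * (dr[1:] + dr[:-1]) / 2.0))
```

Tied scores are broken by pixel order here; a more careful implementation would group equal scores into a single threshold step.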

Evaluation with real hyperspectral data

A hyperspectral image of 100 × 100 pixels from OMIS-II (Operational Modular Imaging Spectrometer) is used to test these methods. The imaging system was developed by the Shanghai Institute of Technical Physics, Chinese Academy of Sciences (SITP). The data were acquired by the Aerial Photogrammetry and Remote Sensing Bureau in Xi'an, China, in 2003 and include 64 bands from the visible to the thermal infrared, with 3.6 m spatial resolution and 10 nm spectral resolution in the visible and near-infrared region (60 bands). Small man-made targets, simulated as common objects within the scene, were distributed at two locations around the top right corner of the image (marked by the rectangles in Fig. 4a) and cover tens of pixels. In the approximately true color composite (see Fig. 4a), it is hard to find any trace of the targets inside the rectangles.

Figure 4. Anomaly detection results for the real hyperspectral image: (a) true color composition image; (b) RXD; (c) COSD; (d) NCOSD.

In Fig. 4b, we can see two small bright blocks at the top right corner, which are the man-made targets. This indicates that RXD can distinguish them from the background as abnormal pixels. Fig. 4c is the result of COSD, where the two man-made targets are strongly highlighted as abnormal pixels while the rest of the image is greatly suppressed as background. The result of the normalized COSD (NCOSD, defined in Methods) in Fig. 4d indicates that the skewness information alone is good enough to extract the anomalies. Fig. 5 shows the ROC curves of detection rate versus false alarm rate for RXD and COSD. It illustrates that the detection capability of COSD has a significant advantage over that of RXD.

Figure 5. ROC curves.

Now we turn to the comparison between FastICA and COSD. Usually, FastICA obtains all the independent components through iterations, and the one with the maximum skewness is chosen as the anomaly detection result. However, because of local optima, the first independent component of FastICA does not always correspond to the global maximum skewness, so different surface objects may be detected (see Fig. 6, which shows the inconsistency of the anomaly detections). Besides, the skewness-based FastICA does not have global convergence. COSD does not have these problems.

Figure 6. The different results of the first independent component by FastICA.

Discussion

In this study we proposed a new method that uses a high-order statistic tensor to detect anomalies in a hyperspectral image, and analyzed in detail the application of the coskewness tensor to anomaly detection. Compared with the traditional methods based on second-order statistics, COSD has a better capacity to extract abnormal objects. Moreover, unlike the traditional methods based on blind signal separation, COSD obtains the distribution of the abnormal objects directly from the higher-order statistic tensor. Since COSD needs no iteration, it avoids the shortcomings of the blind signal separation methods. The experiment with simulated data shows that the detection performance of COSD is better than that of RXD. The experiment with real hyperspectral data illustrates that COSD can successfully highlight the man-made targets as anomalies. It is noteworthy that the detected abnormal pixels might not be the ones of interest, owing to the uncertainty about which pixels are abnormal in an image. Nevertheless, the introduction of a higher-order statistic tensor is of considerable benefit to anomaly detection in hyperspectral images.

Although the introduction of coskewness tensor benefits anomaly detection a lot, COSD may suffer from larger computational complexity. Figure 7 shows the computational complexity (measured by the required float operations, flops) for RXD and COSD. Assume the size of the hyperspectral data is N pixels and L bands. The flops required for RXD is while that for COSD is . We can see that from Fig. 7, COSD is more sensitive to the number of bands. When L is relatively large (for instance, >50), the computational complexity of COSD is about times of that of RXD.

Figure 7. The computational complexity of RXD and COSD versus (a) the number of bands (N = 200 × 200) and (b) the number of pixels (L = 50).

Notably, since the formula of COSD can be regarded as the extension of that of RXD from second-order statistics (the covariance matrix) to third-order statistics (the coskewness tensor), all the other anomaly detection methods based on second-order statistics (such as modified RX, weighted RX, causal RX, DWEST and NSWTD) can be extended in the same way to third-order, or even higher-order, statistics. The advantage of COSD in detecting anomalies in hyperspectral images supports the rationale for this extension.

In conclusion, anomalies generally show strong features in high-order statistics. This paper therefore presents a new anomaly detection method, COSD, based on the third-order statistical tensor. Formally, COSD is the natural extension of RXD from second-order to third-order statistics. Essentially, COSD takes full advantage of angle information, which underpins its validity.

Methods

RXD

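The RXD is a detector proposed by Reed and Yu [4]. For each pixel vector $\mathbf{r}$ in an image, RXD is implemented by the operator

$$\delta_{RXD}(\mathbf{r}) = (\mathbf{r} - \boldsymbol{\mu})^{T} \mathbf{K}^{-1} (\mathbf{r} - \boldsymbol{\mu}) \tag{1}$$

where $\boldsymbol{\mu} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{r}_i$ is the mean vector of the image, $\mathbf{K} = \frac{1}{N}\sum_{i=1}^{N} (\mathbf{r}_i - \boldsymbol{\mu})(\mathbf{r}_i - \boldsymbol{\mu})^{T}$ is the sample covariance matrix of the image and $N$ is the number of pixels. $\delta_{RXD}(\mathbf{r})$ in Eq. (1) has the same form as the Mahalanobis distance. The covariance matrix $\mathbf{K}$ can be decomposed as $\mathbf{K} = \mathbf{E}\mathbf{D}\mathbf{E}^{T}$, where $\mathbf{D} = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_L\}$ and $\mathbf{E}$ is the eigenvector matrix of $\mathbf{K}$. We denote $\mathbf{F} = \mathbf{E}\mathbf{D}^{-1/2}$ as a whitening operator of the image. Then Eq. (1) can be transformed as

$$\delta_{RXD}(\mathbf{r}) = (\mathbf{r} - \boldsymbol{\mu})^{T} \mathbf{F}\mathbf{F}^{T} (\mathbf{r} - \boldsymbol{\mu}) = \left\| \mathbf{F}^{T}(\mathbf{r} - \boldsymbol{\mu}) \right\|^{2} \tag{2}$$

From Eq. (2) we can see that $\delta_{RXD}(\mathbf{r})$ is actually the squared Euclidean distance between $\mathbf{r}$ and $\boldsymbol{\mu}$ in the whitened image.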
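The equivalence between the Mahalanobis form of RXD and the Euclidean norm in the whitened image can be checked numerically. The sketch below is an illustration only: it assumes the pixels are stored as an N × L array, normalizes the covariance by N, and uses an eigendecomposition for whitening; the function names and the test data are hypothetical.

```python
import numpy as np

def rxd(pixels):
    """RXD score of every row of an (N, L) pixel array, in the Mahalanobis form."""
    centered = pixels - pixels.mean(axis=0)
    cov = centered.T @ centered / len(pixels)
    # Solve K z = (r - mu) for all pixels instead of forming the inverse explicitly.
    solved = np.linalg.solve(cov, centered.T).T
    return np.einsum("ij,ij->i", centered, solved)

def whiten(pixels):
    """Center the pixels and apply the whitening operator E D^(-1/2)."""
    centered = pixels - pixels.mean(axis=0)
    cov = centered.T @ centered / len(pixels)
    eigvals, eigvecs = np.linalg.eigh(cov)
    operator = eigvecs / np.sqrt(eigvals)  # scales column j by 1 / sqrt(lambda_j)
    return centered @ operator

# Hypothetical correlated test data: 500 pixels, 4 bands.
rng = np.random.default_rng(1)
mixing = np.array([[2.0, 0.5, 0.0, 0.0],
                   [0.0, 1.0, 0.3, 0.0],
                   [0.0, 0.0, 1.5, 0.2],
                   [0.0, 0.0, 0.0, 1.0]])
data = rng.standard_normal((500, 4)) @ mixing + 3.0

scores = rxd(data)
squared_norms = (whiten(data) ** 2).sum(axis=1)  # should match the RXD scores
```

With the covariance normalized by N, the RXD scores average exactly to the number of bands L, which gives a quick sanity check on an implementation.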

Tensor introduction

A real m-order n-dimensional tensor $\mathcal{A}$ consists of $n^m$ real entries [19], represented as $\mathcal{A} = (a_{i_1 i_2 \cdots i_m})$, where $i_j = 1, \ldots, n$ for $j = 1, \ldots, m$. Fig. 8 shows an example of a third-order 4-dimensional tensor. The tensor is supersymmetric if its entries are invariant under any permutation of their indices [20]; for a third-order tensor, $a_{ijk} = a_{ikj} = a_{jik} = a_{jki} = a_{kij} = a_{kji}$.

Figure 8. A sketch map of a third-order tensor of size 4 × 4 × 4.

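The tensor $\mathcal{A}$ defines an mth-degree homogeneous polynomial

$$f(\mathbf{x}) = \mathcal{A}\mathbf{x}^{m} = \sum_{i_1, \ldots, i_m = 1}^{n} a_{i_1 i_2 \cdots i_m}\, x_{i_1} x_{i_2} \cdots x_{i_m} \tag{3}$$

where $\mathbf{x} = [x_1, \ldots, x_n]^{T}$; $\mathbf{x}^{m}$ is an m-order n-dimensional tensor of rank 1 [20] whose elements are $x_{i_1} x_{i_2} \cdots x_{i_m}$, where $i_j = 1, \ldots, n$ for $j = 1, \ldots, m$; and $\mathcal{A}\mathbf{x}^{m}$ is the tensor product of $\mathcal{A}$ and $\mathbf{x}^{m}$ [19]. For example, when $m = 2$, $\mathcal{A}$ is an $n \times n$ matrix and $\mathcal{A}\mathbf{x}^{2} = \mathbf{x}^{T}\mathcal{A}\mathbf{x}$. For m-order tensors, $\mathcal{A}\mathbf{x}^{m}$ can be computed in m steps as follows:

$$\mathcal{A}\mathbf{x}^{m} = \mathcal{A} \times_1 \mathbf{x} \times_2 \mathbf{x} \cdots \times_m \mathbf{x} \tag{4}$$

where $\times_i$ denotes the i-mode product operator. Fig. 9 illustrates the multiplication of a 3-way tensor and a vector, which yields a scalar. As will be seen later, that scalar is the corresponding skewness in the direction $\mathbf{x}$ if $\mathcal{A}$ is the coskewness tensor.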

Figure 9. Illustration of the n-mode product of a 3-way tensor and a vector.
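The step-by-step contraction of a tensor with a vector can be sketched in a few lines. This is an illustration under the assumption that the tensor is stored as a dense NumPy array with m axes of equal length; the function names are hypothetical.

```python
import numpy as np

def form_by_mode_products(tensor, x):
    """Contract every mode of `tensor` with x, one mode at a time (Eq. (4))."""
    result = tensor
    for _ in range(tensor.ndim):
        # Contract x with the first remaining axis; after i - 1 contractions
        # this axis is mode i of the original tensor.
        result = np.tensordot(x, result, axes=([0], [0]))
    return float(result)

def form_by_definition(tensor, x):
    """Sum a_{i1...im} * x_{i1} * ... * x_{im} over all index tuples (Eq. (3))."""
    total = 0.0
    for index in np.ndindex(tensor.shape):
        total += tensor[index] * np.prod(x[list(index)])
    return total

rng = np.random.default_rng(0)
third_order = rng.standard_normal((4, 4, 4))  # like the 4 x 4 x 4 tensor of Fig. 8
x = rng.standard_normal(4)
value_steps = form_by_mode_products(third_order, x)
value_sum = form_by_definition(third_order, x)

# For m = 2 the same routine reduces to the quadratic form x^T A x.
matrix = rng.standard_normal((4, 4))
quadratic = form_by_mode_products(matrix, x)
```

The mode-by-mode version costs on the order of n^m operations dominated by the first contraction, whereas the explicit sum multiplies m factors for each of the n^m entries.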

For a hyperspectral image data set $S = \{\mathbf{r}_1, \ldots, \mathbf{r}_N\}$, its m-order cumulant matrix (tensor) is defined as:

$$\mathcal{C} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{r}_i^{m} \tag{5}$$

where $\mathbf{r}_i$ is the spectral column vector of the i-th pixel of the image, and $\mathbf{r}_i^{m}$ is an m-order L-dimensional tensor of rank 1 (where $L$ is the number of bands in the image). Obviously, the m-order statistical tensor is a supersymmetric tensor. This paper focuses on anomaly extraction for hyperspectral images by using the high-order statistical tensor.
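For m = 3 the statistical tensor is the average of the outer products of each pixel vector with itself three times, and its supersymmetry can be verified directly. The sketch below assumes the pixels are stored as an N × L array; the function names and the random test data are illustrative.

```python
import itertools
import numpy as np

def third_order_tensor(pixels):
    """Average of r_i (outer) r_i (outer) r_i over the rows of an (N, L) array."""
    return np.einsum("ni,nj,nk->ijk", pixels, pixels, pixels) / len(pixels)

def is_supersymmetric(tensor):
    """True if the entries are invariant under every permutation of the indices."""
    return all(
        np.allclose(tensor, tensor.transpose(perm))
        for perm in itertools.permutations(range(tensor.ndim))
    )

rng = np.random.default_rng(0)
pixels = rng.standard_normal((100, 4))   # hypothetical data: 100 pixels, 4 bands
tensor = third_order_tensor(pixels)      # shape (4, 4, 4)
symmetric = is_supersymmetric(tensor)
```

Because of the supersymmetry, only L(L + 1)(L + 2)/6 of the L³ entries are distinct, which an implementation concerned with memory could exploit.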

Coskewness tensor detector (COSD)

For a hyperspectral image data set $S = \{\mathbf{r}_1, \ldots, \mathbf{r}_N\}$, suppose that $\boldsymbol{\mu} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{r}_i = \mathbf{0}$ and $\mathbf{K} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{r}_i\mathbf{r}_i^{T} = \mathbf{I}$, where $\mathbf{I}$ is the $L \times L$ identity matrix. This means that each band of the image has a variance of 1 and that the correlation coefficient between any two bands is zero; that is to say, the hyperspectral data have been normalized. These two conditions are not difficult to meet. If the mean vector of the image is not zero, the first condition is achieved by moving the origin of the image to the mean vector. The second condition is met for a real hyperspectral image by data whitening.

Here, we propose a new anomaly detector, named the high-order statistic detector (HOSD), which is defined as follows:

$$\delta_{HOSD}(\mathbf{r}) = \mathcal{C}\mathbf{r}^{m} \tag{6}$$

where $\mathcal{C}$ is the high-order statistic tensor of the image defined in Eq. (5) and $\mathbf{r}$ is a pixel vector of the image. Similar to Eq. (3), $\mathcal{C}\mathbf{r}^{m}$ is a scalar, namely the tensor product of the m-order statistic tensor $\mathcal{C}$ and $\mathbf{r}^{m}$. In this paper, we discuss the case where $m = 3$. Eq. (6) then becomes

$$\delta_{COSD}(\mathbf{r}) = \mathcal{C}\mathbf{r}^{3} \tag{7}$$

which we call the coskewness tensor detector (COSD); here $\mathcal{C}$ is the coskewness tensor. The coskewness tensor is a supersymmetric third-order tensor of dimension $L \times L \times L$.

Like RXD, the coskewness tensor based anomaly detector applies Eq. (7) to all pixels of the image to produce a gray image, in which the anomalies appear very bright or very dark. The dark pixels in the gray image are caused by negative values of Eq. (7). Therefore, we usually determine the anomalous pixels from the absolute value of the gray image.
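The whole detector (centering, whitening, building the coskewness tensor and contracting it three times with each pixel) fits in a short sketch. This is a minimal illustration rather than the authors' implementation: the function names, the eigendecomposition-based whitening and the synthetic two-band data (a Gaussian background plus a small cluster of anomalous pixels at a hypothetical location) are all assumptions.

```python
import numpy as np

def whiten(pixels):
    """Center an (N, L) pixel array and whiten it so that its covariance is the identity."""
    centered = pixels - pixels.mean(axis=0)
    cov = centered.T @ centered / len(pixels)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return centered @ (eigvecs / np.sqrt(eigvals))

def coskewness_tensor(whitened):
    """Third-order statistic tensor of the whitened pixels, shape (L, L, L)."""
    return np.einsum("ni,nj,nk->ijk", whitened, whitened, whitened) / len(whitened)

def cosd(pixels):
    """COSD score of every pixel: the coskewness tensor contracted three times with r."""
    r = whiten(pixels)
    tensor = coskewness_tensor(r)
    partial = np.einsum("ijk,nk->nij", tensor, r)  # contract one mode per pixel
    partial = np.einsum("nij,nj->ni", partial, r)  # contract a second mode
    return np.einsum("ni,ni->n", partial, r)       # contract the last mode

# Synthetic data: 2436 Gaussian background pixels followed by 64 anomalous pixels.
rng = np.random.default_rng(0)
background = rng.standard_normal((2436, 2))
anomalies = np.array([8.0, -6.0]) + 0.3 * rng.standard_normal((64, 2))
scores = np.abs(cosd(np.vstack([background, anomalies])))  # absolute value, as in the text
```

Contracting one mode at a time keeps the largest intermediate at N × L × L values; no iteration is involved at any stage.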

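In Eq. (7), if $\mathbf{r}$ is a unit vector, then $\mathcal{C}\mathbf{r}^{3}$ is the skewness of the image in the direction of $\mathbf{r}$. Eq. (7) can be transformed as follows:

$$\delta_{COSD}(\mathbf{r}) = \mathcal{C}\mathbf{r}^{3} = \|\mathbf{r}\|^{3}\, \mathcal{C}\left(\frac{\mathbf{r}}{\|\mathbf{r}\|}\right)^{3} \tag{8}$$

From Eq. (8), we can see that anomaly extraction using Eq. (7) depends mainly on two indices: one is the skewness and the other is the cube of the 2-norm. If we eliminate the term $\|\mathbf{r}\|^{3}$ from Eq. (8), the COSD operator becomes a normalized COSD (NCOSD) operator

$$\delta_{NCOSD}(\mathbf{r}) = \mathcal{C}\left(\frac{\mathbf{r}}{\|\mathbf{r}\|}\right)^{3} \tag{9}$$

In practice, $\delta_{NCOSD}(\mathbf{r})$ is the skewness of the image data in the direction of $\mathbf{r}$.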
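Two facts stated here can be checked numerically: the COSD score factors into the cube of the 2-norm times the NCOSD score, and the tensor contracted with a unit vector equals the third moment of the data projected onto that vector. The sketch below uses hypothetical random data and illustrative names; both identities follow from the definition of the tensor and hold whether or not the data are exactly whitened.

```python
import numpy as np

def cubic_form(tensor, vectors):
    """Contract a third-order tensor three times with every row of `vectors`."""
    return np.einsum("ijk,ni,nj,nk->n", tensor, vectors, vectors, vectors)

rng = np.random.default_rng(0)
pixels = rng.standard_normal((200, 3))   # hypothetical data: 200 pixels, 3 bands
tensor = np.einsum("ni,nj,nk->ijk", pixels, pixels, pixels) / len(pixels)

norms = np.linalg.norm(pixels, axis=1)
cosd_scores = cubic_form(tensor, pixels)                     # tensor applied to r
ncosd_scores = cubic_form(tensor, pixels / norms[:, None])   # tensor applied to r / ||r||

# Third moment of the data projected onto one unit direction.
direction = pixels[0] / norms[0]
projected_third_moment = np.mean((pixels @ direction) ** 3)
```

For centered and whitened data the projected third moment is the skewness in that direction, since the projection then has zero mean and unit variance.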

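Let us assume that the hyperspectral image is composed of background and abnormal pixels,

$$S = S_b \cup S_a \tag{10}$$

where $S$ is the whitened hyperspectral image; $S_b = \{\mathbf{b}_1, \ldots, \mathbf{b}_{N_b}\}$ is the background ($N_b$ is the number of background pixels); and $S_a = \{\mathbf{a}_1, \ldots, \mathbf{a}_{N_a}\}$ is the anomaly ($N_a$ is the number of abnormal pixels). It is obvious that $N = N_b + N_a$. The coskewness tensor of the image can be transformed as

$$\mathcal{C} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{r}_i^{3} = \frac{1}{N}\sum_{i=1}^{N_b}\mathbf{b}_i^{3} + \frac{1}{N}\sum_{j=1}^{N_a}\mathbf{a}_j^{3} = \mathcal{C}_b + \mathcal{C}_a \tag{11}$$

where $\mathcal{C}_b$ is composed of background pixels and $\mathcal{C}_a$ is composed of abnormal pixels. In this study, we only discuss the case in which a single class of anomaly lies in the image, and denote the spectrum of the anomaly as $\mathbf{a}$. Then we have

$$\mathcal{C}_a = \frac{N_a}{N}\,\mathbf{a}^{3} \tag{12}$$

and the coskewness tensor of the image can be transformed as:

$$\mathcal{C} = \mathcal{C}_b + \frac{N_a}{N}\,\mathbf{a}^{3} \tag{13}$$

In general, the number of abnormal pixels in an image is very small, thus $\boldsymbol{\mu}_b \approx \boldsymbol{\mu} = \mathbf{0}$, where $\boldsymbol{\mu}_b$ is the mean vector of the background image. Accordingly, $\mathcal{C}_b$ can be approximately considered as the coskewness tensor of the background image. If we assume that the background image fits a Gaussian distribution, we can get

$$\mathcal{C}_b \approx \mathbf{0} \tag{14}$$

This means that all the elements of $\mathcal{C}_b$ are close to zero. Thus the skewness of the image in the direction of $\mathbf{r}$ can be expressed as

$$\mathcal{C}\left(\frac{\mathbf{r}}{\|\mathbf{r}\|}\right)^{3} \approx \frac{N_a}{N}\left(\frac{\mathbf{a}^{T}\mathbf{r}}{\|\mathbf{r}\|}\right)^{3} = \frac{N_a}{N}\,\|\mathbf{a}\|^{3}\cos^{3}\theta \tag{15}$$

where $\theta$ is the angle between the vectors $\mathbf{r}$ and $\mathbf{a}$. Considering Eq. (15), Eq. (8) can be rewritten as

$$\delta_{COSD}(\mathbf{r}) \approx \frac{N_a}{N}\,\|\mathbf{a}\|^{3}\,\|\mathbf{r}\|^{3}\cos^{3}\theta \tag{16}$$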

Since $S$ is an image after centering and whitening, the RX operator can be expressed as

$$\delta_{RXD}(\mathbf{r}) = \mathbf{r}^{T}\mathbf{r} = \|\mathbf{r}\|^{2} \tag{17}$$

From Eqs. (15) to (17), it can be seen that there are distinct differences among RXD, NCOSD and COSD. Specifically, RXD depends only on the distance between a pixel and the origin in the feature space of the whitened image. Only when the abnormal pixels are far away from the origin and all the background pixels are relatively close to it can RXD achieve a good anomaly detection result. NCOSD is based on the skewness of the image, and its detection performance depends on the angles between an abnormal pixel and all the background pixels. When all these angles are large, NCOSD can achieve a good detection result. COSD takes into account not only the distance but also the angle. It can therefore overcome the shortcomings of both RXD and NCOSD, each of which relies on a single index.
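The different roles of distance and angle can be seen in a small worked example. It assumes the idealized case in which the background contributes nothing to the coskewness tensor, so that the tensor reduces to a single rank-1 anomaly term; the anomaly spectrum, the fraction of abnormal pixels and the three test pixels are hypothetical values in the whitened space.

```python
import numpy as np

anomaly = np.array([4.0, 0.0])   # hypothetical anomaly spectrum in the whitened space
fraction = 0.01                  # hypothetical ratio of abnormal pixels to all pixels
# Idealized coskewness tensor: background part exactly zero, anomaly part rank 1.
tensor = fraction * np.einsum("i,j,k->ijk", anomaly, anomaly, anomaly)

# Three pixels with the same norm (2) at 0, 45 and 90 degrees from the anomaly.
pixels = np.array([[2.0, 0.0],
                   [np.sqrt(2.0), np.sqrt(2.0)],
                   [0.0, 2.0]])
norms = np.linalg.norm(pixels, axis=1)
cos_theta = pixels @ anomaly / (norms * np.linalg.norm(anomaly))

rxd_scores = norms ** 2                                               # distance only
cosd_scores = np.einsum("ijk,ni,nj,nk->n", tensor, pixels, pixels, pixels)
ncosd_scores = cosd_scores / norms ** 3                               # angle only
```

RXD assigns the same score (4) to all three pixels, NCOSD scales with the cube of the cosine of the angle, and COSD additionally scales with the cube of the norm, giving 5.12 for the pixel aligned with the anomaly and 0 for the orthogonal one.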