Principal Component Analysis



Introduction
Principal component analysis (PCA) (Pearson, 1901; Hotelling, 1933) is a dimension reduction and decorrelation technique that transforms a correlated multivariate distribution into orthogonal linear combinations of the original variables. PCA is a useful geostatistical modeling tool for two primary reasons:

1. Multivariate data, consisting of multiple correlated geological variables, are transformed by PCA to be uncorrelated. Independent geostatistical modeling of the decorrelated variables then proceeds, before the PCA back-transform restores the original correlation to the modeled variables.

2. PCA may be used for dimension reduction in the above framework. Independent geostatistical modeling proceeds on a subset of the decorrelated variables, before the PCA back-transform provides models of all original variables.
PCA could also be used to gain a deeper understanding of underlying latent factors, but in geostatistics these two reasons prevail. It was first applied to geostatistical modeling in this manner by Davis and Greenes (1983), with more recent examples from Barnett and Deutsch (2012) and Boisvert, Rossi, Ehrig, and Deutsch (2013). This lesson begins with a description of the data processing and covariance calculations that are necessary prior to applying PCA. Essential PCA theory is then outlined and demonstrated with a small example, before demonstrating it with a larger geochemical dataset.

Data Pre-processing and Covariance Calculation
Consider $k$ geological variables $Z_1, \ldots, Z_k$ that will be simulated across a stationary domain $A$. Conditioning data is given as the matrix $Z: z_{\alpha,i}, \alpha = 1, \ldots, n, i = 1, \ldots, k$, where $n$ is the number of samples. The $Z$ data is assumed to be representative of the domain $A$, so that parameters may be calculated experimentally. Variables must be transformed to have a mean of zero (termed centered) before applying PCA or any linear rotation. It is also recommended that the variables be transformed to have a variance of one, as this improves the interpretability of subsequent PCA results. Standardization of the geological variables is therefore used as a pre-processor to PCA:

$$ y_{\alpha,i} = \frac{z_{\alpha,i} - \mu_i}{\sigma_i}, \quad \alpha = 1, \ldots, n, \; i = 1, \ldots, k $$

where $\mu_i = \frac{1}{n}\sum_{\alpha=1}^{n} z_{\alpha,i}$ is the mean of $Z_i$ and $\sigma_i^2 = \frac{1}{n}\sum_{\alpha=1}^{n} z_{\alpha,i}^2 - \mu_i^2$ is the variance of $Z_i$. Each standardized $Y_i$ variable has a mean of zero and a variance of one. PCA revolves around the covariance matrix $\Sigma$ of the $Y$ data, which is calculated as:

$$ C_{i,j} = \frac{1}{n}\sum_{\alpha=1}^{n} y_{\alpha,i}\, y_{\alpha,j}, \quad i, j = 1, \ldots, k $$

The $\Sigma$ values parameterize the multivariate system of the $Y$ data in terms of linear variability and dependence. Diagonal entries $C_{i,i}$ are the variance of each $Y_i$. Off-diagonal entries $C_{i,j}$, $i \neq j$, are the covariance between $Y_i$ and $Y_j$. These covariances are also correlations since each $Y_i$ has a variance of one.
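The standardization and covariance calculation can be sketched in a few lines of Python. The data matrix below is synthetic, and its dimensions and values are assumptions chosen purely for illustration:

```python
import numpy as np

# Hypothetical conditioning data Z of shape (n, k): n samples of
# k correlated geological variables (synthetic values for illustration).
rng = np.random.default_rng(0)
n, k = 100, 3
Z = rng.multivariate_normal(
    mean=[1.0, 5.0, 10.0],
    cov=[[1.0, 0.6, 0.3],
         [0.6, 2.0, 0.5],
         [0.3, 0.5, 1.5]],
    size=n,
)

# Standardize: subtract the mean and divide by the standard deviation,
# so each column of Y has a mean of zero and a variance of one.
mu = Z.mean(axis=0)
sigma = Z.std(axis=0)          # population (1/n) standard deviation
Y = (Z - mu) / sigma

# Covariance matrix Sigma = (1/n) Y^T Y; off-diagonal entries are
# correlations since each Y_i has unit variance.
Sigma = (Y.T @ Y) / n
print(np.round(Sigma, 2))
```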
PCA results are subject to the accuracy of $\Sigma$. If the calculated sample $\Sigma$ is not representative of the true population covariances, then PCA will not make the population uncorrelated in reality. For example, the covariance calculation is very sensitive to outlier values. Careful exploratory analysis should be performed to detect and remove erroneous outliers from the data prior to the covariance calculation.
The familiar normal score transform may be considered in the place of standardization, as normal scores have a mean of zero, a variance of one, and no univariate outliers. This will likely improve the robustness of the covariance calculation and linear rotation, although multivariate outliers may persist and have adverse consequences. The normal score transform is non-linear, which has implications for estimation. Back-transforming normal score estimates directly will introduce a bias, although transforming a series of quantiles, as in PostMG (Lyster & Deutsch, 2004), solves this issue.
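A minimal sketch of a normal score transform, assuming empirical ranks are matched to Gaussian quantiles; handling of tied values (despiking), which production implementations provide, is omitted here:

```python
import numpy as np
from scipy.stats import norm

def normal_score_transform(z):
    """Map a 1-D sample to standard normal scores by matching empirical
    ranks to Gaussian quantiles (ties/despiking are not handled here)."""
    ranks = np.argsort(np.argsort(z))     # ranks 0 .. n-1
    p = (ranks + 0.5) / len(z)            # plotting positions in (0, 1)
    return norm.ppf(p)

z = np.random.default_rng(1).lognormal(size=500)   # skewed, with outliers
y = normal_score_transform(z)
print(y.mean(), y.var())                           # approximately 0 and 1
```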

PCA Transform
The first step of PCA performs spectral decomposition of $\Sigma$, yielding the eigenvector matrix $V: v_{i,j}, i, j = 1, \ldots, k$ and the diagonal eigenvalue matrix $D: d_{i,i}, i = 1, \ldots, k$:

$$ \Sigma = V D V^T $$

The PCA transform is then performed through the matrix multiplication of $Y$ and $V$:

$$ P = Y V $$

This rotates the multivariate data so that the resultant principal components in $P$ are uncorrelated. Multiplying $P$ by the transpose of $V$ rotates the data back to $Y$, providing the back-transform that may be used for simulated realizations of the principal components:

$$ Y = P V^T $$

The eigenvector matrix may be thought of as a rotation matrix, providing a new basis where the correlated data are made orthogonal. The linear matrix multiplication of $Y_1, \ldots, Y_k$ with the $i$-th column of $V$ provides the $P_i$ principal component. Hence, each principal component is a linear combination of the original variables, explaining the nature of the linear rotation terminology.
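The decomposition and both transforms can be sketched with numpy on synthetic stand-in data. Note that numpy.linalg.eigh returns eigenvalues in ascending order, so they are reversed below so that $P_1$ carries the largest variance:

```python
import numpy as np

# Standardized data Y (synthetic, for illustration) and its covariance.
rng = np.random.default_rng(0)
Y = rng.multivariate_normal(
    [0.0, 0.0, 0.0],
    [[1.0, 0.6, 0.3], [0.6, 1.0, 0.5], [0.3, 0.5, 1.0]],
    size=200,
)
Y -= Y.mean(axis=0)               # center so Sigma = (1/n) Y^T Y holds
Sigma = (Y.T @ Y) / len(Y)

# Spectral decomposition Sigma = V D V^T. eigh returns eigenvalues in
# ascending order; reverse so the first component has the most variance.
d, V = np.linalg.eigh(Sigma)
order = np.argsort(d)[::-1]
d, V = d[order], V[:, order]

P = Y @ V          # forward transform: uncorrelated principal components
Y_back = P @ V.T   # back-transform restores the original correlation

assert np.allclose(Y, Y_back)     # V is orthonormal, so the rotation is exact
```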
Each $d_{i,i}$ entry corresponds with the variance of $P_i$, while also measuring the variability that $P_i$ explains about the $Y_1, \ldots, Y_k$ multivariate system. More specifically, the percentage of variability explained by $P_i$ is:

$$ 100 \cdot \frac{d_{i,i}}{\sum_{j=1}^{k} d_{j,j}} \,\% $$

The component $P_1$ explains the most variability, $P_2$ explains the second most, and so on.
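A minimal sketch of this calculation, assuming hypothetical eigenvalues for a $k = 3$ system:

```python
import numpy as np

# Hypothetical eigenvalues from a k = 3 decomposition (illustrative only).
d = np.array([1.9, 0.8, 0.3])

explained = 100.0 * d / d.sum()   # percent of variability per component
print(explained)                  # P_1 explains the most, and so on
```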
PCA is demonstrated using a small $k = 3$ example. The scatter plot of the $Y_1, \ldots, Y_3$ data is overlain with the $P_1, \ldots, P_3$ principal component vectors, which correspond with each column of $V$ and display the rotation basis (i.e., the axes of the principal components). The vector lengths are scaled according to the associated eigenvalues, which are also displayed in the bar chart below.
Following transformation, the scatter plot below displays the $P$ data in the rotated basis, where the greatest variance visibly exists in the $P_1$ dimension. Scatter plots in this lesson are colored according to their associated $Y_3$ value, which indicates how each data point is rotated and shifted by the transform.

Dimension Reduction
Since eigenvalues measure the variability that each principal component contributes to the original multivariate system, practitioners may consider discarding insignificant components from subsequent geostatistical modeling. This is not utilized often in practice, but is available when an infeasibly large number of variables must be modeled. Consider that the $l$ most important principal components are selected for simulation across $N$ model nodes, where $l < k$. Letting the resulting realization values be the $N \times l$ matrix $P'$, the PCA back-transform is simply modified by multiplying $P'$ with the first $l$ rows of $V^T$. The multiplication of these $N \times l$ and $l \times k$ matrices yields the $N \times k$ matrix $Y$ of the standardized variable realizations.

The effectiveness of this dimension reduction scheme relates to the magnitude of variance that the removed principal components explain. If the associated eigenvalues are vanishingly small, then removal of those principal components should not have a significant impact on simulation results. The figure below demonstrates the PCA back-transform of $l = 1$ and $l = 2$ of the $k = 3$ components. Rather than a simulated realization, the transformed $P$ data is simply being back-transformed.

Figure 5: Scatter between the true and back-transformed values using two (above) and one (below) principal components.

The degradation in reproduction corresponds with the variability that had been explained by the removed principal component(s). In this case, each principal component explains a significant amount of variability, so that the impact of their removal is substantial. Smaller eigenvalues can be expected as $k$ grows larger, making the use of dimension reduction more effective.

It is interesting to note that $Y_1$ reproduction is virtually identical whether using $l = 1$ or $l = 2$ principal components, whereas the reproduction of $Y_2$ and $Y_3$ is significantly improved. This relates to the nature of the rotation and how the original variables are loaded onto the principal components. A loading $\rho'(Y_i, P_j)$ describes how important the $P_j$ principal component is for characterizing the $Y_i$ variability. It is calculated as:

$$ \rho'(Y_i, P_j) = v_{i,j} \sqrt{d_{j,j}} = \rho(Y_i, P_j)\, \sigma_{Y_i} $$

This shows that a loading is the product of the eigenvector entry $v_{i,j}$ and the square root of the eigenvalue $d_{j,j}$, though it may be more intuitively thought of as the correlation $\rho$ between the original and transformed variables, scaled by the standard deviation of $Y_i$. When working with standardized data, as we are here, a loading is simply the correlation between the $Y_i$ original variable and the $P_j$ principal component, $\rho'(Y_i, P_j) = \rho(Y_i, P_j)$. Inspecting the loadings of this transformation, observe that $P_2$ is virtually uncorrelated with $Y_1$. That is why the results above show that the inclusion and exclusion of $P_2$ in the back-transform yields virtually identical results for $Y_1$. All of the original variables are loaded most heavily on $P_1$, which is expected since it explains the majority of their variability.
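A sketch of the truncated back-transform and loading calculation, again on synthetic stand-in data rather than the lesson's example:

```python
import numpy as np

# Synthetic stand-in for the k = 3 example (values for illustration only).
rng = np.random.default_rng(0)
Y = rng.multivariate_normal(
    [0.0, 0.0, 0.0],
    [[1.0, 0.6, 0.3], [0.6, 1.0, 0.5], [0.3, 0.5, 1.0]],
    size=200,
)
Y -= Y.mean(axis=0)
d, V = np.linalg.eigh((Y.T @ Y) / len(Y))
order = np.argsort(d)[::-1]
d, V = d[order], V[:, order]
P = Y @ V

# Truncated back-transform: keep the l most important components.
l = 2
Y_approx = P[:, :l] @ V[:, :l].T   # (N x l)(l x k) -> (N x k)

# Loadings for standardized data: rho'(Y_i, P_j) = v_{i,j} * sqrt(d_j),
# i.e. the correlation between Y_i and P_j.
loadings = V * np.sqrt(d)
print(np.round(loadings, 2))
```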

Geochemical Example
A geochemical dataset provides a more compelling example of PCA in terms of potential dimension reduction and exploratory analysis. This public data was collected by the Northwest Territories Geological Survey in partnership with the Geological Survey of Canada. It includes $n = 1660$ stream sediment samples that provide $k = 53$ elements, which were collected in mineral deposit exploration across the Mackenzie Mountains. After standardizing the elements, the covariance matrix of the resulting $Y$ data is calculated and displayed below.
Spectral decomposition is applied to the covariance matrix, generating the eigenvalues displayed below. The explained variability of the principal components is then calculated from the eigenvalues, which is displayed in an incremental and cumulative manner. The cumulative plot is sometimes referred to as a scree or elbow plot. It is a useful tool, particularly if a visible elbow or inflection exists, where principal components begin explaining insignificant variability. A slight elbow exists here after the third or fourth component, though users may consider modeling additional components based on a required threshold of explained variability; say the 29 components that are required here to explain 95% of the variability.
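A minimal sketch of selecting the number of components from a cumulative explained-variability threshold; the eigenvalues below are synthetic placeholders, not the survey data's:

```python
import numpy as np

# Hypothetical decreasing eigenvalues for k = 53 components (synthetic).
rng = np.random.default_rng(2)
d = np.sort(rng.exponential(size=53))[::-1]

explained = 100.0 * d / d.sum()     # incremental explained variability
cumulative = np.cumsum(explained)   # cumulative (scree/elbow) curve

# Smallest number of components reaching a required threshold, e.g. 95%.
l = int(np.searchsorted(cumulative, 95.0) + 1)
print(l)
```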
As expected based on the eigenvalues, most elements are only loaded strongly onto the first few principal components (matrix below). Consider that the volume of information in the covariance and loadings matrices above creates challenges for interpreting the overall multivariate system. A common exploratory analysis approach for simplifying the multivariate system and understanding the underlying latent variables involves plotting the loadings of select principal components against each other, such as the first two principal components that are plotted below. Elements located in close proximity are closely related, and vice versa. For example, consider that Ca and Mg have the largest P1 loadings (furthest right on the plot), while having very small P2 loadings. Their variability is largely explained by P1, but not P2. The two elements are located very near to each other and relatively far from other elements, which corresponds with the covariance matrix, where Ca and Mg are highly correlated with each other, and strongly negatively correlated with most of the other elements. As only the first two principal component loadings are displayed, note that this is a simplified projection of the multivariate system, which only explains 49% of the variability.
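A sketch of such a loadings plot; the element names are taken from the discussion above, but the loading values are invented solely to illustrate the layout:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical P1/P2 loadings for a handful of elements (values invented
# purely to illustrate the plot; they are not the lesson's results).
elements = ["Ca", "Mg", "Fe", "Zn", "Cu"]
loadings = np.array([
    [0.90, -0.05],   # Ca: large P1 loading, near-zero P2 loading
    [0.88, -0.08],   # Mg: plots close to Ca
    [-0.45, 0.60],
    [-0.30, 0.55],
    [-0.50, 0.20],
])

fig, ax = plt.subplots()
ax.scatter(loadings[:, 0], loadings[:, 1])
for name, (x, y) in zip(elements, loadings):
    ax.annotate(name, (x, y), textcoords="offset points", xytext=(4, 4))
ax.axhline(0, color="gray", lw=0.5)
ax.axvline(0, color="gray", lw=0.5)
ax.set_xlabel("P1 loading")
ax.set_ylabel("P2 loading")
plt.show()
```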

Summary
PCA is a useful tool for multivariate geostatistical modeling. Geological variables are decorrelated to facilitate independent modeling, before the back-transform restores the original correlation to the modeled variables. When the number of variables becomes impractical to model, the dimension reduction functionality of PCA may be used for modeling a subset of variables, before the back-transform provides models of all variables. It may also be applied for exploratory data analysis, providing insight into the underlying latent variables that explain a high-dimensional multivariate system.
There are alternative linear decorrelation transformations that are immediate extensions of PCA, which may offer advantages to geostatistical modeling. These include data sphering and minimum/maximum autocorrelation factors, which are the focus of a companion lesson.