Eigenvalues & Principal Component Analysis

Eigenvalues & Eigenvectors

Definition: Eigenvalues and Eigenvectors
Let $A$ be an $n \times n$ matrix. If there exist a nonzero vector $\boldsymbol{v}$ and a scalar $\lambda$ such that

$$ A \boldsymbol {v} = \lambda \boldsymbol {v}, $$

then $\boldsymbol {v}$ is called an eigenvector of $A$, and $\lambda$ is the corresponding eigenvalue. Thus $A\boldsymbol{v}$ and $\boldsymbol{v}$ are collinear.

Rearranging the defining equation,

$$ A \boldsymbol {v} = \lambda \boldsymbol {v} \Rightarrow A \boldsymbol {v} - \lambda \boldsymbol {v} = \boldsymbol{0}, $$

which can be written as

$$ B \boldsymbol {v} = \boldsymbol{0} \quad \text{where} \quad B = A - \lambda I. $$

Since $\boldsymbol{v}$ is not the zero vector and $B \boldsymbol{v} = \boldsymbol{0}$, the matrix $B$ is singular, so its determinant is zero. The roots of the characteristic equation $\left| A- \lambda I \right| = 0$ are the eigenvalues, and the eigenvectors for each $\lambda$ lie in the nullspace of $B=A-\lambda I$.
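As a quick illustration, NumPy's `np.linalg.eig` computes exactly these quantities. This is only a minimal sketch; the $2 \times 2$ matrix below is an arbitrary choice.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Roots of det(A - lambda*I) = 0 and the corresponding eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)

# Each column of `eigenvectors` lies in the nullspace of A - lambda*I.
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(lam, np.allclose(A @ v, lam * v))   # checks A v = lambda v
```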

If $A$ has $n$ linearly independent eigenvectors, collect them as the columns of a matrix $X$ and place the corresponding eigenvalues on the diagonal of a matrix $D$. Then $A = X D X^{-1}$, so for any vector $\boldsymbol{v}$

$$ A \boldsymbol {v} = X D X^{-1} \boldsymbol {v}. $$

Thus powers of a matrix are easily computed: $A^k = X D^k X^{-1}$ for any $k$, and $D^k$ is obtained by simply raising each diagonal entry of $D$ to the $k$-th power.
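A minimal sketch of diagonalization and matrix powers, again with an arbitrary example matrix:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues, X = np.linalg.eig(A)   # columns of X are eigenvectors
D = np.diag(eigenvalues)

# A = X D X^{-1}
print(np.allclose(A, X @ D @ np.linalg.inv(X)))

# A^k = X D^k X^{-1}; D^k only requires powering the diagonal entries.
k = 5
A_k = X @ np.diag(eigenvalues**k) @ np.linalg.inv(X)
print(np.allclose(A_k, np.linalg.matrix_power(A, k)))
```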

  • If $A$ is triangular, its eigenvalues are the entries on the diagonal.
  • For an arbitrary $n$ by $n$ matrix $A$, the product of the $n$ eigenvalues is equal to the determinant of $A$.
  • The sum of the $n$ eigenvalues is equal to the trace of $A$.
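These three facts are easy to check numerically; the random matrix below is only an illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
eigenvalues = np.linalg.eigvals(A)

print(np.isclose(np.prod(eigenvalues), np.linalg.det(A)))   # product = determinant
print(np.isclose(np.sum(eigenvalues), np.trace(A)))         # sum = trace

# For a triangular matrix the eigenvalues are simply its diagonal entries.
T = np.triu(A)
print(np.allclose(np.sort(np.linalg.eigvals(T).real), np.sort(np.diag(T))))
```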
Theorem: Eigenvalues and Eigenvectors of Symmetric Matrices
All eigenvalues of a real symmetric matrix are real. Furthermore, the eigenvectors can be chosen to be pairwise orthogonal (indeed orthonormal).
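A short sketch of the theorem using `np.linalg.eigh`, NumPy's routine for symmetric matrices; the matrix is an arbitrary symmetric example and need not be positive definite.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
S = B + B.T                      # symmetric, possibly with negative eigenvalues

eigenvalues, Q = np.linalg.eigh(S)
print(eigenvalues)                                     # all real
print(np.allclose(Q.T @ Q, np.eye(4)))                 # eigenvectors are orthonormal
print(np.allclose(S, Q @ np.diag(eigenvalues) @ Q.T))  # S = Q D Q^T
```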

Principal Component Analysis

Principal Component Analysis (PCA) is a linear transformation of a dataset $Z$ onto a new coordinate system whose axes (the principal components) point in the directions of largest variation in the data.

Shift the data so that the mean of each column is zero, i.e. subtract the mean of each column of $Z$ from that column, yielding a new matrix $X$.
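A minimal sketch of this centring step, assuming the raw data is held in a NumPy array `Z` with one observation per row; the numbers are purely illustrative.

```python
import numpy as np

Z = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2]])

X = Z - Z.mean(axis=0)                     # subtract each column's mean
print(np.allclose(X.mean(axis=0), 0.0))    # every column of X now has mean zero
```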

For a unit vector $\boldsymbol{w}$, the vector $X\boldsymbol{w}$ collects the projection of each data row onto the direction $\boldsymbol{w}$.

Since the columns of $X$ have mean zero, so do the entries $x_1, \ldots, x_n$ of $X\boldsymbol{w}$, and its variance is

$$ \begin{align*} \text{var} \left( X \boldsymbol{w} \right) & = \dfrac{1}{n-1} \left( x_{1}^{2} + \ldots + x_n^2 \right) \\ & = \dfrac{1}{n-1} \left( X \boldsymbol{w} \right)^T \left( X \boldsymbol{w} \right) \\ & = \dfrac{1}{n-1} \boldsymbol{w}^T X^T X \boldsymbol{w}. \end{align*} $$

The goal is to find the unit vector $\boldsymbol{w}$ for which this variance is maximal.
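A sketch of the variance computation for one (arbitrarily chosen) unit direction, reusing the small illustrative data set from above:

```python
import numpy as np

Z = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2]])
X = Z - Z.mean(axis=0)
n = X.shape[0]

w = np.array([1.0, 1.0]) / np.sqrt(2.0)    # an arbitrary unit vector

proj = X @ w
var_proj = proj @ proj / (n - 1)                           # (Xw)^T (Xw) / (n - 1)
print(np.isclose(var_proj, w @ X.T @ X @ w / (n - 1)))     # equals w^T X^T X w / (n - 1)
```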

Note that since ${A = X^T X}$ is symmetric, i.e. ${A^T = A}$, all its eigenvalues are real and there exists an orthonormal basis given by the eigenvectors $\boldsymbol{q_i}$ of $A$. Let ${Q = \left( \boldsymbol{q_1} \, \boldsymbol{q_2} \cdots \boldsymbol{q_n} \right)}$, so that ${Q^T = Q^{-1}}$, and let ${D = \text{diag} \left( \lambda_1, \ldots, \lambda_n \right)}$ with the eigenvalues ordered so that ${\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0}$. Then $A$ can be expressed using the eigenvalues and eigenvectors, thus

$$ X^T X = A = Q D Q^T . $$

Substituting this into the variance expression,

$$ \begin{align*} \boldsymbol{w}^T X^T X \boldsymbol{w} & = \boldsymbol{w}^T A \boldsymbol{w} \\ & = \boldsymbol{w}^T Q D Q^T \boldsymbol{w} \\ & = \left( \boldsymbol{w}^T Q \right) D \left( Q^T \boldsymbol{w} \right) \\ & = \left( Q^T \boldsymbol{w} \right)^T D \left( Q^T \boldsymbol{w} \right), \quad \text{let} \quad \boldsymbol{y} = Q^T \boldsymbol{w} \\ & = \boldsymbol{y}^T D \boldsymbol{y}. \end{align*} $$

Since $\boldsymbol{w}$ is a unit vector and $Q$ is orthogonal, $\boldsymbol{y}$ is also a unit vector. Then ${\boldsymbol{y}^T D \boldsymbol{y} = \lambda_1 y_1^2 + \cdots + \lambda_n y_n^2 \leq \lambda_1}$, because ${y_1^2 + \cdots + y_n^2 = 1}$ and the eigenvalues are ordered decreasingly; the bound is attained by ${\boldsymbol{y} = \left(1, 0, \ldots, 0 \right)^T}$, so this choice maximizes the variance. The corresponding principal component $\boldsymbol{w}$ is recovered from ${\boldsymbol{y} = Q^T \boldsymbol{w}}$, i.e. ${\boldsymbol{w} = \left( Q^T\right)^{-1} \boldsymbol{y} = Q\boldsymbol{y} = \boldsymbol{q_1}}$, the eigenvector with the largest eigenvalue.
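Putting the pieces together, here is a sketch of how the first principal component can be obtained from the eigendecomposition of $X^T X$, using the same illustrative data as above; the SVD comparison at the end is only a sanity check, not part of the derivation.

```python
import numpy as np

Z = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2]])
X = Z - Z.mean(axis=0)

A = X.T @ X                          # symmetric, so eigh applies
eigenvalues, Q = np.linalg.eigh(A)   # eigh returns eigenvalues in ascending order

w1 = Q[:, -1]                        # eigenvector with the largest eigenvalue = first PC
max_variance = eigenvalues[-1] / (X.shape[0] - 1)
print(w1, max_variance)

# Cross-check: the first right singular vector of X matches w1 up to sign.
_, _, Vt = np.linalg.svd(X)
print(np.allclose(np.abs(Vt[0]), np.abs(w1)))
```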

Example:

A Jupyter notebook (.ipynb) with an example of principal component analysis can be accessed online [here].

It can be downloaded as a Python file from [here] or as a notebook from [here].