Eigenvalues & Principal Component Analysis
Eigenvalues & Eigenvectors
In linear algebra, an eigenvector is a non-zero vector that has its direction unchanged by a given linear transformation. More precisely, if a non-zero vector $\boldsymbol{v}$ is scaled by a constant factor $\lambda$ when a linear transformation $A$ is applied to it,
$$ A \boldsymbol {v} = \lambda \boldsymbol {v}, $$
then $\boldsymbol {v}$ is called an eigenvector of $A$, and $\lambda$ is the corresponding eigenvalue. Thus $A\boldsymbol{v}$ and $\boldsymbol{v}$ are collinear.
Rearranging the eigenvalue equation gives
$$ B \boldsymbol {v} = \boldsymbol{0} \quad \textsf{where} \quad B = A - \lambda I. $$
Since $\boldsymbol{v}$ is not the zero vector and $B \boldsymbol{v} = \boldsymbol{0}$, the matrix $B$ is singular, so its determinant is zero. Finding the roots of the characteristic polynomial $\left| A- \lambda I \right| = 0$ yields the eigenvalues, and the corresponding eigenvectors are the non-zero vectors in the nullspace of $B=A-\lambda I$.
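As a quick numerical check (a minimal sketch using NumPy; the example matrix below is arbitrary), the eigenvalues obtained as roots of $\left| A - \lambda I \right| = 0$ can be verified against `numpy.linalg.eig`, together with the facts that $A\boldsymbol{v} = \lambda\boldsymbol{v}$ and that $A - \lambda I$ is singular:

```python
import numpy as np

# An arbitrary example matrix, chosen only for illustration.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# numpy returns the eigenvalues and, as columns of the second output, the eigenvectors.
eigvals, eigvecs = np.linalg.eig(A)

for lam, v in zip(eigvals, eigvecs.T):
    # A v = lambda v: A v and v are collinear.
    assert np.allclose(A @ v, lam * v)
    # A - lambda I is singular, so its determinant is (numerically) zero.
    assert np.isclose(np.linalg.det(A - lam * np.eye(2)), 0.0)

print(eigvals)  # the roots of the characteristic polynomial
```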
If $A$ has $n$ linearly independent eigenvectors, it can be diagonalized: collecting the eigenvectors as the columns of a matrix $X$ and the eigenvalues on the diagonal of $D$ gives
$$ A = X D X^{-1}. $$
Thus powers of matrices, such as $A^k$, can be easily computed for any $k$: the inner factors of $X^{-1} X$ cancel, so $A^k = X D^k X^{-1}$.
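A small sketch of this (assuming $A$ is diagonalizable; the matrix is again arbitrary): computing $A^k$ through $X D^k X^{-1}$ agrees with repeated matrix multiplication.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
k = 5

# Diagonalize: the columns of X are eigenvectors, D holds the eigenvalues.
eigvals, X = np.linalg.eig(A)

# A^k = X D^k X^{-1}, and D^k is just the k-th power of each diagonal entry.
A_k = X @ np.diag(eigvals**k) @ np.linalg.inv(X)

assert np.allclose(A_k, np.linalg.matrix_power(A, k))
```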
- If $A$ is triangular, its eigenvalues are the entries on the diagonal.
- For an arbitrary $n$ by $n$ matrix $A$, the product of the $n$ eigenvalues is equal to the determinant of $A$.
- The sum of the $n$ eigenvalues is equal to the trace of $A$.
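These properties are easy to confirm numerically (a minimal sketch on a randomly generated matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
eigvals = np.linalg.eigvals(A)

# The product of the eigenvalues equals the determinant.
assert np.isclose(np.prod(eigvals), np.linalg.det(A))

# The sum of the eigenvalues equals the trace.
assert np.isclose(np.sum(eigvals), np.trace(A))

# For a triangular matrix, the eigenvalues are the diagonal entries.
T = np.triu(A)
assert np.allclose(np.sort(np.linalg.eigvals(T)), np.sort(np.diag(T)))
```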
Principal Component Analysis
Principal Component Analysis is a linear transformation of a dataset $Z$ onto a new coordinate system whose directions (the principal components) capture the largest variation in the data.
First, shift the data so that the mean of each column is zero, i.e. subtract the mean of each column of $Z$ from that column, yielding a new matrix $X$.
For a unit vector $\boldsymbol{w}$, the vector $X\boldsymbol{w}$ contains the projection of each data row onto the direction $\boldsymbol{w}$.
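A sketch of these two steps (the data matrix `Z` below is synthetic, with rows as observations and columns as variables, and the direction `w` is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((100, 3))   # synthetic data: 100 rows, 3 columns

# Centre each column so that it has zero mean, giving X.
X = Z - Z.mean(axis=0)

# Project every data row onto a unit direction w.
w = np.array([1.0, 1.0, 0.0])
w = w / np.linalg.norm(w)
projections = X @ w                 # one projected value per data row
```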
Since the mean of each column of $X$ is zero, the projected values ${X\boldsymbol{w} = \left( x_1, \ldots, x_n \right)^T}$ also have zero mean, so their sample variance is given by
$$ \begin{align*} \text{var} \left( X \boldsymbol{w} \right) & = \dfrac{1}{n-1} \left( x_{1}^{2} + \ldots + x_n^2 \right) \\ & = \dfrac{1}{n-1} \left( X \boldsymbol{w} \right)^T \left( X \boldsymbol{w} \right) \\ & = \dfrac{1}{n-1} \boldsymbol{w}^T X^T X \boldsymbol{w}. \end{align*} $$
The first principal component is the unit vector $\boldsymbol{w}$ for which this variance is maximal.
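A quick numerical check (a self-contained sketch with synthetic data) that $\tfrac{1}{n-1}\,\boldsymbol{w}^T X^T X \boldsymbol{w}$ equals the sample variance of the projected values:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
X = X - X.mean(axis=0)              # zero-mean columns
w = np.array([1.0, 1.0, 0.0])
w = w / np.linalg.norm(w)           # unit direction

n = X.shape[0]
quadratic_form = w @ X.T @ X @ w / (n - 1)
sample_var = np.var(X @ w, ddof=1)  # unbiased sample variance of the projections

assert np.isclose(quadratic_form, sample_var)
```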
Note that since ${A = X^T X}$ is symmetric, i.e. ${A^T = A}$, all of its eigenvalues are real and there exists an orthonormal basis given by the eigenvectors $\boldsymbol{q_i}$ of $A$. Let ${Q = \left( \boldsymbol{q_1} \, \boldsymbol{q_2} \cdots \boldsymbol{q_n} \right)}$, so that ${Q^T = Q^{-1}}$, and let ${D = \text{diag} \left( \lambda_1, \ldots, \lambda_n \right)}$ with the eigenvalues ordered so that ${\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n}$. Then $A$ can be expressed using the eigenvalues and eigenvectors, thus
$$ X^T X = A = Q D Q^T . $$
Then, the expression for the variance is given by
$$ \begin{align*} \boldsymbol{w}^T X^T X \boldsymbol{w} & = \boldsymbol{w}^T A \boldsymbol{w} \\ & = \boldsymbol{w}^T Q D Q^T \boldsymbol{w} \\ & = \left( \boldsymbol{w}^T Q \right) D \left( Q^T \boldsymbol{w} \right) \\ & = \left( Q^T \boldsymbol{w} \right)^T D \left( Q^T \boldsymbol{w} \right), \quad \text{let} \quad \boldsymbol{y} = Q^T \boldsymbol{w} \\ & = \boldsymbol{y}^T D \boldsymbol{y}. \end{align*} $$
Since $\boldsymbol{w}$ is a unit vector and $Q$ is orthogonal, $\boldsymbol{y}$ is also a unit vector. Now ${\boldsymbol{y}^T D \boldsymbol{y} = \lambda_1 y_1^2 + \ldots + \lambda_n y_n^2 \le \lambda_1}$ because ${y_1^2 + \ldots + y_n^2 = 1}$ and $\lambda_1$ is the largest eigenvalue, so the vector ${\boldsymbol{y} = \left(1, 0, \ldots, 0 \right)^T}$ maximizes the variance. The corresponding principal component $\boldsymbol{w}$ is recovered from ${\boldsymbol{y} = Q^T \boldsymbol{w}}$, i.e. ${\boldsymbol{w} = \left( Q^T\right)^{-1} \boldsymbol{y} = Q\boldsymbol{y} = \boldsymbol{q_1}}$, the eigenvector of $X^T X$ with the largest eigenvalue.
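Putting the derivation together (a minimal sketch on synthetic data; `numpy.linalg.eigh` is used because $X^T X$ is symmetric and it returns the eigenvalues in ascending order): the first principal component is the eigenvector of $X^T X$ with the largest eigenvalue, and no other unit direction yields a larger projected variance.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data with a deliberately dominant direction.
Z = rng.standard_normal((200, 3)) * np.array([3.0, 1.0, 0.3])
X = Z - Z.mean(axis=0)              # centred data
n = X.shape[0]

# Eigendecomposition of the symmetric matrix A = X^T X.
eigvals, Q = np.linalg.eigh(X.T @ X)
w1 = Q[:, -1]                       # eigenvector with the largest eigenvalue: first principal component
max_var = eigvals[-1] / (n - 1)     # variance captured along w1

# No randomly chosen unit direction should exceed the variance along w1.
for _ in range(1000):
    w = rng.standard_normal(3)
    w /= np.linalg.norm(w)
    assert np.var(X @ w, ddof=1) <= max_var + 1e-9
```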