Iterative Methods

Construct a scheme which solves the linear system $Ax=b$ by generating a sequence $\{ x^{(n)} \}$ which approximates the solution, $x$, that is

$$ \begin{equation} \lim_{n\rightarrow \infty} x^{(n)} = x. \nonumber \end{equation} $$

In particular, $x = A^{-1} b$. Split the matrix $A=P-N$ and solve

$$ \begin{equation} P x^{(n+1)} = N x^{(n)} + b, \nonumber \end{equation} $$

or, equivalently, $x^{(n+1)} = B x^{(n)} + f$, where $P$ is called a preconditioner, $B=P^{-1}N$ is the iteration matrix and $f = P^{-1} b$.

An equivalent way is to write

$$ \begin{equation} x^{(k+1)} = x^{(k)} + P^{-1}r^{(k)} \nonumber \end{equation} $$

where

$$ \begin{equation} r^{(k)} = b - A x^{(k)} \nonumber \end{equation} $$

is the residual.
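
As a concrete illustration, here is a minimal NumPy sketch of this residual-correction iteration; the function name, tolerance and iteration cap are illustrative choices, not part of the text.

```python
import numpy as np

def splitting_iteration(A, b, P, x0=None, tol=1e-10, max_iter=500):
    """Iterate x_{k+1} = x_k + P^{-1} r_k with r_k = b - A x_k."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    for _ in range(max_iter):
        r = b - A @ x                      # residual r^{(k)}
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        x = x + np.linalg.solve(P, r)      # apply P^{-1} by solving P z = r
    return x
```

Choosing $P = A$ gives the exact solution in one step; the methods below correspond to cheaper choices of $P$.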

Definition: Consistency

An iterative method is consistent if $x=Bx +f$, or equivalently,

$$ \begin{equation} f = (I-B) A^{-1} b. \nonumber \end{equation} $$
Theorem:
If an iterative scheme is consistent, then it converges for any initial guess $x^{(0)}$ if and only if $\rho \left( B \right) < 1$.
Definition: Stationary Methods

The formulation can be written as

$$ \begin{equation} x^{(0)} = F^{(0)} \left( A, b \right) \quad \textsf{and} \quad x^{(k+1)} = F^{(k+1)} \left( x^{(k)}, x^{(k-1)}, \ldots, x^{(0)}, A, b \right) . \nonumber \end{equation} $$

If the functions $F^{(k)}$ do not depend on the iteration index $k$, the method is said to be stationary; otherwise it is non-stationary.

Jacobi Method

The Jacobi method splits the matrix $A$ into its diagonal part $D$ and its strictly lower and strictly upper triangular parts $L$ and $U$, so that $A=D+L+U$, and solves

$$ \begin{equation} D x^{(n+1)} = -(L + U) x^{(n)} + b . \tag{eq:JAC} \end{equation} $$

Element-wise this is

$$ \begin{equation} x_i^{(k+1)} = \dfrac{1}{a_{ii}} \left( b_i - \sum\limits_{\substack{j=1,\newline{}j \neq i}}^n a_{ij} x_{j}^{(k)} \right) . \nonumber \end{equation} $$

Thus, the iterative scheme is

$$ \begin{equation} x^{(n+1)} = -D^{-1}(L + U) x^{(n)} + D^{-1} b. \nonumber \end{equation} $$

Since $L+U = A-D$, the iteration matrix can be written as $B=I - D^{-1}A$.
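
A minimal sketch of the Jacobi sweep above, assuming NumPy; the helper name and stopping rule are illustrative.

```python
import numpy as np

def jacobi(A, b, x0=None, tol=1e-10, max_iter=500):
    """Jacobi sweep: x_{k+1} = D^{-1} (b - (L + U) x_k)."""
    D = np.diag(A)                    # diagonal entries a_ii
    R = A - np.diagflat(D)            # off-diagonal part L + U
    x = np.zeros_like(b) if x0 is None else x0.copy()
    for _ in range(max_iter):
        x_new = (b - R @ x) / D       # element-wise division by a_ii
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x
```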

Over-Relaxation of Jacobi Method

Also called the weighted Jacobi method. Introduce a relaxation parameter $\omega$ and iterate

$$ \begin{equation} x_i^{(k+1)} = \dfrac{ \omega }{ a_{ii} } \left( b_i - \sum\limits_{\substack{j=1,\newline{}j \neq i}}^n a_{ij} x_{j}^{(k)} \right) + \left( 1 - \omega \right) x_i^{(k)}. \tag{eq:JOR} \end{equation} $$
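
A sketch of the weighted update, reusing the same splitting as in the Jacobi sketch above; the helper name and stopping rule are again illustrative.

```python
import numpy as np

def jacobi_over_relaxation(A, b, omega, x0=None, tol=1e-10, max_iter=500):
    """Weighted Jacobi: x_{k+1} = omega * D^{-1}(b - (L + U) x_k) + (1 - omega) * x_k."""
    D = np.diag(A)
    R = A - np.diagflat(D)
    x = np.zeros_like(b) if x0 is None else x0.copy()
    for _ in range(max_iter):
        x_new = omega * (b - R @ x) / D + (1.0 - omega) * x
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x
```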

Successive Over-Relaxation

Introduce a relaxation parameter $\omega$ and solve

$$ \begin{equation} \left(D + \omega L \right) x^{(n+1)} = -( (\omega-1)D + \omega U) x^{(n)} + \omega b. \tag{eq:SOR} \end{equation} $$
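
A sketch of one way to carry out the SOR sweep element-wise, so that the lower-triangular system on the left-hand side is solved in place; it assumes NumPy, and the helper name is illustrative.

```python
import numpy as np

def sor(A, b, omega, x0=None, tol=1e-10, max_iter=500):
    """SOR sweep: (D + omega*L) x_{k+1} = ((1 - omega)*D - omega*U) x_k + omega*b."""
    n = len(b)
    x = np.zeros_like(b) if x0 is None else x0.copy()
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            # entries j < i already hold x^{(k+1)}, entries j > i still hold x^{(k)}
            sigma = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            x[i] = (1.0 - omega) * x_old[i] + omega * (b[i] - sigma) / A[i, i]
        if np.linalg.norm(x - x_old) <= tol:
            break
    return x
```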

Gauss-Seidel

The Gauss-Seidel method uses the same splitting $A=D+L+U$ into diagonal, strictly lower and strictly upper triangular parts, and solves

$$ \begin{equation} (D + L) x^{(n+1)} = -U x^{(n)} + b \tag{eq:GS} \end{equation} $$
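
Since Gauss-Seidel is the $\omega = 1$ case of SOR, the `sor` sketch above also serves as a Gauss-Seidel solver. A short usage example on a hypothetical, strictly diagonally dominant test system (so item 1 of the theorem below guarantees convergence); it assumes the `sor` helper defined earlier.

```python
import numpy as np

# Hypothetical test system, strictly diagonally dominant by rows.
A = np.array([[4.0, 1.0, 1.0],
              [1.0, 5.0, 2.0],
              [1.0, 2.0, 6.0]])
b = np.array([6.0, 8.0, 9.0])

x = sor(A, b, omega=1.0)        # Gauss-Seidel via the SOR sketch above
print(np.allclose(A @ x, b))    # expect True
```
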
Theorem:
  1. If $A$ is strictly diagonally dominant by rows, i.e. $|a_{ii}| > \sum_{j \ne i} |a_{ij}|$, the Jacobi and Gauss-Seidel methods are convergent.

  2. If $A$ and $2D-A$ are symmetric and positive definite, then the Jacobi method is convergent and the spectral radius of the iteration matrix $B$ is equal to

    $$ \begin{equation} \rho \left( B \right) = \left\Vert B \right\Vert_{A} = \left\Vert B \right\Vert_{D} \nonumber \end{equation} $$

    where $\left\Vert \cdot \right\Vert_{A}$ is the matrix norm induced by the energy vector norm $\left\Vert x \right\Vert_{A} = \sqrt{ x \cdot A x }$.

  3. If $A$ is symmetric and positive definite, the Jacobi over-relaxation method is convergent if

    $$ \begin{equation} 0 < \omega < \dfrac{2}{\rho\left( D^{-1}A \right) }. \nonumber \end{equation} $$
  4. If $A$ is symmetric and positive definite, the Gauss-Seidel method is monotonically convergent with respect to the energy norm $\left\Vert \cdot \right\Vert_{A}$.

Theorem:
For any $\omega \in \mathbb{R}$, the SOR iteration matrix satisfies $\rho \left( B \left( \omega \right) \right) \ge |\omega - 1|$. Thus, SOR does not converge if either $\omega \le 0$ or $\omega \ge 2$.
Theorem: Ostrowski
If $A$ is symmetric and positive definite, then the SOR method is convergent if and only if $0 < \omega < 2$. Furthermore, the convergence is monotonic with respect to the energy norm $\left\Vert \cdot \right\Vert_{A}$.

Gradient Descent

Consider the function $\Phi \, : \, \mathbb{R}^{n} \rightarrow \mathbb{R}$ defined by

$$ \begin{equation} \Phi \left( y\right) = \dfrac{1}{2} y \cdot A y - y \cdot b. \nonumber \end{equation} $$

For symmetric and positive definite $A$, it can be shown that solving $Ax=b$ is equivalent to minimizing $\Phi$.

If $x$ minimizes $\Phi$, then $\nabla \Phi(x) = 0$; since $\nabla \Phi(y) = Ay - b$ when $A$ is symmetric, this gives $Ax - b = 0$, so $x$ solves the linear system.

Now, expanding $\Phi$ about $x$ and using $\nabla \Phi(x) = 0$,

$$ \begin{align*} \Phi(y) & = \Phi(x + (y-x) ) \newline & = \Phi(x) + \nabla \Phi(x) \cdot (y-x) + \dfrac{1}{2} (y-x) \cdot A (y-x) \newline & = \Phi(x) + \dfrac{1}{2} \left\Vert y - x\right\Vert_{A}^{2}, \end{align*} $$

so $x$ is the unique minimizer of $\Phi$.

Gradient descent seeks to construct a scheme which updates the vector $x^{(k)}$ according to

$$ \begin{equation} x^{(k+1)} = x^{(k)} + \alpha^{(k)} d^{(k)} \nonumber \end{equation} $$

where $d^{(k)}$ is the update direction and $\alpha^{(k)}$ is the step size at the $k$-th iterate.

Note that, in contrast to the methods above, the gradient descent method is non-stationary since the direction $d^{(k)}$ and step size $\alpha^{(k)}$ change at every iterate.

The idea is to let the search direction be the negative gradient of the function $\Phi$:

$$ \begin{align*} d^{(k)} & = -\nabla \Phi \left( x^{(k)} \right) \newline & = - \left( A x^{(k)} - b \right) \newline & = b - A x^{(k)} \newline & = r^{(k)} \end{align*} $$

The optimal step size is found by minimizing $\Phi \left( x^{(k)} + \alpha r^{(k)} \right)$ over $\alpha$: differentiating with respect to $\alpha$ and setting the derivative to zero gives

$$ \begin{equation} \alpha^{(k)} = \dfrac{ r^{(k)} \cdot r^{(k)} }{ r^{(k)} \cdot A r^{(k)} } \nonumber \end{equation} $$
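
A minimal sketch of the resulting steepest-descent loop, assuming $A$ is symmetric and positive definite and using NumPy; the residual update $r^{(k+1)} = r^{(k)} - \alpha^{(k)} A r^{(k)}$ follows from the step update and avoids a second matrix-vector product per iterate.

```python
import numpy as np

def gradient_descent(A, b, x0=None, tol=1e-10, max_iter=10000):
    """Steepest descent: d_k = r_k and alpha_k = (r_k . r_k) / (r_k . A r_k)."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        Ar = A @ r
        alpha = (r @ r) / (r @ Ar)   # optimal step along the residual direction
        x = x + alpha * r
        r = r - alpha * Ar           # r_{k+1} = r_k - alpha_k * A r_k
    return x
```
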
Theorem:

If $A$ is symmetric and positive definite, then the gradient descent method is convergent for any $x^{(0)}$ and

$$ \begin{equation} \left\Vert e^{(k+1)} \right\Vert_A \le \dfrac{K(A)-1}{K(A)+1} \left\Vert e^{(k)} \right\Vert_A,\nonumber \end{equation} $$

where $e^{(k)} = x^{(k)} - x$ is the error and $K(A) = \lambda_{\max}(A) / \lambda_{\min}(A)$ is the spectral condition number of $A$.

Conjugate Gradient

Definition: Conjugate Vectors
If $A$ is symmetric and positive definite, the vectors $u$ and $v$ are said to be $A$-orthogonal, or conjugate, if $u \cdot Av=0$.

The conjugate gradient method updates the iterates as in gradient descent, $x^{(k+1)} = x^{(k)} + \alpha^{(k)} p^{(k)}$, but the search directions $p^{(k)}$ are chosen to be mutually $A$-orthogonal rather than equal to the residuals $r^{(k)}$.
Lemma:

Choosing $p^{(k+1)}$ such that

$$ \begin{equation} p^{(k+1)} \cdot A p^{(j)} = 0 \nonumber \end{equation} $$

for $j=0, \ldots, k$ leads to

$$ \begin{equation} p^{(j)} \cdot r^{(k+1)} = 0.\nonumber \end{equation} $$
Lemma:

Setting

$$ \begin{equation} \beta^{(k)} = \dfrac{r^{(k+1)} \cdot A p^{(k)} }{ p^{(k)} \cdot A p^{(k)} } \nonumber \end{equation} $$

and

$$ \begin{equation} p^{(k+1)} = r^{(k+1)} - \beta^{(k)} p^{(k)} \nonumber \end{equation} $$

yields, for $j=0, \ldots, k$,

$$ \begin{equation} p^{(k+1)} \cdot A p^{(j)} = 0.\nonumber \end{equation} $$
Theorem:
If $A \in \mathbb{R}^{n \times n}$ is a symmetric and positive definite matrix and $b \in \mathbb{R}^{n}$, then, in exact arithmetic, the conjugate gradient method yields the exact solution of $Ax=b$ in at most $n$ steps.
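
Putting the pieces together, here is a minimal sketch of the conjugate gradient loop using the $\beta^{(k)}$ and $p^{(k+1)}$ updates from the lemmas above; the step length $\alpha^{(k)} = (r^{(k)} \cdot r^{(k)})/(p^{(k)} \cdot A p^{(k)})$ and the residual update $r^{(k+1)} = r^{(k)} - \alpha^{(k)} A p^{(k)}$ are standard choices not derived in the text.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-12):
    """Conjugate gradient with the beta^{(k)} and p^{(k+1)} updates of the lemmas above."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x
    p = r.copy()
    for _ in range(len(b)):              # at most n steps in exact arithmetic
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)       # step length along p^{(k)}
        x = x + alpha * p
        r = r - alpha * Ap               # residual r^{(k+1)}
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        beta = (r @ Ap) / (p @ Ap)       # beta^{(k)} from the lemma
        p = r - beta * p                 # A-orthogonal direction p^{(k+1)}
    return x
```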