Iterative Methods
Construct a scheme which solves the linear system $Ax=b$ by generating a sequence $\{ x^{(n)} \}$ which approximates the solution, $x$, that is
$$ \begin{equation} \lim_{n\rightarrow \infty} x^{(n)} = x. \nonumber \end{equation} $$Here $x = A^{-1} b$ is the exact solution. Split the matrix $A=P-N$ and solve
$$ \begin{equation} P x^{(n+1)} = N x^{(n)} + b, \nonumber \end{equation} $$or equivalently $x^{(n+1)} = B x^{(n)} + f$, where $P$ is called a preconditioner, $B=P^{-1}N$ is the iteration matrix and $f = P^{-1} b$.
An equivalent way is to write
$$ \begin{equation} x^{(k+1)} = x^{(k)} + P^{-1}r^{(k)} \nonumber \end{equation} $$where
$$ \begin{equation} r^{(k)} = b - A x^{(k)} \nonumber \end{equation} $$is the residual.
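As a concrete illustration, here is a minimal sketch of this preconditioned residual-correction iteration in Python/NumPy; the function name `splitting_iteration`, the tolerance `tol`, and the iteration cap `max_iter` are illustrative choices rather than anything fixed above.

```python
import numpy as np

def splitting_iteration(A, b, P, x0=None, tol=1e-10, max_iter=500):
    """Iterate x_{k+1} = x_k + P^{-1} r_k with r_k = b - A x_k."""
    x = np.zeros_like(b, dtype=float) if x0 is None else x0.astype(float).copy()
    for k in range(max_iter):
        r = b - A @ x                    # residual r^(k)
        if np.linalg.norm(r) < tol:      # stop once the residual is small
            break
        x = x + np.linalg.solve(P, r)    # apply the preconditioner P
    return x
```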
An iterative method is consistent if $x=Bx +f$, or equivalently,
$$ \begin{equation} f = (I-B) A^{-1} b. \nonumber \end{equation} $$In general, an iterative method can be written as
$$ \begin{equation} x^{(0)} = F^{(0)} \left( A, b \right) \quad \textsf{and} \quad x^{(k+1)} = F^{(k+1)} \left( x^{(k)}, x^{(k-1)}, \ldots, x^{(0)}, A, b \right) . \nonumber \end{equation} $$If the functions $F^{(k)}$ do not depend on the iteration index $k$, the method is said to be stationary; otherwise it is non-stationary.
Jacobi Method
The Jacobi method decomposes the matrix $A$ into its diagonal part $D$ and its strictly lower and upper triangular parts $L$ and $U$, so that $A=D+L+U$, and solves
$$ \begin{equation} D x^{(n+1)} = -(L + U) x^{(n)} + b . \tag{eq:JAC} \end{equation} $$Element-wise this is
$$ \begin{equation} x_i^{(k+1)} = \dfrac{1}{a_{ii}} \left( b_i - \sum\limits_{\substack{j=1,\newline{}j \neq i}}^n a_{ij} x_{j}^{(k)} \right) . \nonumber \end{equation} $$Thus, the iterative scheme is
$$ \begin{equation} x^{(n+1)} = -D^{-1}(L + U) x^{(n)} + D^{-1} b. \nonumber \end{equation} $$Since $L+U = A-D$, the iteration matrix can be written as $B=I - D^{-1}A$.
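A minimal sketch of the Jacobi iteration (eq:JAC), assuming $A$ has nonzero diagonal entries; the function name `jacobi` and the stopping rule are illustrative.

```python
import numpy as np

def jacobi(A, b, x0=None, tol=1e-10, max_iter=500):
    """Jacobi iteration: D x_{k+1} = -(L + U) x_k + b."""
    d = np.diag(A)                       # diagonal entries a_ii
    R = A - np.diag(d)                   # off-diagonal part L + U
    x = np.zeros_like(b, dtype=float) if x0 is None else x0.astype(float).copy()
    for k in range(max_iter):
        x_new = (b - R @ x) / d          # element-wise update
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```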
Over-Relaxation of Jacobi Method
Also called the weighted Jacobi method. Introduce a relaxation parameter $\omega$ and iterate
$$ \begin{equation} x_i^{(k+1)} = \dfrac{ \omega }{ a_{ii} } \left( b_i - \sum\limits_{\substack{j=1,\newline{}j \neq i}}^n a_{ij} x_{j}^{(k)} \right) + \left( 1 - \omega \right) x_i^{(k)}. \tag{eq:JOR} \end{equation} $$
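A sketch of the weighted Jacobi update (eq:JOR), written as a damped correction of the plain Jacobi step; the function name `jacobi_or` and the default value of `omega` are illustrative.

```python
import numpy as np

def jacobi_or(A, b, omega=0.8, x0=None, tol=1e-10, max_iter=500):
    """Weighted Jacobi: x_{k+1} = omega * (Jacobi step) + (1 - omega) * x_k."""
    d = np.diag(A)                       # diagonal entries a_ii
    R = A - np.diag(d)                   # off-diagonal part L + U
    x = np.zeros_like(b, dtype=float) if x0 is None else x0.astype(float).copy()
    for k in range(max_iter):
        x_new = omega * (b - R @ x) / d + (1.0 - omega) * x
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```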
Successive Over-Relaxation
Introduce a relaxation parameter $\omega$ and solve
$$ \begin{equation} \left(D + \omega L \right) x^{(n+1)} = -( (\omega-1)D + \omega U) x^{(n)} + \omega b. \tag{eq:SOR} \end{equation} $$
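A sketch of one way to implement an SOR sweep (eq:SOR), using its element-wise form in which the already-updated entries $x_j^{(k+1)}$, $j < i$, are used immediately; the function name `sor` and the default `omega` are illustrative.

```python
import numpy as np

def sor(A, b, omega=1.5, x0=None, tol=1e-10, max_iter=500):
    """SOR: (D + omega*L) x_{k+1} = -((omega-1)*D + omega*U) x_k + omega*b."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    for k in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            # Gauss-Seidel-style sweep blended with the previous iterate
            sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / A[i, i]
        if np.linalg.norm(x - x_old) < tol:
            break
    return x
```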
Gauss-Seidel
The Gauss-Seidel method uses the same splitting $A=D+L+U$ and solves
$$ \begin{equation} (D + L) x^{(n+1)} = -U x^{(n)} + b. \tag{eq:GS} \end{equation} $$If $A$ is strictly diagonally dominant by rows, i.e. $|a_{ii}| > \sum_{j \ne i} |a_{ij}|$, then the Jacobi and Gauss-Seidel methods are convergent.
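A sketch of the Gauss-Seidel sweep (eq:GS), solved by forward substitution; it coincides with the SOR sketch above for $\omega = 1$. The function name `gauss_seidel` is illustrative.

```python
import numpy as np

def gauss_seidel(A, b, x0=None, tol=1e-10, max_iter=500):
    """Gauss-Seidel: (D + L) x_{k+1} = -U x_k + b, by forward substitution."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    for k in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            # new values x[:i] are used as soon as they are available
            sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x_old[i + 1:]
            x[i] = (b[i] - sigma) / A[i, i]
        if np.linalg.norm(x - x_old) < tol:
            break
    return x
```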
If $A$ and $2D-A$ are symmetric and positive definite, then the Jacobi method is convergent and the spectral radius of the iteration matrix $B$ satisfies
$$ \begin{equation} \rho \left( B \right) = \left\Vert B \right\Vert_{A} = \left\Vert B \right\Vert_{D}, \nonumber \end{equation} $$where $\left\Vert \cdot \right\Vert_{A}$ is the matrix norm induced by the energy norm $\left\Vert x \right\Vert_{A} = \sqrt{ x \cdot A x }$.
If $A$ is symmetric and positive definite, the Jacobi over-relaxation method is convergent if
$$ \begin{equation} 0 < \omega < \dfrac{2}{\rho\left( D^{-1}A \right) }. \nonumber \end{equation} $$If $A$ is symmetric and positive definite, the Gauss-Seidel method is monotonically convergent with respect to the energy norm $\left\Vert \cdot \right\Vert_{A}$.
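These convergence criteria can be checked numerically via the spectral radius of the iteration matrix; below is a sketch for the Jacobi matrix $B = I - D^{-1}A$, where the matrix `A` is an arbitrary illustrative example.

```python
import numpy as np

A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])       # strictly diagonally dominant, SPD

D_inv = np.diag(1.0 / np.diag(A))
B = np.eye(3) - D_inv @ A                # Jacobi iteration matrix
rho = max(abs(np.linalg.eigvals(B)))     # spectral radius
print(rho, rho < 1)                      # rho < 1, so the Jacobi iteration converges
```

Since the stationary scheme converges for every $x^{(0)}$ if and only if $\rho(B) < 1$, this is a quick sanity check before iterating.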
Gradient Descent
Consider the function $\Phi \, : \, \mathbb{R}^{n} \rightarrow \mathbb{R}$ which takes the form:
$$ \begin{equation} \Phi \left( y\right) = \dfrac{1}{2} y \cdot A y - y \cdot b. \nonumber \end{equation} $$For $A$ symmetric and positive definite, solving $Ax=b$ is equivalent to minimizing $\Phi$.
If $x$ minimizes $\Phi$, then $\nabla \Phi(x) = 0$; since $\nabla \Phi(x) = Ax - b$, this means $x$ solves the linear system.
Now, for any $y$,
$$ \begin{align*} \Phi(y) & = \Phi(x + (y-x) ) \newline & = \Phi(x) + (y-x) \cdot \left( A x - b \right) + \dfrac{1}{2} \left\Vert y - x\right\Vert_{A}^{2} \newline & = \Phi(x) + \dfrac{1}{2} \left\Vert y - x\right\Vert_{A}^{2}, \end{align*} $$since $Ax = b$, so $x$ is the minimizer of $\Phi$. Gradient descent seeks to construct a scheme which updates the vector $x^{(k)}$ according to
$$ \begin{equation} x^{(k+1)} = x^{(k)} + \alpha^{(k)} d^{(k)} \nonumber \end{equation} $$where $d^{(k)}$ is the update direction and $\alpha^{(k)}$ is the step size at the $k$-th iterate.
Note that in contrast to the methods above, the gradient descent method is non-stationary, as the direction $d^{(k)}$ and the step size $\alpha^{(k)}$ change at every iteration.
The idea is to let the search direction be the negative gradient of the function $\Phi$
$$ \begin{align*} d^{(k)} & = -\nabla \Phi \left( x^{(k)} \right) \newline & = - \left( A x^{(k)} - b \right) \newline & = b - A x^{(k)} \newline & = r^{(k)}. \end{align*} $$The ideal step size is found by differentiating $\Phi \left( x^{(k)} + \alpha r^{(k)} \right)$ with respect to $\alpha$ and setting the derivative, $\alpha \, r^{(k)} \cdot A r^{(k)} - r^{(k)} \cdot r^{(k)}$, to zero, so that
$$ \begin{equation} \alpha^{(k)} = \dfrac{ r^{(k)} \cdot r^{(k)} }{ r^{(k)} \cdot A r^{(k)} }. \nonumber \end{equation} $$If $A$ is symmetric and positive definite, then the gradient method is convergent for any $x^{(0)}$ and
$$ \begin{equation} \left\Vert e^{(k+1)} \right\Vert_A \le \dfrac{K(A)-1}{K(A)+1} \left\Vert e^{(k)} \right\Vert_A, \nonumber \end{equation} $$where $e^{(k)} = x^{(k)} - x$ is the error and $K(A)$ is the spectral condition number of $A$.
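A minimal sketch of the gradient (steepest descent) method with the optimal step size derived above, assuming $A$ is symmetric and positive definite; the function name `gradient_descent` is illustrative.

```python
import numpy as np

def gradient_descent(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Steepest descent: d_k = r_k, alpha_k = (r_k . r_k) / (r_k . A r_k)."""
    x = np.zeros_like(b, dtype=float) if x0 is None else x0.astype(float).copy()
    r = b - A @ x                          # initial residual
    for k in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        Ar = A @ r
        alpha = (r @ r) / (r @ Ar)         # optimal step along the residual
        x = x + alpha * r
        r = b - A @ x                      # recompute the residual
    return x
```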
Conjugate Gradient
The conjugate gradient method replaces the residual direction with $A$-conjugate search directions $p^{(k)}$, starting from $p^{(0)} = r^{(0)}$. Choosing $p^{(k+1)}$ such that
$$ \begin{equation} p^{(k+1)} \cdot A p^{(j)} = 0 \nonumber \end{equation} $$for $j=0, \ldots, k$ leads to
$$ \begin{equation} p^{(j)} \cdot r^{(k+1)} = 0.\nonumber \end{equation} $$Setting
$$ \begin{equation} \beta^{(k)} = \dfrac{r^{(k+1)} \cdot A p^{(k)} }{ p^{(k)} \cdot A p^{(k)} } \nonumber \end{equation} $$and
$$ \begin{equation} p^{(k+1)} = r^{(k+1)} - \beta^{(k)} p^{(k)} \nonumber \end{equation} $$then yields, for $j=0, \ldots, k,$
$$ \begin{equation} p^{(k+1)} \cdot A p^{(j)} = 0.\nonumber \end{equation} $$
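A minimal sketch of the resulting conjugate gradient iteration, assuming $A$ is symmetric and positive definite. The step size $\alpha^{(k)} = \left( p^{(k)} \cdot r^{(k)} \right) / \left( p^{(k)} \cdot A p^{(k)} \right)$, i.e. the optimal step along $p^{(k)}$, is not stated above and is an assumption here, as is the function name `conjugate_gradient`.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Conjugate gradient for symmetric positive definite A."""
    x = np.zeros_like(b, dtype=float) if x0 is None else x0.astype(float).copy()
    r = b - A @ x                        # initial residual
    p = r.copy()                         # first search direction p^(0) = r^(0)
    for k in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        Ap = A @ p
        alpha = (p @ r) / (p @ Ap)       # optimal step along p^(k)
        x = x + alpha * p
        r = b - A @ x                    # new residual r^(k+1)
        beta = (r @ Ap) / (p @ Ap)       # beta^(k) as defined above
        p = r - beta * p                 # A-conjugate direction p^(k+1)
    return x
```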