Nonlinear Equations

Root Finding

Bisection Method

Definition: Bisection Method

The bisection method, when applied in the interval $\left[a, b\right]$ to a function $f \in C^0 \left( \left[a, b\right] \right)$ with ${f(a)f(b) < 0}$, proceeds as follows (a code sketch follows the list).

Bisect the interval into two subintervals $\left[a, c\right]$ and $\left[c, b\right]$ at the midpoint ${c = \tfrac{a+b}{2}}$, so that ${a < c < b}$.

  • If $f(c) = 0$, or $\left| f(c) \right|$ is sufficiently small, then $c$ is accepted as the root
  • Else, if $f(c)f(a) < 0$ continue in the interval $[a, c]$
  • Else, if $f(c)f(b) < 0$ continue in the interval $[c, b]$
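
A minimal Python sketch of this procedure (the function name, tolerance, and iteration cap are illustrative choices, not part of the definition):

```python
import math

def bisection(f, a, b, tol=1e-10, max_iter=200):
    """Bisection method: find a root of f in [a, b], assuming f(a)*f(b) < 0."""
    fa, fb = f(a), f(b)
    if fa * fb >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        c = 0.5 * (a + b)               # midpoint of the current bracket
        fc = f(c)
        if fc == 0 or 0.5 * (b - a) < tol:
            return c                    # root found or bracket small enough
        if fa * fc < 0:                 # sign change in [a, c]
            b, fb = c, fc
        else:                           # sign change in [c, b]
            a, fa = c, fc
    return 0.5 * (a + b)

# Example: root of cos(x) - x on [0, 1]
print(bisection(lambda x: math.cos(x) - x, 0.0, 1.0))  # ≈ 0.7390851332
```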
Theorem: Bisection Method

The bisection method, when applied in the interval $[a,b]$ to a function $f \in C^0 \left( \left[a, b\right] \right)$ with ${f(a)f(b) < 0}$ will compute, after $n$ steps, an approximation $c_n$ of the root $r$ with error

$$ \left\| r - c_n \right\| \lt \dfrac{b-a}{2^n}. $$
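
For example, to guarantee $\left\| r - c_n \right\| < \varepsilon$ it suffices that $\dfrac{b-a}{2^n} \le \varepsilon$, i.e. $n \ge \log_2 \dfrac{b-a}{\varepsilon}$; on $\left[0, 1\right]$ with $\varepsilon = 10^{-6}$ this requires $n = 20$ steps.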

Newton’s Method

Definition:

Let $f \in C^1 \left( \left[a, b \right] \right)$. Then, for an initial guess $x_0$, Newton’s method is

$$ x_{n+1} = x_n - \dfrac{f\left( x_n \right)}{f^{\prime}\left( x_n \right)}. $$
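
A minimal sketch in Python (the step-size stopping rule and the iteration cap are illustrative choices):

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton's method: iterate x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)     # assumes df(x) != 0 near the root
        x -= step
        if abs(step) < tol:     # step size as a convergence proxy
            return x
    return x

# Example: square root of 2 as the root of x^2 - 2
print(newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.5))  # ≈ 1.4142135623730951
```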
Theorem:
When Newton’s method converges, it converges to a root, $r$, of $f$, i.e. ${f\left(r\right)=0}$.
Theorem:

Let $f \in C^1\left( \left[ a, b\right] \right)$, with

  1. $f(a)f(b) < 0$
  2. $f^\prime\left(x\right) \ne 0$ for all $x \in \left( a, b \right)$
  3. $f^{\prime\prime}\left(x\right) $ exists, is continuous and either ${f^{\prime\prime}\left(x\right) > 0}$ or ${f^{\prime\prime}\left(x\right) < 0}$ for all ${x \in \left( a, b \right)}$.

Then $f(x)=0$ has exactly one root, $r$, in the interval and the sequence generated by Newton iterations converges to the root when the initial guess is chosen according to

  • if $f(a) < 0$ and $f^{\prime\prime}(a) < 0$ or $f(a) > 0$ and $f^{\prime\prime}(a) > 0$ then $x_0 \in \left[ a, r \right]$

or

  • if $f(a) < 0$ and $f^{\prime\prime}(a) > 0$ or $f(a) > 0$ and $f^{\prime\prime}(a) < 0$ then $x_0 \in \left[ r, b \right]$.

Each iterate in the sequence satisfies

$$ \left\| x_n - r\right\| \le \dfrac{\left\| f\left( x_n \right) \right\|}{\min \limits_{x \in [a,b]}\left\| f^{\prime}\left( x \right) \right\|}. $$
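
If a lower bound $m \le \min\limits_{x \in [a,b]} \left\| f^{\prime}\left( x \right) \right\|$ is known, this inequality gives a computable stopping test: once $\left\| f\left( x_n \right) \right\| / m$ drops below the target tolerance, the error is guaranteed to be below it as well. A hedged sketch, where the bound $m$ is assumed to be supplied by the caller:

```python
def newton_with_bound(f, df, x0, m, tol=1e-10, max_iter=50):
    """Newton iteration stopped via the a posteriori bound
    |x_n - r| <= |f(x_n)| / m, where m is a known lower bound
    on |f'(x)| over [a, b]."""
    x = x0
    for _ in range(max_iter):
        if abs(f(x)) / m < tol:   # guaranteed error below tolerance
            return x
        x -= f(x) / df(x)
    return x

# Example: f(x) = x^2 - 2 on [1, 2], where |f'(x)| = 2x >= 2, so m = 2
print(newton_with_bound(lambda x: x * x - 2, lambda x: 2 * x, x0=1.5, m=2.0))
```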
Theorem:

Let $f \in C^1\left( \left[ a, b\right] \right)$, with

  1. $f(a)f(b) < 0$
  2. $f^\prime\left( x \right) \ne 0$ for all $x \in \left( a, b \right)$
  3. $f^{\prime\prime}\left( x \right)$ exists and is continuous, i.e. ${f \in C^2\left( \left[ a, b \right]\right)}$

Then, if $x_0$ is close enough to the root $r$, Newton’s method converges quadratically.
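
Quadratic convergence means the error is roughly squared at each step, so the number of correct digits roughly doubles per iteration. A small demonstration, using $f(x) = x^2 - 2$ with root $\sqrt{2}$:

```python
import math

# Newton iteration for f(x) = x^2 - 2; the error roughly squares each step
x, r = 1.5, math.sqrt(2)
for n in range(5):
    print(n, abs(x - r))            # correct digits roughly double
    x -= (x * x - 2) / (2 * x)      # Newton step
```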

Secant Methods

Definition:

The secant method is defined as

$$ x_{n+1} = x_n - f \left( x_n \right) \dfrac{x_{n-1} - x_n}{f \left( x_{n-1} \right) - f \left( x_n \right)}. $$
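
Unlike Newton's method, no derivative is needed, but two starting points are required. A minimal Python sketch (the stopping rule and iteration cap are illustrative choices):

```python
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Secant method: like Newton, but with a finite-difference slope
    built from the two most recent iterates."""
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:                # flat secant line: cannot proceed
            break
        x2 = x1 - f1 * (x0 - x1) / (f0 - f1)
        if abs(x2 - x1) < tol:
            return x2
        x0, x1 = x1, x2
    return x1

# Example: root of x^3 - x - 2 near x = 1.5
print(secant(lambda x: x**3 - x - 2, 1.0, 2.0))  # ≈ 1.5213797068
```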
Theorem:

Let $f \in C^2\left( \left[a, b\right] \right)$, and $r \in \left(a,b\right)$ such that ${f\left(r\right) = 0}$ and ${f^{\prime}\left(r\right) \ne 0}$. Furthermore, let

$$ x_{n+1} = x_n - f \left( x_n \right) \dfrac{x_{n-1} - x_n}{f \left( x_{n-1} \right) - f \left( x_n \right)}. $$

Then there exists a $\delta > 0$ such that when ${\left\| r - x_0 \right\| < \delta}$ and ${\left\| r - x_1 \right\| < \delta}$, then the following holds:

  1. $\lim\limits_{n\rightarrow\infty} \left\| r - x_n \right\| = 0 \Leftrightarrow \lim\limits_{n\rightarrow\infty} x_n = r$, and
  2. $\left\| r - x_{n+1} \right\| \le \mu \left\| r - x_{n} \right\|^{\alpha}$ for some ${\mu > 0}$, with ${\alpha = \dfrac{1+\sqrt{5}}{2}} \approx 1.618$ (the golden ratio).
Example:

An .ipynb notebook with an example of iterative solvers for nonlinear systems can be accessed online [here].

It can be downloaded as a Python file from [here] or as a notebook from [here].

Example:

An .ipynb notebook with an example using the secant method to find a local maximum can be accessed online [here].

It can be downloaded as a Python file from [here] or as a notebook from [here].

Convergence

Definition:

If a sequence $x_n$ converges to $r$ as $n \rightarrow \infty$, then it is said to converge linearly if there exists a $\mu \in \left( 0,1\right)$ such that

$$ \lim\limits_{n\rightarrow\infty} \dfrac{\left\| x_{n+1} - r \right\|}{\left\| x_{n} - r \right\|} = \mu. $$

The sequence converges super-linearly if

$$ \lim\limits_{n\rightarrow\infty} \dfrac{\left\| x_{n+1} - r \right\|}{\left\| x_{n} - r \right\|} = 0 $$

and sub-linearly if

$$ \lim\limits_{n\rightarrow\infty} \dfrac{\left\| x_{n+1} - r \right\|}{\left\| x_{n} - r \right\|} = 1. $$

More generally, a sequence converges with order ${q > 1}$ if there exists a ${\mu \gt 0}$ such that

$$ \lim\limits_{n\rightarrow\infty} \dfrac{\left\| x_{n+1} - r \right\|}{\left\| x_{n} - r \right\|^q } = \mu. $$

Thus a sequence is said to converge quadratically when ${q=2}$ and to exhibit cubic convergence when ${q=3}$.
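
The order can be estimated numerically from three successive errors: taking logarithms in $\left\| x_{n+1} - r \right\| \approx \mu \left\| x_{n} - r \right\|^q$ for two consecutive steps and dividing eliminates $\mu$, giving $q \approx \dfrac{\log\left( e_{n+1}/e_n \right)}{\log\left( e_n/e_{n-1} \right)}$ with $e_n = \left\| x_n - r \right\|$. A sketch (the helper name `estimate_order` is illustrative):

```python
import math

def estimate_order(errors):
    """Estimate the convergence order q from successive errors e_n,
    using q ≈ log(e_{n+1}/e_n) / log(e_n/e_{n-1})."""
    return [
        math.log(errors[i + 1] / errors[i]) / math.log(errors[i] / errors[i - 1])
        for i in range(1, len(errors) - 1)
    ]

# Newton errors for x^2 - 2 starting at x0 = 1.5 (see the demo above)
errs = [8.6e-2, 2.5e-3, 2.1e-6, 1.6e-12]
print(estimate_order(errs))  # values approaching 2, i.e. quadratic
```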

| Method    | Regularity        | Proximity to $r$ required | Initial points | Function calls per step | Convergence |
|-----------|-------------------|---------------------------|----------------|-------------------------|-------------|
| Bisection | $\mathcal{C}^{0}$ | No                        | 2              | 1                       | Linear      |
| Newton    | $\mathcal{C}^{2}$ | Yes                       | 1              | 2                       | Quadratic   |
| Secant    | $\mathcal{C}^{2}$ | Yes                       | 2              | 1                       | Superlinear |

Systems of Nonlinear Equations

Definition: Multi-dimensional Newton Method

For a vector-valued function $f : \mathbb{R}^n \rightarrow \mathbb{R}^n$, which takes as an argument the vector

$$ x = \left( x_1, x_2, \ldots, x_n \right) \in \mathbb{R}^n $$

the Jacobian matrix is defined as

$$ \begin{equation*} J = \left( \begin{array}{cccc} \dfrac{\partial f_1 }{\partial x_1} & \dfrac{\partial f_1 }{\partial x_2} & \cdots & \dfrac{\partial f_1 }{\partial x_n} \\ \dfrac{\partial f_2 }{\partial x_1} & & & \vdots \\ \vdots & & \ddots & \vdots \\ \dfrac{\partial f_n }{\partial x_1} & \dfrac{\partial f_n }{\partial x_2} & \cdots & \dfrac{\partial f_n }{\partial x_n} \end{array} \right) \in \mathbb{R}^{n \times n}. \end{equation*} $$

If the derivatives are evaluated at the vector $x$, the Jacobian matrix is denoted $J\left(x\right)$. Newton’s method can then be written as

$$ x_{m+1} = x_m - J^{-1} \left( x_m \right) f\left( x_m \right) $$

where $J^{-1} \left( x_m \right)$ is the inverse of the Jacobian matrix evaluated at the vector $x_m$, which is the $m$th iterate of the scheme. In practice, as matrix inversion can be computationally expensive, the system

$$ J \left( x_m \right) \left( x_{m+1} - x_m \right) = -f \left( x_m \right) $$

is solved for the unknown update $x_{m+1} - x_m$, from which $x_{m+1}$ is recovered by adding $x_m$.
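
A hedged sketch of the scheme in Python with NumPy (the function name `newton_system`, the tolerance, and the 2×2 test system are illustrative choices):

```python
import numpy as np

def newton_system(f, jac, x0, tol=1e-12, max_iter=50):
    """Multi-dimensional Newton: solve J(x_m) dx = -f(x_m),
    then set x_{m+1} = x_m + dx."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        dx = np.linalg.solve(jac(x), -f(x))  # avoids forming J^{-1} explicitly
        x += dx
        if np.linalg.norm(dx) < tol:
            return x
    return x

# Example: intersect the circle x^2 + y^2 = 4 with the hyperbola xy = 1
f = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[0] * v[1] - 1.0])
jac = lambda v: np.array([[2.0 * v[0], 2.0 * v[1]],
                          [v[1],       v[0]]])
print(newton_system(f, jac, [2.0, 0.5]))  # one of the four intersection points
```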