Number Representations
Errors
There are many different sources of errors which can affect numerical methods, such as
- Errors in data
- Round-off errors
- Truncation errors
Let $\tilde{a}$ be an approximation to $a$, then the absolute error is given by $\left| {\tilde{a}-a} \right|$.
If ${\left| a \ne 0\right|}$, the relative error is given by $\left| \dfrac{\tilde{a}-a}{a} \right|$ and the error bound is the magnitude of the admissible error.
For both addition and subtraction the bounds for the absolute error are added.
In division and multiplication the bounds for the relative errors are added.
If $y(x)$ is a smooth function, i.e. is differentiable, then $\left\| y^{\prime} \right\|$ can be interpreted as the linear sensitivity of $y(x)$ to uncertainties in $x$.
$$ \left\| \Delta y \right\| \le \sum\limits_{i=1}^{n}\left\| \dfrac{\partial y}{\partial x_i} \right\|\left\| \Delta x_i \right\| $$where $\left\| \Delta x_i \right\| = \tilde{x}_i - x_i$ for an approximation $\tilde{x}_i$, thus ${\left\| \Delta y_i \right\| = \tilde{y}_i - y_i = f \left( \tilde{x}_i \right) - \left( x_i \right)}$.
Number Representations
A number can be written in a nested form:
$$ \begin{align*} \left(x\right)\_b & = a_0 b^0 + a_1 b^1 + \ldots + a_n b^n \\\ & = a_0 + b\left( a_1 + b\left( a_2 + b\left( a_3 + \ldots + b a_n\right) \right. \ldots \right) \end{align*} $$with $a_i \in \mathbb{N}_0$ and $a_i < b$, i.e. ${a_i \in \lbrace 0, \ldots, b-1 \rbrace }$.
For a real number, ${x \in \mathbb{R}}$, write
$$ \begin{align*} x & = \sum \limits_{i=0}^{n} a_i b^i + \sum \limits_{i=1}^{\infty} \alpha_i b^{-i} \\\ & = a_n \ldots a_0 \cdot \alpha_1 \alpha_2 \ldots \end{align*} $$Euclid’s algorithm can convert number $x$ in base 10, i.e. $\left(x \right)_{10}$ into another base, $b$, i.e. $\left(x \right)_{b}$.
- Input $\left(x \right)_{10}$
- Determine the smallest integer $n$ such that ${x < b^{n+1}}$
- Let $y=x$. Then for $i=n, \ldots, 0$ $$ \begin{array}{rcl} a_i & = & y \text{ div } b^i \\\ y & = & y \text{ mod } b^i \end{array} $$ which at each steps provides an $a_i$ and updates $y$.
- Output as $\left(x \right)\_{b} = a_n a_{n-1} \cdots a_0$
- Input $\left(x \right)_{10}$
- Set $i=0$
- Let $y=x$. Then while $y>0$ $$ \begin{array}{rcl} a_i & = & y \text{ mod } b \\\ y & = & y \text{ div } b \\\ i & = & i+1 \end{array} $$ which at each steps provides an $a_i$ and updates $y$.
- Output as $\left(x \right)_{b} = a_n a_{n-1} \cdots a_0$
where $\sigma$ is the sign of the number, i.e. $\pm 1$, $f$ is the mantissa, $b$ is the base, $t$ is the shifted exponent (where $p$ is the shift required to determine the actual exponent.)
$$x= \pm 0 \cdot a_1 \ldots a_k \times b^n$$where the $a_i \in \lbrace 0, 1, \ldots b-1 \rbrace$ are called the digits, $k$ is the precision and $n$ is the exponent. The set $a_1, \ldots, a_k$ is called the mantissa. Impose that $a_1 \ne 0$, it makes the representation unique.
Alternative conventions may express $x= \pm a_1 \cdot a_2 \ldots a_{k-1} \times b^n$, with $a_1 \ne 0$. Normalization refers to specifying the digits before the decimal place.
then, at most $p$ and at least $q$ significant bits (i.e. significant figures written in base 2) are lost during subtraction.