Calculus of a Function of Several Variables

Definition: Partial Derivatives
$$ \dfrac{\partial f}{\partial x} = \lim_{h \rightarrow 0} \dfrac{f\left(x+h,y\right) - f\left(x,y\right) }{h}, $$

$$ \dfrac{\partial f}{\partial y} = \lim_{h \rightarrow 0} \dfrac{f\left(x,y+h\right) - f\left(x,y\right) }{h}. $$

Note that the partial derivative is often denoted as $\dfrac{\partial f}{\partial x} = f_x$.
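
As a quick numerical check of this definition, the sketch below approximates both partial derivatives by forward differences; the function $f\left(x, y\right) = x^2 y + \sin y$ and the evaluation point are chosen purely for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def f(x, y):
    # illustrative function: f(x, y) = x^2 y + sin(y)
    return x**2 * y + np.sin(y)

def partial_x(f, x, y, h=1e-6):
    # forward-difference approximation of df/dx at (x, y)
    return (f(x + h, y) - f(x, y)) / h

def partial_y(f, x, y, h=1e-6):
    # forward-difference approximation of df/dy at (x, y)
    return (f(x, y + h) - f(x, y)) / h

x0, y0 = 1.0, 2.0
print(partial_x(f, x0, y0), 2 * x0 * y0)          # analytic f_x = 2xy
print(partial_y(f, x0, y0), x0**2 + np.cos(y0))   # analytic f_y = x^2 + cos(y)
```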

Definition: Gradient of a Function
$$ \nabla f = \left( \dfrac{\partial f}{\partial x_1}, \dfrac{\partial f}{\partial x_2}, \ldots, \dfrac{\partial f}{\partial x_n} \right). $$

As the gradient $\nabla f$ depends on the point at which it is evaluated, it is denoted by $\nabla f \left(x_1, x_2, \ldots, x_n \right)$.

The gradient is the multivariable analogue of the derivative; being a vector, it also carries a direction.
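
As a sketch of this definition, the helper below stacks forward-difference partial derivatives into a gradient vector; the function, the point, and the name `grad` are illustrative choices, assuming NumPy.

```python
import numpy as np

def grad(f, x, h=1e-6):
    # numerical gradient of f: R^n -> R, one forward difference per coordinate
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x)) / h
    return g

f = lambda x: x[0]**2 + 3.0 * x[0] * x[1]   # illustrative function
print(grad(f, [1.0, 2.0]))                  # analytic gradient at (1, 2): (8, 3)
```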

Definition: Critical Point of a Function

If $f\left(x, y\right)$ has a local minimum or maximum at $\left( x_0, y_0 \right)$, then $\nabla f\left( x_0, y_0 \right) = \vec{0}$.

Such points are called critical points.

Note that not all critical points are maxima or minima. Classifying a critical point requires higher-order derivatives.
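
As an illustration of the definition, the sketch below finds the critical point of $f\left(x, y\right) = x^2 - 2x + y^2 + 4y$ by solving $\nabla f = \vec{0}$ numerically; the example function is arbitrary, and SciPy's root finder is assumed to be available.

```python
from scipy.optimize import fsolve  # assumes SciPy is available

# f(x, y) = x^2 - 2x + y^2 + 4y has gradient (2x - 2, 2y + 4),
# which vanishes only at (1, -2).
def gradient(p):
    x, y = p
    return [2.0 * x - 2.0, 2.0 * y + 4.0]

critical_point = fsolve(gradient, x0=[0.0, 0.0])
print(critical_point)   # approximately [ 1. -2.]
```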

Definition: Second Derivatives
$$ \dfrac{\partial^2 f}{\partial x_i \partial x_j} = \lim_{h \rightarrow 0} \dfrac{f_{x_i} \left(x_1, \ldots, x_j + h, \ldots, x_n \right) - f_{x_i}\left( x_1, \ldots, x_n \right)}{h}. $$

The second-order partial derivatives are collected in the Hessian matrix

$$ H\left( \vec{x} \right) = \left( \begin{array}{cccc} f_{x_1 x_1} & f_{x_1 x_2} & \cdots & f_{x_1 x_n} \\ f_{x_2 x_1} & f_{x_2 x_2} & \cdots & f_{x_2 x_n} \\ \vdots & \vdots & \ddots & \vdots \\ f_{x_n x_1} & f_{x_n x_2} & \cdots & f_{x_n x_n} \end{array} \right). $$

If the second-order partial derivatives are continuous, then the Hessian matrix $H$ is symmetric, and since a symmetric matrix has real eigenvalues, all eigenvalues of $H$ are real-valued.
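
A small numerical sketch of these two facts: the Hessian is approximated with central differences, checked for symmetry, and its eigenvalues computed. The smooth example function and step size are arbitrary choices, assuming NumPy.

```python
import numpy as np

def hessian(f, x, h=1e-5):
    # central-difference approximation of the Hessian of f: R^n -> R
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h**2)
    return H

f = lambda x: x[0]**2 * x[1] + np.exp(x[0] * x[1])   # smooth example function
H = hessian(f, [0.5, -0.3])
print(np.allclose(H, H.T, atol=1e-4))   # True: the Hessian is symmetric
print(np.linalg.eigvalsh(H))            # eigenvalues of a symmetric matrix are real
```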

Near a critical point $\vec{a}$, where $\nabla f\left( \vec{a} \right) = \vec{0}$, the Taylor expansion of $f$ has no first-order term:

$$ f\left(\vec{x} + \vec{a} \right) = f\left( \vec{a} \right) + \dfrac{1}{2} \vec{x}^T H\left( \vec{a} \right) \vec{x} + \ldots $$

Since the Hessian is symmetric, it can be diagonalized by an orthogonal matrix $Q$ (so $Q^{-1} = Q^T$):

$$ H = Q D Q^{-1} = Q D Q^{T}, $$

which gives

$$ f\left(\vec{x} + \vec{a} \right) = f\left( \vec{a} \right) + \dfrac{1}{2} \vec{x}^T Q D Q^{T} \vec{x} + \ldots $$

Substituting $\vec{y} = Q^T \vec{x}$,

$$ \begin{align*} f\left(\vec{x} + \vec{a} \right) & = f\left( \vec{a} \right) + \dfrac{1}{2} \vec{y}^T D \vec{y} + \ldots \\ & = f\left( \vec{a} \right) + \dfrac{1}{2} \left( \lambda_1^{} y_1^2 + \cdots + \lambda_n^{} y_n^2 \right) + \ldots \end{align*} $$

where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $H\left( \vec{a} \right)$.
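
As an aside, the diagonalization step can be verified numerically: the sketch below decomposes an arbitrary symmetric stand-in for $H\left(\vec{a}\right)$ and checks that the quadratic form is unchanged in the rotated coordinates $\vec{y} = Q^T \vec{x}$ (NumPy assumed).

```python
import numpy as np

# an arbitrary symmetric matrix standing in for H(a)
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])

eigenvalues, Q = np.linalg.eigh(H)      # H = Q D Q^T with orthogonal Q
D = np.diag(eigenvalues)
print(np.allclose(H, Q @ D @ Q.T))      # True

x = np.array([0.7, -1.2])
y = Q.T @ x
print(x @ H @ x, y @ D @ y)             # the quadratic form agrees in either basis
```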

So:

  1. If all $\lambda_1, \lambda_2, \ldots, \lambda_n > 0$, then the critical point is a local minimum.
  2. If all $\lambda_1, \lambda_2, \ldots, \lambda_n < 0$, then the critical point is a local maximum.
  3. If some $\lambda_j < 0$ and some $\lambda_i > 0$, then the critical point is neither a local minimum nor a local maximum. If, in addition, none of the eigenvalues are equal to zero, the critical point is called a saddle point.
  4. If all $\lambda_1, \lambda_2, \ldots, \lambda_n \geq 0$ and at least one is zero, or all $\lambda_1, \lambda_2, \ldots, \lambda_n \leq 0$ and at least one is zero, then the test is inconclusive, as the classification of the point depends on higher-order derivatives.

These cases can be checked numerically, as in the sketch below.
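
A minimal sketch of this eigenvalue test, assuming NumPy; the function name `classify_critical_point`, the tolerance, and the example Hessians are all illustrative choices.

```python
import numpy as np

def classify_critical_point(H, tol=1e-10):
    # classify a critical point from the eigenvalues of its (symmetric) Hessian
    lam = np.linalg.eigvalsh(H)
    if np.all(lam > tol):
        return "local minimum"                                  # case 1
    if np.all(lam < -tol):
        return "local maximum"                                  # case 2
    if np.any(lam > tol) and np.any(lam < -tol):
        if np.all(np.abs(lam) > tol):
            return "saddle point"                               # case 3, no zero eigenvalue
        return "neither a local minimum nor a local maximum"    # case 3, zero eigenvalue present
    return "inconclusive: depends on higher-order derivatives"  # case 4

print(classify_critical_point(np.array([[2.0, 0.0], [0.0, 3.0]])))    # local minimum
print(classify_critical_point(np.array([[-2.0, 0.0], [0.0, -3.0]])))  # local maximum
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, -3.0]])))   # saddle point
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, 0.0]])))    # inconclusive
```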

Jacobians

If the function outputs a vector, i.e. $\vec{f} \, : \, \mathbb{R}^n \mapsto \mathbb{R}^{m}$, write it componentwise as $\vec{f}=\left( f_1, \ldots, f_m \right)$; the same procedures can then be performed on each component of the vector-valued function. Thus a gradient $\nabla f_i$ can be computed for each component, and critical points must satisfy the vector equations $\nabla f_i\left(\vec{x}^{*} \right) = \vec{0}$ for every component $i$.
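
As a sketch, the Jacobian of a vector-valued map can be assembled by stacking the numerical gradient of every component; the example map $\mathbb{R}^2 \mapsto \mathbb{R}^3$ below is arbitrary, assuming NumPy.

```python
import numpy as np

def jacobian(f, x, h=1e-6):
    # stack the forward-difference gradient of every component f_i into an m x n matrix
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x))
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (np.asarray(f(x + e)) - fx) / h
    return J

# illustrative map R^2 -> R^3
f = lambda x: np.array([x[0] * x[1], np.sin(x[0]), x[1]**2])
print(jacobian(f, [1.0, 2.0]))
# analytic Jacobian at (1, 2): [[2, 1], [cos(1), 0], [0, 4]]
```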

For example, consider the composition $F\left(x\right) = f\left( g(x), h(x) \right)$. By the chain rule,

$$ \nabla F = \nabla f\left( g(x), h(x) \right) \dfrac{\partial (g,h)}{\partial x}, $$

where

$$ \dfrac{\partial (g,h)}{\partial x} = \left( \begin{array}{c} \nabla g \\ \nabla h \end{array} \right) $$

is the Jacobi matrix. Each row of the Jacobi matrix is the gradient of one of the components, here $g$ or $h$.
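
The chain rule above can be checked numerically in the special case of scalar $x$: the sketch compares the gradient-times-Jacobi-matrix product with a direct finite difference of $F$. The functions $f$, $g$, $h$ and the evaluation point are chosen only for illustration (NumPy assumed).

```python
import numpy as np

# F(x) = f(g(x), h(x)) with scalar x; the chain rule gives
# F'(x) = grad f(g(x), h(x)) . (g'(x), h'(x))
g = lambda x: x**2
h = lambda x: np.sin(x)
f = lambda u, v: u * v + v**2          # f(u, v) with f_u = v, f_v = u + 2v

x0, eps = 0.7, 1e-6

grad_f = np.array([h(x0), g(x0) + 2.0 * h(x0)])   # gradient of f at (g(x0), h(x0))
jacobi = np.array([2.0 * x0, np.cos(x0)])         # Jacobi matrix (g'(x0), h'(x0))

chain_rule = grad_f @ jacobi
finite_diff = (f(g(x0 + eps), h(x0 + eps)) - f(g(x0), h(x0))) / eps
print(chain_rule, finite_diff)                    # the two values agree closely
```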

Definition: Directional Derivative

$$ \nabla_{\boldsymbol{w}} f \left(\boldsymbol{x}\right) = \lim_{h \rightarrow 0} \dfrac{f(\boldsymbol{x} + h\boldsymbol{w}) - f(\boldsymbol{x})}{h}. $$

For a differentiable $f$, the directional derivative is the dot product of the gradient with the direction:

$$ \nabla_{\boldsymbol{w}} f \left(\boldsymbol{x}\right) = \nabla f \left(\boldsymbol{x}\right) \cdot \boldsymbol{w}. $$

Since, by convention, $\nabla f \left(\boldsymbol{x}\right)$ is a column vector, this can also be written as $\nabla f\left(\boldsymbol{x}\right)^{T} \boldsymbol{w}$.

The directional derivative can then be expressed as a matrix-vector product: for a vector-valued function $\vec{f}$, it is a Jacobian-vector product $J\left(\boldsymbol{x}\right) \boldsymbol{w}$.
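
A minimal sketch of a Jacobian-vector product for a map $\mathbb{R}^2 \mapsto \mathbb{R}^2$: it is computed once with an explicit hand-coded Jacobian and once as a single finite difference along $\boldsymbol{w}$; the map, point, and direction are arbitrary, assuming NumPy.

```python
import numpy as np

f = lambda x: np.array([x[0]**2 + x[1], np.sin(x[0] * x[1])])  # example map R^2 -> R^2

def jacobian(x):
    # analytic Jacobian of the example map above
    return np.array([[2.0 * x[0], 1.0],
                     [x[1] * np.cos(x[0] * x[1]), x[0] * np.cos(x[0] * x[1])]])

x = np.array([0.5, 1.5])
w = np.array([1.0, -2.0])
h = 1e-6

print(jacobian(x) @ w)             # Jacobian-vector product J(x) w
print((f(x + h * w) - f(x)) / h)   # the same product via one finite difference
```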

By the Cauchy-Schwarz inequality, the directional derivative is largest when $\nabla f\left(\boldsymbol{x}\right)$ and $\boldsymbol{w}$ point in the same direction; the gradient is therefore the direction of steepest ascent.
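
A quick numerical illustration of this fact, assuming NumPy: no randomly drawn unit direction produces a larger directional derivative than the normalized gradient itself (the stand-in gradient vector is arbitrary).

```python
import numpy as np

rng = np.random.default_rng(0)

grad = np.array([3.0, -1.0, 2.0])       # stand-in for the gradient at some point
best = np.linalg.norm(grad)             # value attained at w = grad / |grad|

# directional derivatives grad . w for 1000 random unit directions w
directions = rng.normal(size=(1000, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
print(best, (directions @ grad).max())  # the random maximum never exceeds |grad|
```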