Calculus of a Function of Several Variables

Definition: Partial Derivatives

For a function with multiple input arguments, $z = f\left(x,y\right)$, the partial derivative of $f$ with respect to $x$ can be expressed as

$$ \dfrac{\partial f}{\partial x} = \lim_{h \rightarrow 0} \dfrac{f\left(x+h,y\right) - f\left(x,y\right) }{h}; $$

similarly, the partial derivative with respect to $y$ can be expressed as

$$ \dfrac{\partial f}{\partial y} = \lim_{h \rightarrow 0} \dfrac{f\left(x,y+h\right) - f\left(x,y\right) }{h}. $$

Note that the partial derivative is often denoted as $\dfrac{\partial f}{\partial x} = f_x$.
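As a quick numerical illustration, the limit in the definition can be approximated with a small but finite $h$. The sketch below uses a hypothetical example function $f(x,y) = x^2 y$ (chosen only for illustration) and forward differences in Python:

```python
def f(x, y):
    # Hypothetical example function, f(x, y) = x^2 * y.
    return x**2 * y

def partial_x(f, x, y, h=1e-6):
    # Forward-difference approximation of df/dx from the limit definition.
    return (f(x + h, y) - f(x, y)) / h

def partial_y(f, x, y, h=1e-6):
    # Forward-difference approximation of df/dy.
    return (f(x, y + h) - f(x, y)) / h

print(partial_x(f, 2.0, 3.0))  # approx 12.0, the exact value of 2xy at (2, 3)
print(partial_y(f, 2.0, 3.0))  # approx 4.0, the exact value of x^2 at (2, 3)
```

Shrinking $h$ further improves the approximation until floating-point round-off takes over.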

Definition: Gradient of a Function

The gradient of a function $z=f\left(x_1, x_2, \ldots, x_n \right)$ is the vector of partial derivatives

$$ \nabla f = \left( \dfrac{\partial f}{\partial x_1}, \dfrac{\partial f}{\partial x_2}, \ldots, \dfrac{\partial f}{\partial x_n} \right). $$

As the gradient $\nabla f$ depends on the point at which it is evaluated, it is denoted by $\nabla f \left(x_1, x_2, \ldots, x_n \right)$.

The gradient is the multivariable analogue of the derivative but, being a vector, it also has a direction.
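As a sketch, reusing the forward-difference idea above with an illustrative function of two variables (not from the text), the gradient can be approximated by stacking the partial derivatives into a vector:

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    # Approximate the gradient of f: R^n -> R at the point x by
    # forward differences in each coordinate direction.
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (f(x + step) - f(x)) / h
    return grad

# Illustrative function f(x1, x2) = x1^2 * x2, written for a vector argument.
f = lambda x: x[0]**2 * x[1]
print(numerical_gradient(f, [2.0, 3.0]))  # approx [12.  4.]
```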

Definition: Critical Point of a Function

If $f\left(x, y\right)$ has a local minimum or maximum at $\left( x_0, y_0 \right)$, then $\nabla f\left( x_0, y_0 \right) = \vec{0}$.

Points at which $\nabla f = \vec{0}$ are called critical points.

Note that not all critical points are maxima or minima. The classification of critical points requires higher-order derivatives.
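As an illustration of locating critical points, the gradient can be set to zero and solved symbolically. This is a minimal sketch using SymPy and an illustrative function $f(x,y) = x^2 + y^2 - 2x$ (not from the text), which has a single critical point at $(1, 0)$:

```python
import sympy as sp

x, y = sp.symbols('x y')

# Illustrative function with a single critical point at (1, 0).
f = x**2 + y**2 - 2*x

# Solve grad f = 0 for the critical points.
grad_f = [sp.diff(f, x), sp.diff(f, y)]
critical_points = sp.solve(grad_f, (x, y), dict=True)
print(critical_points)  # [{x: 1, y: 0}]
```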

Definition: Second Derivatives

Each partial derivative can be differentiated again to yield a second-order partial derivative,

$$ f_{x_i x_j} = \dfrac{\partial^2 f}{\partial x_j \partial x_i} = \lim_{h \rightarrow 0} \dfrac{f_{x_i} \left(x_1, \ldots, x_j + h, \ldots, x_n \right) - f_{x_i}\left( x_1, \ldots, x_n \right)}{h}. $$

Thus, all second derivatives can be expressed using the Hessian matrix

$$ H\left( \vec{x} \right) = \left( \begin{array}{cccc} f_{x_1 x_1} & f_{x_1 x_2} & \cdots & f_{x_1 x_n} \\ f_{x_2 x_1} & f_{x_2 x_2} & \cdots & f_{x_2 x_n} \\ \vdots & \vdots & \ddots & \vdots \\ f_{x_n x_1} & f_{x_n x_2} & \cdots & f_{x_n x_n} \end{array} \right). $$

If the second-order partial derivatives are continuous, then the Hessian matrix $H$ is symmetric. Since $H$ is real and symmetric, all of its eigenvalues are real-valued.
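As a sketch, the Hessian of an illustrative function (chosen here only for demonstration) can be built symbolically with SymPy, and its symmetry and real eigenvalues checked directly:

```python
import sympy as sp

x, y = sp.symbols('x y')

# Illustrative function; its second-order partials are continuous everywhere.
f = x**2 + x*y + 2*y**2

H = sp.hessian(f, (x, y))     # matrix of second-order partial derivatives
print(H)                      # Matrix([[2, 1], [1, 4]])
print(H.is_symmetric())       # True, since f_xy = f_yx here
print(H.eigenvals())          # {3 - sqrt(2): 1, 3 + sqrt(2): 1}, both real
```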

The Taylor expansion of a function of more than one variable about a critical point $\vec{a}$, where the first-order term vanishes because $\nabla f\left( \vec{a} \right) = \vec{0}$, is given by

$$ f\left(\vec{x} + \vec{a} \right) = f\left( \vec{a} \right) + \dfrac{1}{2} \vec{x}^T H\left( \vec{a} \right) \vec{x} + \ldots $$
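As a numerical sketch of this expansion, consider the illustrative function $f(x,y) = \cos x + y^2$ (not from the text), which has a critical point at the origin with Hessian $\mathrm{diag}(-1, 2)$ there; for a small displacement the quadratic term alone already reproduces $f$ closely:

```python
import numpy as np

# Illustrative function with a critical point at the origin.
f = lambda p: np.cos(p[0]) + p[1]**2

a = np.array([0.0, 0.0])             # critical point, grad f(a) = 0
H_a = np.array([[-1.0, 0.0],         # Hessian of f evaluated at a
                [ 0.0, 2.0]])

x = np.array([1e-3, -2e-3])          # small displacement

quadratic = f(a) + 0.5 * x @ H_a @ x
print(f(a + x), quadratic)           # agree to roughly 1e-13
```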

Note that the Hessian matrix can be factorized as

$$ H = Q D Q^{-1} $$

where $D$ is the diagonal matrix $\mathrm{diag}\left( \lambda_1, \ldots, \lambda_n \right)$, $\lambda_i$ is the eigenvalue associated with the eigenvector $\boldsymbol{q_i}$ of the Hessian matrix, and $\boldsymbol{q_i}$ is the $i$th column of the matrix $Q$. Because $H$ is symmetric, its eigenvectors can be chosen to be orthonormal, so $Q$ is an orthogonal matrix and $Q^{-1} = Q^T$. Hence,

$$ f\left(\vec{x} + \vec{a} \right) = f\left( \vec{a} \right) + \dfrac{1}{2} \vec{x}^T Q D Q^{T} \vec{x} + \ldots $$

Letting $\vec{y}=Q^T \vec{x}$, this becomes

$$ \begin{align*} f\left(\vec{x} + \vec{a} \right) & = f\left( \vec{a} \right) + \dfrac{1}{2} \vec{y}^T D \vec{y} + \ldots \\ & = f\left( \vec{a} \right) + \dfrac{1}{2} \left( \lambda_1 y_1^2 + \ldots + \lambda_n y_n^2 \right) + \ldots \end{align*} $$

So:

  1. If all $\lambda_1, \lambda_2, \ldots, \lambda_n > 0$, then the critical point is a local minimum.
  2. If all $\lambda_1, \lambda_2, \ldots, \lambda_n < 0$, then the critical point is a local maximum.
  3. If some $\lambda_j < 0$ and some $\lambda_i > 0$, then the critical point is neither a local minimum nor a local maximum. If, in addition, none of the eigenvalues are equal to zero, then the critical point is called a saddle point.
  4. If all $\lambda_1, \lambda_2, \ldots, \lambda_n \geq 0$ and at least one is zero, or all $\lambda_1, \lambda_2, \ldots, \lambda_n \leq 0$ and at least one is zero, then the test is inconclusive, as the classification of the point depends on higher-order derivatives (a numerical check of these cases is sketched below).
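As referenced in item 4, here is a minimal numerical sketch (with an illustrative Hessian, not one from the text) that diagonalizes $H$ with `numpy.linalg.eigh`, verifies $H = Q D Q^T$, and classifies the critical point by the signs of the eigenvalues:

```python
import numpy as np

# Illustrative symmetric Hessian evaluated at a critical point.
H = np.array([[2.0, 1.0],
              [1.0, 4.0]])

eigenvalues, Q = np.linalg.eigh(H)   # eigh is intended for symmetric matrices
D = np.diag(eigenvalues)

print(np.allclose(Q @ D @ Q.T, H))      # True: H = Q D Q^T
print(np.allclose(Q.T @ Q, np.eye(2)))  # True: Q is orthogonal

tol = 1e-12                              # treat tiny eigenvalues as zero
if np.all(eigenvalues > tol):
    print("local minimum")               # case 1: all eigenvalues positive
elif np.all(eigenvalues < -tol):
    print("local maximum")               # case 2: all eigenvalues negative
elif np.all(np.abs(eigenvalues) > tol):
    print("saddle point")                # case 3: mixed signs, no zero eigenvalue
else:
    print("inconclusive")                # case 4 (or mixed signs with a zero eigenvalue)
```

For this particular $H$ both eigenvalues are positive, so the sketch reports a local minimum.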

Jacobians

If the function outputs a vector, i.e. $\vec{f} \, : \, \mathbb{R}^n \rightarrow \mathbb{R}^{m}$, write the components as $\vec{f}=\left( f_1, \ldots, f_m \right)$; the same procedures can then be applied to each component of the vector-valued function. Thus, a gradient $\nabla f_i$ can be computed for each component, and a point $\vec{x}^{*}$ that is critical for every component must satisfy $\nabla f_i\left(\vec{x}^{*} \right) = \vec{0}$ for each $i$.
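As a sketch, the per-component gradients can be approximated numerically and stacked row by row into a Jacobian; the illustrative function below maps $\mathbb{R}^2 \rightarrow \mathbb{R}^2$ and is not taken from the text:

```python
import numpy as np

def numerical_jacobian(f, x, h=1e-6):
    # Approximate the Jacobian of a vector-valued f: R^n -> R^m at x.
    # Row i is the forward-difference approximation of grad f_i.
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x), dtype=float)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        J[:, j] = (np.asarray(f(x + step), dtype=float) - fx) / h
    return J

# Illustrative components: f1 = x1^2 * x2, f2 = sin(x1) + x2.
f = lambda x: np.array([x[0]**2 * x[1], np.sin(x[0]) + x[1]])
print(numerical_jacobian(f, [1.0, 2.0]))
# approx [[4.0, 1.0], [0.5403, 1.0]]
```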

Alternatively, for $f(u,v)$, let $u=g\left(\vec{x}\right)$ and $v=h\left(\vec{x}\right)$; the function is then the composition $F\left(\vec{x}\right) = f\left( g\left(\vec{x}\right), h\left(\vec{x}\right) \right)$. Applying the chain rule to compute the gradient gives

$$ \nabla F = \nabla f \, \dfrac{\partial (g,h)}{\partial \vec{x}}, $$

where

$$ \dfrac{\partial (g,h)}{\partial \vec{x}} = \left( \begin{array}{c} \nabla g \\ \nabla h \end{array} \right) $$

is the Jacobi matrix. Each row of the Jacobi matrix is the gradient of one of the inner functions, $g$ or $h$.
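As a symbolic check of this chain rule, using SymPy and illustrative inner and outer functions that are not from the text, the gradient of the composition computed directly matches the product of $\nabla f$ with the Jacobi matrix of $(g, h)$:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
u, v = sp.symbols('u v')

# Illustrative outer function f(u, v) and inner functions g, h.
f = u*v + u**2
g = x1**2 + x2        # u = g(x1, x2)
h = sp.sin(x1)        # v = h(x1, x2)

# Jacobi matrix of (g, h): each row is the gradient of g or h.
J = sp.Matrix([[sp.diff(g, x1), sp.diff(g, x2)],
               [sp.diff(h, x1), sp.diff(h, x2)]])

# Chain rule: grad F = (grad f, evaluated at (g, h)) times the Jacobi matrix.
grad_f = sp.Matrix([[sp.diff(f, u), sp.diff(f, v)]])
grad_F_chain = grad_f.subs({u: g, v: h}) * J

# Direct differentiation of the composition F(x1, x2) for comparison.
F = f.subs({u: g, v: h})
grad_F_direct = sp.Matrix([[sp.diff(F, x1), sp.diff(F, x2)]])

print(sp.simplify(grad_F_chain - grad_F_direct))  # Matrix([[0, 0]])
```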

For a vector $\boldsymbol{w}$, the directional derivative of $f(\boldsymbol{x})$ in the direction of $\boldsymbol{w}$ is given by

$$ \nabla_{\boldsymbol{w}} f \left(\boldsymbol{x}\right) = \lim_{h \rightarrow 0} \dfrac{f(\boldsymbol{x} + h\boldsymbol{w}) - f(\boldsymbol{x})}{h} $$

If the function is differentiable, then

$$ \nabla_{\boldsymbol{w}} f \left(\boldsymbol{x}\right) = \nabla f \left(\boldsymbol{x}\right) \cdot \boldsymbol{w} $$

By convention, $\nabla f \left(\boldsymbol{x}\right)$ is a column vector, so the dot product above can also be written as $\nabla f \left(\boldsymbol{x}\right)^T \boldsymbol{w}$.

The directional derivative can then be expressed as a matrix-vector product, specifically a Jacobian-vector product, since the row vector $\nabla f \left(\boldsymbol{x}\right)^T$ is the Jacobi matrix of the scalar-valued function $f$.

By the Cauchy-Schwarz inequality, the directional derivative attains its largest value when $\boldsymbol{w}$ points in the same direction as $\nabla f$; the gradient therefore points in the direction of steepest ascent.
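A minimal numerical sketch of these last two points, using an illustrative gradient vector that is not from the text: the directional derivative is the dot product with a unit direction, and it is maximized when that direction is parallel to the gradient:

```python
import numpy as np

def directional_derivative(grad_f, w):
    # grad f(x) . w, with w normalized so only its direction matters.
    w = np.asarray(w, dtype=float)
    return np.dot(grad_f, w / np.linalg.norm(w))

# Illustrative gradient of some f at some point x.
grad_f = np.array([3.0, 4.0])

print(directional_derivative(grad_f, [1.0, 0.0]))  # 3.0
print(directional_derivative(grad_f, grad_f))      # 5.0 = ||grad f||, the maximum
print(directional_derivative(grad_f, -grad_f))     # -5.0, steepest descent
```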