Calculus of a Function of Several Variables
Note that the partial derivative is often denoted as $\dfrac{\partial f}{\partial x} = f_x$.
As the gradient $\nabla f$ depends on the point at which it is evaluated, it is denoted by $\nabla f \left(x_1, x_2, \ldots, x_n \right)$.
The gradient is the analogue of the derivative for a function of several variables; being a vector, it also has a direction.
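As a concrete sketch (assuming JAX is available; the function $f(x, y) = x^2 y + \sin y$ and the evaluation points are made up purely for illustration), the gradient can be evaluated at specific points by automatic differentiation:

```python
import jax
import jax.numpy as jnp

def f(v):
    # f(x, y) = x^2 * y + sin(y), an arbitrary example function
    x, y = v
    return x**2 * y + jnp.sin(y)

grad_f = jax.grad(f)                          # grad f : R^2 -> R^2
print(grad_f(jnp.array([1.0, 2.0])))          # gradient at (1, 2): approx [4.0, 0.58]
print(grad_f(jnp.array([0.0, 1.0])))          # a different point gives a different vector
```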
If $f\left(x, y\right)$ has a local minimum or maximum at $\left( x_0, y_0 \right)$, then $\nabla f\left( x_0, y_0 \right) = \vec{0}$.
Such points are called critical points.
Note that not all critical points are maxima or minima. Classifying a critical point requires higher-order derivatives.
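For example, for $f\left(x, y\right) = x^2 - y^2$ the gradient is $\nabla f = \left( 2x, \, -2y \right)$, so the only critical point is $\left( 0, 0 \right)$; since $f$ increases along the $x$-axis and decreases along the $y$-axis, this point is neither a minimum nor a maximum.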
If the second-order partial derivatives are continuous, then the Hessian matrix $H$, with entries $H_{ij} = \dfrac{\partial^2 f}{\partial x_i \partial x_j}$, is symmetric, and a symmetric matrix has only real eigenvalues.
Expanding $f$ in a Taylor series about a critical point $\vec{a}$ (the linear term vanishes because $\nabla f\left(\vec{a}\right) = \vec{0}$):
$$ f\left(\vec{x} + \vec{a} \right) = f\left( \vec{a} \right) + \dfrac{1}{2} \vec{x}^T H\left( \vec{a} \right) \vec{x} + \ldots $$
Since $H$ is symmetric, it can be diagonalised by an orthogonal matrix $Q$,
$$ H = Q D Q^{-1}, \qquad Q^{-1} = Q^T, $$
where $D$ is the diagonal matrix of the eigenvalues $\lambda_1, \ldots, \lambda_n$ of $H$. Then
$$ f\left(\vec{x} + \vec{a} \right) = f\left( \vec{a} \right) + \dfrac{1}{2} \vec{x}^T Q D Q^{T} \vec{x} + \ldots $$
Substituting $\vec{y} = Q^T \vec{x}$ gives
$$ \begin{align*} f\left(\vec{x} + \vec{a} \right) & = f\left( \vec{a} \right) + \dfrac{1}{2} \vec{y}^T D \vec{y} + \ldots \\ & = f\left( \vec{a} \right) + \dfrac{1}{2} \left( \lambda_1 y_1^2 + \ldots + \lambda_n y_n^2 \right) + \ldots \end{align*} $$
So:
- If all eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n > 0$, then the critical point is a local minimum.
- If all eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n < 0$, then the critical point is a local maximum.
- If some $\lambda_j < 0$ and some $\lambda_i > 0$, then the critical point is neither a local minimum nor a local maximum. If, in addition, none of the eigenvalues are zero, the critical point is called a saddle point.
- If all $\lambda_1, \lambda_2, \ldots, \lambda_n \geq 0$ and at least one is zero, or all $\lambda_1, \lambda_2, \ldots, \lambda_n \leq 0$ and at least one is zero, then the test is inconclusive: the classification of the point depends on higher-order derivatives.
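The eigenvalue test can be carried out numerically. The following is a minimal sketch assuming JAX, with $f(x, y) = x^2 - y^2$ (the saddle example above) and its critical point at the origin chosen purely for illustration:

```python
import jax
import jax.numpy as jnp

def f(v):
    # f(x, y) = x^2 - y^2 has a single critical point at the origin
    x, y = v
    return x**2 - y**2

x0 = jnp.array([0.0, 0.0])                    # candidate critical point
assert jnp.allclose(jax.grad(f)(x0), 0.0)     # gradient vanishes here

H = jax.hessian(f)(x0)                        # symmetric Hessian matrix
eigvals = jnp.linalg.eigvalsh(H)              # real eigenvalues of a symmetric matrix
print(eigvals)                                # [-2.  2.]

if jnp.all(eigvals > 0):
    print("local minimum")
elif jnp.all(eigvals < 0):
    print("local maximum")
elif jnp.any(eigvals > 0) and jnp.any(eigvals < 0):
    # no eigenvalue is zero here, so this is a saddle point
    print("neither a local minimum nor a local maximum")
else:
    print("inconclusive: higher-order derivatives are needed")
```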
Jacobians
If the function outputs a vector, i.e. $\vec{f} \, : \, \mathbb{R}^n \rightarrow \mathbb{R}^{m}$, write its components as $\vec{f}=\left( f_1, \ldots, f_m \right)$; the same procedures can then be applied to each component of the vector-valued function. Thus a gradient $\nabla f_i$ can be computed for each component, and critical points must satisfy the vector equation $\vec{f}\left(\vec{x}^{*} \right) = \vec{0}$.
Consider, for example, the composition $F\left(x\right) = f\left( g(x), h(x) \right)$. By the chain rule,
$$ \nabla F = \nabla \left( f \left( g(x), h(x) \right) \right) = \nabla f \, \dfrac{\partial (g,h)}{\partial x}, $$
where
$$ \dfrac{\partial (g,h)}{\partial x} = \left( \begin{array}{c} \nabla g \\ \nabla h \end{array} \right) $$
is the Jacobi matrix. Each row of the Jacobi matrix is the gradient of $g$ or $h$.
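A minimal numerical check of this chain rule, assuming JAX; the functions $g$, $h$, $f$ and the evaluation point are arbitrary choices for illustration:

```python
import jax
import jax.numpy as jnp

def g(x):
    return jnp.sin(x[0]) * x[1]               # g : R^2 -> R

def h(x):
    return x[0]**2 + x[1]                     # h : R^2 -> R

def f(u):
    return u[0] * u[1] + u[0]**2              # f : R^2 -> R, applied to (g, h)

def F(x):
    return f(jnp.stack([g(x), h(x)]))         # F(x) = f(g(x), h(x))

x = jnp.array([0.3, -1.2])

# Jacobi matrix of the map x -> (g(x), h(x)); its rows are grad g and grad h
J = jax.jacobian(lambda x: jnp.stack([g(x), h(x)]))(x)

# Chain rule: grad F = grad f (evaluated at (g(x), h(x))) times the Jacobi matrix
grad_f = jax.grad(f)(jnp.stack([g(x), h(x)]))
print(grad_f @ J)                             # chain-rule result
print(jax.grad(F)(x))                         # direct result, agrees up to float error
```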
The directional derivative of $f$ at $\boldsymbol{x}$ along a direction $\boldsymbol{w}$ is defined as
$$ \nabla_{\boldsymbol{w}} f \left(\boldsymbol{x}\right) = \lim_{h \rightarrow 0} \dfrac{f(\boldsymbol{x} + h\boldsymbol{w}) - f(\boldsymbol{x})}{h} $$
and can be computed as
$$ \nabla_{\boldsymbol{w}} f \left(\boldsymbol{x}\right) = \nabla f \left(\boldsymbol{x}\right) \cdot \boldsymbol{w}. $$
Since, by convention, $\nabla f \left(\boldsymbol{x}\right)$ is a column vector, the directional derivative can then be expressed as a matrix-vector product, specifically a Jacobian-vector product.
By the Cauchy-Schwarz inequality, the directional derivative attains its largest value when $\nabla f$ and $\boldsymbol{w}$ point in the same direction.
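Both statements can be checked numerically. A minimal sketch assuming JAX, where `jax.jvp` computes the Jacobian-vector product directly; the function and direction below are arbitrary:

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sum(x**2) + x[0] * x[1]        # arbitrary smooth example

x = jnp.array([1.0, -2.0])
w = jnp.array([0.6, 0.8])                     # a unit-length direction

# Directional derivative as a Jacobian-vector product
_, dd_jvp = jax.jvp(f, (x,), (w,))

# The same value via the dot product with the gradient
g = jax.grad(f)(x)
print(dd_jvp, jnp.dot(g, w))                  # the two agree

# Cauchy-Schwarz: among unit directions, the directional derivative is largest
# in the direction of the gradient, where it equals the gradient's norm
w_star = g / jnp.linalg.norm(g)
print(jnp.dot(g, w_star), jnp.linalg.norm(g))
```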