Higher order derivatives
Mixed partials
Given a function $f \colon \R^d \to \R$ and $i \in \set{1, \dots, d}$, we may treat $\partial_i f \colon \R^d \to \R$ as a function in its own right. If $\partial_i f$ is itself differentiable, we can differentiate it again.
Definition 1. Let $k \in \N$, $f \colon \R^d \to \R$, $i_1, \dots, i_k \in \set{1, \dots, d}$. The derivative \begin{equation} \partial_{i_1} \partial_{i_2} \cdots \partial_{i_k} f \end{equation} is called a $k$-th order partial derivative of $f$.
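For example, with $d = 2$ and the (purely illustrative) function $f(x, y) = x^2 y$, \begin{equation*} \partial_2 \partial_1 f = \partial_2 (2xy) = 2x \qquad\text{and}\qquad \partial_1 \partial_2 f = \partial_1 (x^2) = 2x \end{equation*} are both second order partial derivatives of $f$.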
Definition 2. A function is said to be of class $C^k$ if all its $k$-th order partial derivatives exist and are continuous.
Theorem 3 (Clairaut). Let $U \subseteq \R^d$ be open and $f \in C^2(U)$. Then for all $i, j \in \set{1, \dots, d}$, $\partial_i \partial_j f = \partial_j \partial_i f$ on $U$.
If the mixed second order partials are only assumed to exist, but not required to be continuous, then they need not be equal!
Problem 4. Let $f(x, y) = x^3 y / (x^2 + y^2)$ for $(x, y) \neq (0, 0)$ and $f(0, 0) = 0$. Then $\partial_x \partial_y f(0, 0) = 1$ but $\partial_y \partial_x f(0, 0) = 0$. [Note: Be sure you compute $\partial_x \partial_y f(0, 0)$, and not $\lim_{x \to 0} \partial_x \partial_y f(x, 0)$, and similarly for $\partial_y \partial_x f(0,0)$.]
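One way to organize the computation (a sketch; carrying out the details is the point of the exercise): along the axes, \begin{equation*} \partial_y f(x, 0) = \lim_{k \to 0} \frac{x^3}{x^2 + k^2} = x \qquad\text{and}\qquad \partial_x f(0, y) = \lim_{h \to 0} \frac{h^2 y}{h^2 + y^2} = 0, \end{equation*} both valid for all $x, y$ (including $0$). Thus $\partial_x \partial_y f(0,0)$ is the derivative at $0$ of the function $x \mapsto x$, namely $1$, while $\partial_y \partial_x f(0,0)$ is the derivative at $0$ of the function $y \mapsto 0$, namely $0$.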
Proof sketch of Theorem 3. Here’s the idea in two dimensions (the same argument works in higher dimensions). For simplicity assume $a = 0$, and use the mean value theorem to show \begin{equation} \partial_1 \partial_2 f(0,0) = \lim_{h \to 0} \frac{f(h_1, h_2) - f(h_1, 0) - f(0, h_2) + f(0, 0)}{h_1 h_2} = \partial_2 \partial_1 f(0,0). \end{equation}
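One way to carry out the mean value theorem step (a sketch, for fixed $h_1, h_2 \neq 0$): set $\varphi(t) = f(t, h_2) - f(t, 0)$, so that the numerator above is $\varphi(h_1) - \varphi(0)$. Two applications of the one dimensional mean value theorem give \begin{equation*} \varphi(h_1) - \varphi(0) = h_1 \varphi'(\xi_1) = h_1 \bigl( \partial_1 f(\xi_1, h_2) - \partial_1 f(\xi_1, 0) \bigr) = h_1 h_2 \, \partial_2 \partial_1 f(\xi_1, \xi_2) \end{equation*} for some $\xi_1$ between $0$ and $h_1$ and some $\xi_2$ between $0$ and $h_2$. Dividing by $h_1 h_2$ and using continuity of $\partial_2 \partial_1 f$ shows the quotient converges to $\partial_2 \partial_1 f(0,0)$; grouping the terms by the other variable first shows it also converges to $\partial_1 \partial_2 f(0,0)$.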
Problem 5. Given $u\colon \R^3 \to \R^3$ define the divergence and curl of $u$ by \begin{equation} \dv u \defeq \sum_{i=1}^3 \partial_i u_i \qquad\text{and}\qquad \curl u \defeq \begin{pmatrix} \partial_2 u_3 - \partial_3 u_2\\ \partial_3 u_1 - \partial_1 u_3\\ \partial_1 u_2 - \partial_2 u_1 \end{pmatrix}\,, \end{equation} respectively. (Note the gradient $\grad u$ is $(Du)^T$, and is notationally different from the divergence $\dv u$ because of the missing dot.)
- Let $f\colon \R^3 \to \R$ be a $C^2$ function. We can combine the divergence, gradient and curl to form a few second order operators. For instance, $\dv (\grad f) = \trace Hf$ is known as the Laplacian of $f$, denoted by $\lap f$. Which of the $9$ second order combinations of divergence, gradient and curl make sense? Of the combinations that make sense, exactly one must always be $0$. Which one?
- Let $u \colon \R^3 \to \R^3$ be $C^2$. Which of the $9$ second order combinations of divergence, gradient and curl make sense? Of the combinations that make sense, exactly one must always be $0$. Which one?
- Let $u\colon \R^3 \to \R^3$ be a $C^2$ function. Show that $\curl \curl u = - \lap u + \grad \dv u$. Here $\lap u$ is called the Laplacian of $u$, and is defined to be the column vector $(\dv \grad u_1, \dv \grad u_2, \dv \grad u_3)^T$. (The first component of this identity is sketched right after this problem.)
Note: For a scalar function $f: \R^3 \to \R$, the Laplacian of $f$ (denoted by the same symbol $\lap f$) is defined to be $\dv \grad f$.
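As a sanity check for the last part (the remaining components follow by cycling the indices $1 \to 2 \to 3 \to 1$): adding and subtracting $\partial_1^2 u_1$ and using Clairaut's theorem, \begin{equation*} (\curl \curl u)_1 = \partial_2 (\partial_1 u_2 - \partial_2 u_1) - \partial_3 (\partial_3 u_1 - \partial_1 u_3) = \partial_1 (\partial_1 u_1 + \partial_2 u_2 + \partial_3 u_3) - (\partial_1^2 + \partial_2^2 + \partial_3^2) u_1, \end{equation*} which is exactly the first component of $\grad \dv u - \lap u$.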
Taylor’s theorem
Theorem 6 (Mean value theorem). If $f \colon \R^d \to \R$ is differentiable on the entire line segment joining $a$ and $b$, then \begin{equation*} f(b) = f(a) + (b - a) \cdot \grad f(\xi) \end{equation*} for some point $\xi$ on the line segment joining $a$ and $b$.
Proof sketch. Let $g(t) = f(a + t(b-a))$ and use the Lagrange Mean Value Theorem.
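Concretely, by the chain rule \begin{equation*} g'(t) = \grad f(a + t(b - a)) \cdot (b - a), \end{equation*} so the one dimensional mean value theorem gives $g(1) - g(0) = g'(\theta)$ for some $\theta \in (0, 1)$, which is the claimed identity with $\xi = a + \theta (b - a)$.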
Definition 7. Let $\alpha = ( \alpha_1, \alpha_2, \dots, \alpha_d)$, with $\alpha_i \in \N \cup \set{0}$. If $h \in \R^d$ define \begin{equation} h^\alpha = h_1^{\alpha_1} h_2^{\alpha_2} \cdots h_d^{\alpha_d}, \quad \abs{\alpha} = \alpha_1 + \cdots + \alpha_d, \quad\text{and}\quad \alpha! = \alpha_1! \, \alpha_2!\, \cdots \alpha_d!. \end{equation} Given a $C^{\abs{\alpha}}$ function $f$, define \begin{equation} D^\alpha f = \partial_1^{\alpha_1} \partial_2^{\alpha_2} \cdots \partial_d^{\alpha_d} f, \end{equation} with the convention that $\partial_i^0 f = f$.
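For example, if $d = 3$ and $\alpha = (2, 0, 1)$, then \begin{equation*} h^\alpha = h_1^2 h_3, \qquad \abs{\alpha} = 3, \qquad \alpha! = 2! \, 0! \, 1! = 2, \qquad\text{and}\qquad D^\alpha f = \partial_1^2 \partial_3 f. \end{equation*}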
Theorem 8 (Taylor’s theorem). If $f$ is a $C^n$ function on $\R^d$ and $a \in \R^d$ we have \begin{equation} f(a + h) = \sum_{\abs{\alpha} \leq n} \frac{1}{\alpha!} D^\alpha f(a) h^\alpha + R_n(h), \end{equation} for some function $R_n$ such that \begin{equation} \lim_{h \to 0} \frac{R_n(h)}{\abs{h}^n} = 0. \end{equation}
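For instance, when $n = 2$, collecting the $\abs{\alpha} = 2$ terms (and using Clairaut's theorem) gives \begin{equation*} f(a + h) = f(a) + \sum_{i=1}^d \partial_i f(a)\, h_i + \frac{1}{2} \sum_{i, j = 1}^d \partial_i \partial_j f(a)\, h_i h_j + R_2(h), \qquad \lim_{h \to 0} \frac{R_2(h)}{\abs{h}^2} = 0. \end{equation*} This is the form used when studying local extrema below; the quadratic term equals $\frac{1}{2} (Hf_a h) \cdot h$ in terms of the Hessian $Hf$ defined in the next section.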
Proof sketch. Let $g(t) = f(a + th)$, and use the one dimensional Taylor’s theorem, and collect terms using Clairaut’s theorem.
Local extrema
Necessary criterion
Definition 9. Let $U \subseteq \R^d$ be open, $a \in U$, and $f \colon U \to \R$ be a function.
- We say $f$ attains a local maximum at $a$ if there exists $r > 0$ such that $f(a) \geq f(x)$ for all $x \in B(a, r)$.
- We say $f$ attains a local minimum at $a$ if there exists $r > 0$ such that $f(a) \leq f(x)$ for all $x \in B(a, r)$.
- We say $f$ has a local extremum at $a$ if $f$ has either a local maximum at $a$ or a local minimum at $a$.
Proposition 10. Let $U \subseteq \R^d$ be open, $f \colon U \to \R$ be a function, and $a \in U$. If $f$ attains a local extremum at $a$ and is differentiable at $a$, then $\grad f(a) = 0$.
Definition 11 (Critical point). Any point where $\grad f = 0$ is called a critical point of the function $f$.
Definition 12 (Hessian). If $f \in C^2$, the Hessian of $f$ is defined to be the matrix \begin{equation} Hf = \begin{pmatrix} \partial_1 \partial_1 f & \partial_2 \partial_1 f & \cdots & \partial_d \partial_1 f\\ \partial_1 \partial_2 f & \partial_2 \partial_2 f & \cdots & \partial_d \partial_2 f\\ \vdots & \vdots & & \vdots\\ \partial_1 \partial_d f & \partial_2 \partial_d f & \cdots & \partial_d \partial_d f \end{pmatrix} \end{equation}
Remark 13. Note that if $f \in C^2$, then $Hf$ is symmetric (by Clairaut's theorem).
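For example, for the (purely illustrative) function $f(x, y) = x^3 + x y^2$ on $\R^2$, \begin{equation*} Hf = \begin{pmatrix} 6x & 2y\\ 2y & 2x \end{pmatrix}, \end{equation*} which is indeed symmetric.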
Proposition 14. Let $f \colon U\to \R$ be $C^2$, $a \in U$.
- If $f$ attains a local maximum at $a$, then $Hf_a$ is negative semi-definite.
- If $f$ attains a local minimum at $a$, then $Hf_a$ is positive semi-definite.
As a quick reminder, here are a few basics about semi-definite matrices.
Definition 15. Let $A$ be a $d \times d$ symmetric matrix.
- If $(Av) \cdot v \leq 0$ for all $v \in \R^d$, then $A$ is called negative semi-definite.
- If $(Av) \cdot v < 0$ for all $v \in \R^d - \set{0}$, then $A$ is called negative definite.
- If $(Av) \cdot v \geq 0$ for all $v \in \R^d$, then $A$ is called positive semi-definite.
- If $(Av) \cdot v > 0$ for all $v \in \R^d - \set{0}$, then $A$ is called positive definite.
Proposition 16. A symmetric matrix is positive semi-definite if and only if all its eigenvalues are non-negative.
Proposition 17. Let $A$ be the symmetric $2 \times 2$ matrix $(\begin{smallmatrix} a & b\\ b & c\end{smallmatrix})$.
- $A$ is positive definite if and only if $a > 0$ and $ac - b^2 > 0$.
- $A$ is negative definite if and only if $a < 0$ and $ac - b^2 > 0$.
- $A$ is positive semi-definite if and only if $a, c \geq 0$ and $ac - b^2 \geq 0$.
- $A$ is negative semi-definite if and only if $a, c \leq 0$ and $ac - b^2 \geq 0$.
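For example, for \begin{equation*} A = \begin{pmatrix} 2 & 1\\ 1 & 3 \end{pmatrix} \end{equation*} we have $a = 2 > 0$ and $ac - b^2 = 5 > 0$, so $A$ is positive definite. This is consistent with Proposition 16, since the eigenvalues of $A$ are $(5 \pm \sqrt{5})/2$, both of which are positive.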
Problem 18. Suppose $f\colon \R^d \to \R$ is $C^2$. True or false: $f$ is convex if and only if the Hessian $Hf$ is positive semi-definite everywhere? Prove it, or find a counterexample.
Sufficient criterion
Theorem 19. Let $U \subseteq \R^d$ be open, $a \in U$, $f\in C^2(U)$.
- If $Df_a = 0$ and further $Hf_a$ is positive definite, then $f$ attains a local minimum at $a$.
- If $Df_a = 0$ and further $Hf_a$ is negative definite, then $f$ attains a local maximum at $a$.
To prove this, first recall the following fact from linear algebra.
Lemma 20. If $A$ is a $d \times d$ positive definite symmetric matrix, then for every $v \in \R^d$ we have $(Av) \cdot v \geq \lambda_0 \abs{v}^2$, where $\lambda_0 > 0$ is the smallest eigenvalue of $A$.
Proof sketch of Theorem 19. Use Taylor’s theorem and Lemma 20.
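In slightly more detail, for the first assertion (a sketch): since $Df_a = 0$, Taylor's theorem with $n = 2$ and Lemma 20 applied to $Hf_a$ give \begin{equation*} f(a + h) = f(a) + \frac{1}{2} (Hf_a h) \cdot h + R_2(h) \geq f(a) + \frac{\lambda_0}{2} \abs{h}^2 + R_2(h), \end{equation*} where $\lambda_0 > 0$ is the smallest eigenvalue of $Hf_a$. Since $R_2(h) / \abs{h}^2 \to 0$, we have $\abs{R_2(h)} \leq \frac{\lambda_0}{4} \abs{h}^2$ for all sufficiently small $h$, and hence $f(a + h) \geq f(a)$ for all such $h$. The second assertion follows by applying the first to $-f$.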
Saddles
Definition 21. We say $a$ is a local saddle of $f$ if there exist two linearly independent vectors $v_1$ and $v_2$ such that $f$ has a strict local minimum at $a$ in the direction $v_1$ (i.e. $t \mapsto f(a + t v_1)$ has a strict local minimum at $t = 0$) and a strict local maximum at $a$ in the direction $v_2$.
Proposition 22. If $f$ is $C^2$, $Df_a = 0$ and $Hf_a$ has at least one strictly positive and one strictly negative eigenvalue, then $a$ is a local saddle of $f$.
Example 23. The function $\abs{x}^2$ has a local minimum at $0$. The function $-\abs{x}^2$ has a local maximum at $0$. The function $x_1^2 - x_2^2$ has a saddle at $0$.
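For instance, for the last function $f(x) = x_1^2 - x_2^2$ on $\R^2$ we have $\grad f(0) = 0$ and \begin{equation*} Hf_0 = \begin{pmatrix} 2 & 0\\ 0 & -2 \end{pmatrix}, \end{equation*} whose eigenvalues are $2$ and $-2$, so Proposition 22 applies (with $v_1 = e_1$ and $v_2 = e_2$).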
Problem 24. Let $a, b, c \in \R$ be such that $ac - b^2 \neq 0$. Find all critical points of $ax^2 + 2bxy + cy^2$. Find conditions on $a$, $b$, $c$ under which each critical point is a local minimum, a local maximum, or a saddle.
Problem 25. Find the critical points of each of these functions. For each critical point, determine whether it is a local maximum, local minimum, saddle or neither.
- $\frac{x}{x^2 + y^2}$
- $[x^2 + (y+1)^2 ][ x^2 + (y - 1)^2]$
- $\sin x \cosh y$
- $x^2 - 2xy + y^2$
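As a hint for the last function (a sketch): since $x^2 - 2xy + y^2 = (x - y)^2$, every point on the line $x = y$ is a critical point, and at each such point \begin{equation*} Hf = \begin{pmatrix} 2 & -2\\ -2 & 2 \end{pmatrix} \end{equation*} is positive semi-definite but not positive definite (its eigenvalues are $4$ and $0$), so Theorem 19 does not apply. One has to argue directly from $(x - y)^2 \geq 0$ to classify these points.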