
Condition Number of the Hessian

The convergence rate of the steepest descent method is connected to the eigenvalues of the Hessian, in particular to the ratio of the smallest eigenvalue to the largest one. We discuss this in some detail in this section. A basic gradient descent method has the form

$\displaystyle \alpha \leftarrow \alpha - \delta \nabla E$     (19)

where $\delta$ is a step size whose magnitude is determined using a line search. In the vicinity of a minimum $\alpha = \alpha ^*$, since $\nabla E (\alpha ^*) = 0$, we have


$\displaystyle E(\alpha, U(\alpha)) = E(\alpha^*, U(\alpha^*)) + \frac{1}{2} \tilde \alpha^T{\cal H} \tilde \alpha + O ( \Vert \tilde \alpha \Vert^3),$     (20)

where $\tilde \alpha = \alpha-\alpha^*$. Differentiating (20) and neglecting the higher-order terms, we see that the gradient in the vicinity of the minimum can be expressed as
$\displaystyle (\nabla E)(\alpha ) = {\cal H} \tilde \alpha.$     (21)

Substituting the last equality into (19) and subtracting $\alpha ^*$ from both sides, we get
$\displaystyle \tilde \alpha \leftarrow ( I - \delta {\cal H} ) \tilde \alpha.$     (22)

This relation expresses the new error ($\tilde \alpha$ on the left) as a function of the old error ($\tilde \alpha$ on the right). The convergence rate depends on
$\displaystyle \Vert I - \delta {\cal H} \Vert.$     (23)
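As an illustrative numerical sketch (not part of the original derivation; the model Hessian and step size below are arbitrary choices), the contraction of the error predicted by (22)-(23) can be observed directly on a small quadratic:

```python
import numpy as np

# Sketch: gradient descent on a quadratic model E = 0.5 * a^T H a, whose
# error obeys alpha_tilde <- (I - delta*H) alpha_tilde.  H and delta are
# arbitrary illustrative choices, with delta below the stability limit 2/mu_max.
H = np.array([[2.0, 0.0],
              [0.0, 10.0]])           # model Hessian (symmetric positive definite)
delta = 0.15                          # step size, below the limit 2/10 = 0.2
iteration_matrix = np.eye(2) - delta * H

alpha_tilde = np.array([1.0, 1.0])    # initial error alpha - alpha*
for k in range(5):
    alpha_tilde = iteration_matrix @ alpha_tilde
    print(k, np.linalg.norm(alpha_tilde))

# Per-step contraction is bounded by the spectral norm ||I - delta*H||
print("||I - delta*H|| =", np.linalg.norm(iteration_matrix, 2))
```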

Now we want to relate the difficulty of solving an optimization problem by the steepest descent method to the condition number of the Hessian. The Hessian is a symmetric matrix, and it is also positive definite (if indeed we are at a minimum). Let its eigenvalues be $\mu _j$ with eigenvectors $v_j$, i.e.,

$\displaystyle {\cal H} v_j = \mu _j v_j$     (24)

and assume that $0<\mu _1 \leq \mu _2 \leq \dots \leq \mu _q $. The iteration matrix was shown to be $I - \delta {\cal H}$ and its eigenvalues are $1 - \delta \mu _j$. For convergence we need
$\displaystyle \max _j \vert 1 - \delta \mu _j \vert < 1$     (25)

which implies $\delta < \frac{2}{\mu _q}$. Taking $\delta = \frac{c}{\mu _q}$, with $0<c<2$, gives
$\displaystyle \max _j \vert 1 - \delta \mu _j \vert = \vert 1 - c \frac{\mu _1}{\mu _q} \vert.$     (26)

Thus, the convergence rate is governed by the ratio of the smallest to the largest eigenvalue of the Hessian. For symmetric positive definite matrices the condition number is the ratio of the largest to the smallest eigenvalue, $\kappa = \mu _q / \mu _1$, so the convergence factor approaches one, and convergence becomes slow, as the condition number grows.
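As a small illustrative computation (my own sketch; the eigenvalue pairs are arbitrary choices), one can tabulate the contraction factor $\max _j \vert 1 - \delta \mu _j \vert$ for $\delta = c/\mu _q$ and watch the iteration count grow with the condition number:

```python
import numpy as np

# Sketch: steepest-descent contraction factor versus the condition number
# kappa = mu_q / mu_1, using delta = c/mu_q.  Eigenvalue pairs are arbitrary.
c = 1.0                                   # step size scaling, 0 < c < 2
for mu_1, mu_q in [(1.0, 2.0), (1.0, 10.0), (1.0, 100.0)]:
    delta = c / mu_q
    # max_j |1 - delta*mu_j| is attained at one of the extreme eigenvalues
    factor = max(abs(1.0 - delta * mu_1), abs(1.0 - delta * mu_q))
    kappa = mu_q / mu_1
    # iterations needed to reduce the error norm by a factor of 1e10
    iters = int(np.ceil(10 * np.log(10) / -np.log(factor)))
    print(f"kappa = {kappa:7.1f}   factor = {factor:.3f}   iterations ~ {iters}")
```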

The structure of the minimum is essentially determined by ${\cal H}$, and its analysis in the context of the fluid dynamics equations will be demonstrated later. It plays a major role in the optimization problem and its solution process.

Several approaches exist for calculating gradients of $E$ subject to the constraints, and we discuss some of them next.

