Hessian matrix - AbsoluteAstronomy.com

Mathematics

Mathematics is the study of quantity, space, structure, and change. Mathematicians seek out patterns and formulate new conjectures. Mathematicians resolve the truth or falsity of conjectures by mathematical proofs, which are arguments sufficient to convince other mathematicians of their validity...

, the Hessian matrix (or simply the Hessian) is the square matrix of second-order partial derivative

Partial derivative

In mathematics, a partial derivative of a function of several variables is its derivative with respect to one of those variables, with the others held constant...

s of a function

Function (mathematics)

In mathematics, a function associates one quantity, the argument of the function, also known as the input, with another quantity, the value of the function, also known as the output. A function assigns exactly one output to each input. The argument and the value may be real numbers, but they can...

; that is, it describes the local curvature of a function of many variables. The Hessian matrix was developed in the 19th century by the German

Germany

Germany , officially the Federal Republic of Germany , is a federal parliamentary republic in Europe. The country consists of 16 states while the capital and largest city is Berlin. Germany covers an area of 357,021 km2 and has a largely temperate seasonal climate...

mathematician Ludwig Otto Hesse and later named after him. Hesse himself had used the term "functional determinants".

Given the real

Real number

In mathematics, a real number is a value that represents a quantity along a continuum, such as -5 , 4/3 , 8.6 , √2 and π...

-valued function

if all second partial derivative

Partial derivative

In mathematics, a partial derivative of a function of several variables is its derivative with respect to one of those variables, with the others held constant...

s of f exist, then the Hessian matrix of f is the matrix

where x = (x₁, x₂, ..., x_n) and D_i is the differentiation operator with respect to the ith argument and the Hessian becomes

Because f is often clear from context,

is frequently shortened to simply

.

Some mathematicians define the Hessian as the determinant

Determinant

In linear algebra, the determinant is a value associated with a square matrix. It can be computed from the entries of the matrix by a specific arithmetic expression, while other ways to determine its value exist as well...

of the above matrix.

Hessian matrices are used in large-scale optimization

Optimization (mathematics)

In mathematics, computational science, or management science, mathematical optimization refers to the selection of a best element from some set of available alternatives....

problems within Newton

Newton's method in optimization

In mathematics, Newton's method is an iterative method for finding roots of equations. More generally, Newton's method is used to find critical points of differentiable functions, which are the zeros of the derivative function.-Method:...

-type methods because they are the coefficient of the quadratic term of a local Taylor expansion of a function. That is,

where J is the Jacobian matrix, which is a vector (the gradient

Gradient

In vector calculus, the gradient of a scalar field is a vector field that points in the direction of the greatest rate of increase of the scalar field, and whose magnitude is the greatest rate of change....

for scalar-valued functions). The full Hessian matrix can be difficult to compute in practice; in such situations, quasi-Newton

Quasi-Newton method

In optimization, quasi-Newton methods are algorithms for finding local maxima and minima of functions. Quasi-Newton methods are based on...

algorithms have been developed that use approximations to the Hessian. The best-known quasi-Newton algorithm is the BFGS algorithm.

Mixed derivatives and symmetry of the Hessian

The mixed derivatives of f are the entries off the main diagonal in the Hessian. Assuming that they are continuous, the order of differentiation does not matter (Clairaut's theorem). For example,

This can also be written as:

In a formal statement: if the second derivatives of f are all continuous

Continuous function

In mathematics, a continuous function is a function for which, intuitively, "small" changes in the input result in "small" changes in the output. Otherwise, a function is said to be "discontinuous". A continuous function with a continuous inverse function is called "bicontinuous".Continuity of...

in a neighborhood

Neighbourhood (mathematics)

In topology and related areas of mathematics, a neighbourhood is one of the basic concepts in a topological space. Intuitively speaking, a neighbourhood of a point is a set containing the point where you can move that point some amount without leaving the set.This concept is closely related to the...

D, then the Hessian of f is a symmetric matrix throughout D; see symmetry of second derivatives

Symmetry of second derivatives

In mathematics, the symmetry of second derivatives refers to the possibility of interchanging the order of taking partial derivatives of a functionfof n variables...

Critical points and discriminant

If the gradient

Gradient

of f (i.e., its derivative in the vector sense) is zero at some point x, then f has a critical point
Critical point (mathematics)
In calculus, a critical point of a function of a real variable is any value in the domain where either the function is not differentiable or its derivative is 0. The value of the function at a critical point is a critical value of the function...

(or stationary point
Stationary point
In mathematics, particularly in calculus, a stationary point is an input to a function where the derivative is zero : where the function "stops" increasing or decreasing ....

) at x. The determinant

Determinant

of the Hessian at x is then called the discriminant

Discriminant

In algebra, the discriminant of a polynomial is an expression which gives information about the nature of the polynomial's roots. For example, the discriminant of the quadratic polynomialax^2+bx+c\,is\Delta = \,b^2-4ac....

. If this determinant is zero then x is called a degenerate critical point of f, this is also called a non-Morse critical point of f. Otherwise it is non-degenerate, this is called a Morse critical point of f.

Second derivative test

The following test can be applied at a non-degenerate critical point x. If the Hessian is positive definite

Positive-definite matrix

In linear algebra, a positive-definite matrix is a matrix that in many ways is analogous to a positive real number. The notion is closely related to a positive-definite symmetric bilinear form ....

at x, then f attains a local minimum at x. If the Hessian is negative definite at x, then f attains a local maximum at x. If the Hessian has both positive and negative eigenvalues then x is a saddle point

Saddle point

In mathematics, a saddle point is a point in the domain of a function that is a stationary point but not a local extremum. The name derives from the fact that in two dimensions the surface resembles a saddle that curves up in one direction, and curves down in a different direction...

for f (this is true even if x is degenerate). Otherwise the test is inconclusive.

Note that for positive semidefinite and negative semidefinite Hessians the test is inconclusive (yet a conclusion can be made that f is locally convex

Convex function

In mathematics, a real-valued function f defined on an interval is called convex if the graph of the function lies below the line segment joining any two points of the graph. Equivalently, a function is convex if its epigraph is a convex set...

or concave

Concave function

In mathematics, a concave function is the negative of a convex function. A concave function is also synonymously called concave downwards, concave down, convex upwards, convex cap or upper convex.-Definition:...

respectively). However, more can be said from the point of view of Morse theory

Morse theory

In differential topology, the techniques of Morse theory give a very direct way of analyzing the topology of a manifold by studying differentiable functions on that manifold. According to the basic insights of Marston Morse, a differentiable function on a manifold will, in a typical case, reflect...

.

In view of what has just been said, the second derivative test

Second derivative test

In calculus, the second derivative test is a criterion often useful for determining whether a given stationary point of a function is a local maximum or a local minimum using the value of the second derivative at the point....

for functions of one and two variables is simple. In one variable, the Hessian contains just one second derivative; if it is positive then x is a local minimum, if it is negative then x is a local maximum; if it is zero then the test is inconclusive. In two variables, the determinant

Determinant

can be used, because the determinant is the product of the eigenvalues. If it is positive then the eigenvalues are both positive, or both negative. If it is negative then the two eigenvalues have different signs. If it is zero, then the second derivative test is inconclusive.

Bordered Hessian

A bordered Hessian is used for the second-derivative test in certain constrained optimization problems. Given the function as before:

but adding a constraint function such that:

the bordered Hessian appears as

If there are, say, m constraints then the zero in the north-west corner is an m × m block of zeroes, and there are m border rows at the top and m border columns at the left.

The above rules of positive definite and negative definite can not apply here since a bordered Hessian can not be definite: we have z'Hz = 0 if vector z has a non-zero as its first element, followed by zeroes.

The second derivative test consists here of sign restrictions of the determinants of a certain set of n - m submatrices of the bordered Hessian. Intuitively, think of the m constraints as reducing the problem to one with n - m free variables. (For example, the maximization of

subject to the constraint

can be reduced to the maximization of

without constraint.)

Vector-valued functions

If f is instead a function from

, i.e.

then the array of second partial derivatives is not a two-dimensional matrix of size

, but rather a tensor

Tensor

Tensors are geometric objects that describe linear relations between vectors, scalars, and other tensors. Elementary examples include the dot product, the cross product, and linear maps. Vectors and scalars themselves are also tensors. A tensor can be represented as a multi-dimensional array of...

of order 3. This can be thought of as a multi-dimensional array with dimensions

, which degenerates to the usual Hessian matrix for

Generalizations to Riemannian manifolds

Let

be a Riemannian manifold

Riemannian manifold

In Riemannian geometry and the differential geometry of surfaces, a Riemannian manifold or Riemannian space is a real differentiable manifold M in which each tangent space is equipped with an inner product g, a Riemannian metric, which varies smoothly from point to point...

and

its Levi-Civita connection

Levi-Civita connection

In Riemannian geometry, the Levi-Civita connection is a specific connection on the tangent bundle of a manifold. More specifically, it is the torsion-free metric connection, i.e., the torsion-free connection on the tangent bundle preserving a given Riemannian metric.The fundamental theorem of...

. Let

be a smooth function. We may define the Hessian tensor

,

where we have taken advantage of the first covariant derivative of a function being the same as ordinary derivative. Choosing local coordinates

we obtain the local expression for the Hessian as

where

are the Christoffel symbols

Christoffel symbols

In mathematics and physics, the Christoffel symbols, named for Elwin Bruno Christoffel , are numerical arrays of real numbers that describe, in coordinates, the effects of parallel transport in curved surfaces and, more generally, manifolds. As such, they are coordinate-space expressions for the...

of the connection. Other equivalent forms for the Hessian are given by

and

Mixed derivatives and symmetry of the Hessian

Critical points and discriminant

Second derivative test

Bordered Hessian

Vector-valued functions

Generalizations to Riemannian manifolds

See also