Hessian matrix
In mathematics, the Hessian matrix (or simply the Hessian) is the square matrix of second-order partial derivatives of a function; that is, it describes the local curvature of a function of many variables. The Hessian matrix was developed in the 19th century by the German mathematician Ludwig Otto Hesse and later named after him. Hesse himself had used the term "functional determinants".
Given the real-valued function

\[ f(x_1, x_2, \dots, x_n), \]

if all second partial derivatives of f exist, then the Hessian matrix of f is the matrix

\[ H(f)_{ij}(\mathbf{x}) = D_i D_j f(\mathbf{x}), \]

where \( \mathbf{x} = (x_1, x_2, \dots, x_n) \) and \( D_i \) is the differentiation operator with respect to the i-th argument, and the Hessian becomes

\[ H(f) = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n} \\
\dfrac{\partial^2 f}{\partial x_2\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2\,\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_n\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}. \]

Because f is often clear from context, \( H(f)(\mathbf{x}) \) is frequently shortened to simply \( H(\mathbf{x}) \).
Some mathematicians define the Hessian as the determinant of the above matrix.
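As an illustrative aside (not part of the original article), the Hessian of a black-box function can be approximated entry by entry with central finite differences; the helper name `hessian` and the step size `h` are assumptions of this sketch:

```python
import numpy as np

def hessian(f, x, h=1e-5):
    """Approximate the Hessian of f: R^n -> R at x by central differences."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            # central second difference for d^2 f / (dx_i dx_j)
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

# Example: f(x, y) = x^2 y + y^3 has Hessian [[2y, 2x], [2x, 6y]].
f = lambda v: v[0]**2 * v[1] + v[1]**3
print(hessian(f, [1.0, 2.0]))  # approximately [[4, 2], [2, 12]]
```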
Hessian matrices are used in large-scale optimization problems within Newton-type methods because they are the coefficient of the quadratic term of a local Taylor expansion of a function. That is,

\[ y = f(\mathbf{x} + \Delta\mathbf{x}) \approx f(\mathbf{x}) + J(\mathbf{x})\,\Delta\mathbf{x} + \tfrac{1}{2}\,\Delta\mathbf{x}^{\mathsf T} H(\mathbf{x})\,\Delta\mathbf{x}, \]

where J is the Jacobian matrix, which is a vector (the gradient) for scalar-valued functions. The full Hessian matrix can be difficult to compute in practice; in such situations, quasi-Newton algorithms have been developed that use approximations to the Hessian. The best-known quasi-Newton algorithm is the BFGS algorithm.
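To make the connection concrete, here is a hedged sketch (the Rosenbrock test function and the bare, undamped Newton loop are illustrative choices, not from the original article) contrasting a Newton iteration, which solves a linear system with the exact Hessian, with SciPy's gradient-only BFGS routine:

```python
import numpy as np
from scipy.optimize import minimize

def rosenbrock(v):
    x, y = v
    return (1 - x)**2 + 100 * (y - x**2)**2

def rosenbrock_grad(v):
    x, y = v
    return np.array([-2 * (1 - x) - 400 * x * (y - x**2),
                     200 * (y - x**2)])

def rosenbrock_hess(v):
    x, y = v
    return np.array([[2 - 400 * y + 1200 * x**2, -400 * x],
                     [-400 * x,                   200.0]])

# Newton-type step: solve H(x) dx = -grad f(x), then move to x + dx.
x = np.array([-1.2, 1.0])
for _ in range(20):
    x = x + np.linalg.solve(rosenbrock_hess(x), -rosenbrock_grad(x))
print(x)  # typically converges to the minimizer (1, 1) from this start

# BFGS builds its own Hessian approximation from successive gradients.
res = minimize(rosenbrock, [-1.2, 1.0], jac=rosenbrock_grad, method="BFGS")
print(res.x)
```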
Mixed derivatives and symmetry of the Hessian

The mixed derivatives of f are the entries off the main diagonal in the Hessian. Assuming that they are continuous, the order of differentiation does not matter (Clairaut's theorem). For example,

\[ \frac{\partial}{\partial x}\!\left(\frac{\partial f}{\partial y}\right) = \frac{\partial}{\partial y}\!\left(\frac{\partial f}{\partial x}\right). \]

This can also be written as:

\[ f_{yx} = f_{xy}. \]

In a formal statement: if the second derivatives of f are all continuous in a neighborhood D, then the Hessian of f is a symmetric matrix throughout D; see symmetry of second derivatives.
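As a quick symbolic check of this symmetry (an illustrative aside, not from the original article; the test function is arbitrary), using SymPy:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.exp(x * y) + x**3 * sp.sin(y)  # any smooth function will do

# Clairaut's theorem: the two orders of mixed differentiation agree.
fxy = sp.diff(f, x, y)
fyx = sp.diff(f, y, x)
print(sp.simplify(fxy - fyx))  # 0
```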
Critical points and discriminant

If the gradient of f (i.e., its derivative in the vector sense) is zero at some point x, then f has a critical point (or stationary point) at x. The determinant of the Hessian at x is then called the discriminant. If this determinant is zero, then x is called a degenerate critical point of f; this is also called a non-Morse critical point of f. Otherwise it is non-degenerate; this is called a Morse critical point of f.
Second derivative test

The following test can be applied at a non-degenerate critical point x. If the Hessian is positive definite at x, then f attains a local minimum at x. If the Hessian is negative definite at x, then f attains a local maximum at x. If the Hessian has both positive and negative eigenvalues, then x is a saddle point for f (this is true even if x is degenerate). Otherwise the test is inconclusive.

Note that for positive semidefinite and negative semidefinite Hessians the test is inconclusive (yet a conclusion can be made that f is locally convex or concave, respectively). However, more can be said from the point of view of Morse theory.
In view of what has just been said, the second derivative test for functions of one and two variables is simple. In one variable, the Hessian contains just one second derivative; if it is positive, then x is a local minimum, and if it is negative, then x is a local maximum; if it is zero, then the test is inconclusive. In two variables, the determinant can be used, because the determinant is the product of the eigenvalues. If it is positive, then the eigenvalues are both positive or both negative. If it is negative, then the two eigenvalues have different signs. If it is zero, then the second derivative test is inconclusive.
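To make the eigenvalue form of the test concrete, here is a small sketch (the helper `classify_critical_point` and its tolerance are assumptions of this example):

```python
import numpy as np

def classify_critical_point(H, tol=1e-10):
    """Classify a critical point from the symmetric Hessian H there."""
    eigvals = np.linalg.eigvalsh(H)
    if np.all(eigvals > tol):
        return "local minimum"   # positive definite Hessian
    if np.all(eigvals < -tol):
        return "local maximum"   # negative definite Hessian
    if np.any(eigvals > tol) and np.any(eigvals < -tol):
        return "saddle point"    # indefinite Hessian
    return "inconclusive"        # some eigenvalue is (numerically) zero

# f(x, y) = x^2 - y^2 has a saddle at the origin: Hessian diag(2, -2).
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, -2.0]])))
```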
Bordered Hessian

A bordered Hessian is used for the second-derivative test in certain constrained optimization problems. Given the function f considered previously, but adding a constraint function g such that

\[ g(\mathbf{x}) = c, \]

the bordered Hessian appears as

\[ H(f, g) = \begin{bmatrix}
0 & \dfrac{\partial g}{\partial x_1} & \cdots & \dfrac{\partial g}{\partial x_n} \\
\dfrac{\partial g}{\partial x_1} & \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial g}{\partial x_n} & \dfrac{\partial^2 f}{\partial x_n\,\partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}. \]

If there are, say, m constraints, then the zero in the north-west corner is an m × m block of zeroes, and there are m border rows at the top and m border columns at the left.

The above rules of positive definite and negative definite cannot apply here, since a bordered Hessian cannot be definite: we have \( \mathbf{z}^{\mathsf T} H \mathbf{z} = 0 \) if the vector z has a non-zero first element followed by zeroes.

The second derivative test here consists of sign restrictions on the determinants of a certain set of n − m submatrices of the bordered Hessian. Intuitively, think of the m constraints as reducing the problem to one with n − m free variables. (For example, the maximization of \( f(x_1, x_2, x_3) \) subject to the constraint \( x_1 + x_2 + x_3 = 1 \) can be reduced to the maximization of \( f(x_1, x_2, 1 - x_1 - x_2) \) without constraint.)
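As an illustrative sketch (the objective, constraint, and helper code are assumptions of this example, not from the original article), the bordered Hessian for a single constraint can be assembled symbolically:

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
f = x1 * x2 * x3          # objective, chosen arbitrarily for the demo
g = x1 + x2 + x3          # constraint function, with g = 1 say
xs = [x1, x2, x3]

grad_g = sp.Matrix([[sp.diff(g, v) for v in xs]])  # 1 x n border row
H_f = sp.hessian(f, xs)                            # n x n Hessian of f

# Zero block in the north-west corner, bordered by the gradient of g.
H_b = sp.Matrix(sp.BlockMatrix([[sp.zeros(1, 1), grad_g],
                                [grad_g.T,       H_f]]))
sp.pprint(H_b)
```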
Vector-valued functions

If f is instead a function from \( \mathbb{R}^n \) to \( \mathbb{R}^m \), i.e.

\[ f(x_1, x_2, \dots, x_n) = (f_1, f_2, \dots, f_m), \]

then the array of second partial derivatives is not a two-dimensional matrix of size \( n \times n \), but rather a tensor of order 3. This can be thought of as a multi-dimensional array with dimensions \( m \times n \times n \), which degenerates to the usual Hessian matrix for \( m = 1 \).
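As a small symbolic illustration (the map and helper code are assumptions of this example), the order-3 array can be built by stacking the Hessian of each component function:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.Matrix([x**2 + y, x * y])  # a map R^2 -> R^2, so m = n = 2
xs = [x, y]

# One n x n Hessian per component: an array of shape (m, n, n).
T = sp.Array([sp.hessian(comp, xs).tolist() for comp in f])
print(T.shape)  # (2, 2, 2)
```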
Generalizations to Riemannian manifolds

Let \( (M, g) \) be a Riemannian manifold and \( \nabla \) its Levi-Civita connection. Let \( f : M \to \mathbb{R} \) be a smooth function. We may define the Hessian tensor

\[ \operatorname{Hess}(f) \in \Gamma(T^*M \otimes T^*M) \quad \text{by} \quad \operatorname{Hess}(f) := \nabla \nabla f = \nabla df, \]

where we have taken advantage of the first covariant derivative of a function being the same as its ordinary derivative. Choosing local coordinates \( \{x^i\} \), we obtain the local expression for the Hessian as

\[ \operatorname{Hess}(f) = \nabla_i\, \partial_j f \; dx^i \otimes dx^j = \left( \frac{\partial^2 f}{\partial x^i\, \partial x^j} - \Gamma^k_{ij} \frac{\partial f}{\partial x^k} \right) dx^i \otimes dx^j, \]

where \( \Gamma^k_{ij} \) are the Christoffel symbols of the connection. Other equivalent forms for the Hessian are given by

\[ \operatorname{Hess}(f)(X, Y) = \langle \nabla_X \operatorname{grad} f, Y \rangle \quad \text{and} \quad \operatorname{Hess}(f)(X, Y) = X(Yf) - df(\nabla_X Y). \]
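As a worked illustration (an addition for concreteness, not part of the original article), take the Euclidean plane in polar coordinates \( (r, \theta) \), with flat metric \( g = dr^2 + r^2\,d\theta^2 \); its only nonzero Christoffel symbols are \( \Gamma^r_{\theta\theta} = -r \) and \( \Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r \). The local formula above then gives

\[ \operatorname{Hess}(f)_{rr} = \frac{\partial^2 f}{\partial r^2}, \qquad \operatorname{Hess}(f)_{r\theta} = \frac{\partial^2 f}{\partial r\,\partial \theta} - \frac{1}{r}\frac{\partial f}{\partial \theta}, \qquad \operatorname{Hess}(f)_{\theta\theta} = \frac{\partial^2 f}{\partial \theta^2} + r\,\frac{\partial f}{\partial r}, \]

so the correction terms are exactly what is needed for the components to transform as a tensor rather than as bare second partials.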
See also

- The determinant of the Hessian matrix is a covariant; see invariant of a binary form.
- Jacobian matrix
- The Hessian matrix is commonly used for expressing image processing operators in image processing and computer vision (see the Laplacian of Gaussian (LoG) blob detector, the determinant of Hessian (DoH) blob detector, and scale space).