BFGS method
In numerical optimization, the Broyden–Fletcher–Goldfarb–Shanno (BFGS) method is an iterative method for solving unconstrained nonlinear optimization problems.

The BFGS method approximates Newton's method, a class of hill-climbing optimization techniques that seeks a stationary point of a (preferably twice continuously differentiable) function. For such problems, a necessary condition for optimality is that the gradient be zero.
Newton's method and the BFGS method are not guaranteed to converge unless the function has a quadratic Taylor expansion near an optimum. These methods use the first and second derivatives. Nevertheless, BFGS has shown good performance even for non-smooth optimization problems.

In quasi-Newton methods, the Hessian matrix of second derivatives need not be evaluated directly. Instead, the Hessian matrix is approximated using rank-one updates specified by gradient evaluations (or approximate gradient evaluations). Quasi-Newton methods are a generalization of the secant method for finding the root of the first derivative in multidimensional problems (the one-dimensional case is recalled below). In multiple dimensions the secant equation does not specify a unique solution, and quasi-Newton methods differ in how they constrain the solution. The BFGS method is one of the most popular members of this class. Also in common use is L-BFGS, a limited-memory version of BFGS that is particularly suited to problems with very large numbers of variables (e.g., more than 1000). The L-BFGS-B variant handles simple box constraints.
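For reference, in one dimension the secant method applied to the first derivative replaces the second derivative in Newton's step by a finite-difference quotient,

    f''(x_k) ≈ (f'(x_k) - f'(x_{k-1})) / (x_k - x_{k-1}),

so that x_{k+1} = x_k - f'(x_k) (x_k - x_{k-1}) / (f'(x_k) - f'(x_{k-1})). Quasi-Newton methods replace this scalar difference quotient by the matrix approximation B_k described in the next section.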

Rationale

The search direction p_k at stage k is given by the solution of the analogue of the Newton equation

    B_k p_k = -∇f(x_k),

where B_k is an approximation to the Hessian matrix, which is updated iteratively at each stage, and ∇f(x_k) is the gradient of the function evaluated at x_k. A line search in the direction p_k is then used to find the next point x_{k+1}. Instead of requiring the full Hessian matrix at the point x_{k+1} to be computed as B_{k+1}, the approximate Hessian at stage k is updated by the addition of two matrices:

    B_{k+1} = B_k + U_k + V_k.

Both U_k and V_k are symmetric rank-one matrices, but they are constructed from different vectors. The symmetric rank-one assumption here means that each can be written in the form a u u^T for a scalar a and a vector u. So, equivalently, U_k and V_k together form a rank-two update matrix which is robust against the scale problem often suffered in gradient descent searching (e.g., in Broyden's method).

The quasi-Newton condition imposed on this update is

    B_{k+1} (x_{k+1} - x_k) = ∇f(x_{k+1}) - ∇f(x_k).
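Writing s_k = x_{k+1} - x_k and y_k = ∇f(x_{k+1}) - ∇f(x_k), the particular rank-one terms chosen by BFGS (they reappear in step 5 of the algorithm below) are

    U_k = (y_k y_k^T) / (y_k^T s_k),    V_k = -(B_k s_k s_k^T B_k) / (s_k^T B_k s_k).

Applying B_{k+1} = B_k + U_k + V_k to s_k gives B_k s_k + y_k - B_k s_k = y_k, so this choice indeed satisfies the quasi-Newton condition above.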

Algorithm

From an initial guess x_0 and an approximate Hessian matrix B_0, the following steps are repeated until x_k converges to the solution.
  1. Obtain a direction p_k by solving B_k p_k = -∇f(x_k).
  2. Perform a line search to find an acceptable stepsize α_k in the direction found in the first step, then update x_{k+1} = x_k + α_k p_k.
  3. Set s_k = α_k p_k.
  4. Set y_k = ∇f(x_{k+1}) - ∇f(x_k).
  5. Update B_{k+1} = B_k + (y_k y_k^T) / (y_k^T s_k) - (B_k s_k s_k^T B_k) / (s_k^T B_k s_k).

f(x) denotes the objective function to be minimized. Convergence can be checked by observing the norm of the gradient, ||∇f(x_k)||. Practically, B_0 can be initialized with B_0 = I, so that the first step is equivalent to a gradient descent step, but further steps are more and more refined by B_k, the approximation to the Hessian.
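As a concrete illustration of the loop above, here is a minimal NumPy sketch. The function name bfgs_minimize and the naive backtracking line search are assumptions introduced for illustration only; practical implementations use a line search satisfying the Wolfe conditions and work with the inverse Hessian approximation (see below).

    import numpy as np

    def bfgs_minimize(f, grad_f, x0, tol=1e-6, max_iter=100):
        # Minimal BFGS sketch: maintains the Hessian approximation B directly
        # and solves B p = -grad f at every step.
        x = np.asarray(x0, dtype=float)
        B = np.eye(x.size)                 # B_0 = I, so the first step is a gradient step
        g = grad_f(x)
        for _ in range(max_iter):
            if np.linalg.norm(g) < tol:    # convergence test on the gradient norm
                break
            p = np.linalg.solve(B, -g)     # step 1: B_k p_k = -grad f(x_k)
            alpha = 1.0                    # step 2: naive backtracking line search
            while alpha > 1e-10 and f(x + alpha * p) > f(x) + 1e-4 * alpha * g.dot(p):
                alpha *= 0.5
            s = alpha * p                  # step 3: s_k = alpha_k p_k
            x_new = x + s
            g_new = grad_f(x_new)
            y = g_new - g                  # step 4: y_k = grad f(x_{k+1}) - grad f(x_k)
            B = (B + np.outer(y, y) / y.dot(s)
                   - (B @ np.outer(s, s) @ B) / s.dot(B @ s))   # step 5: rank-two update
            x, g = x_new, g_new
        return x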

The first step of the algorithm is carried out using the inverse of the matrix B_k, which is usually obtained efficiently by applying the Sherman–Morrison formula to the fifth line of the algorithm, giving

    B_{k+1}^{-1} = (I - (s_k y_k^T)/(y_k^T s_k)) B_k^{-1} (I - (y_k s_k^T)/(y_k^T s_k)) + (s_k s_k^T)/(y_k^T s_k).
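In code, this lets each direction be computed as p_k = -H_k ∇f(x_k) with H_k ≈ B_k^{-1}, avoiding a linear solve. A minimal sketch of that update (the helper name update_inverse_hessian is an assumption made here for illustration):

    import numpy as np

    def update_inverse_hessian(H, s, y):
        # BFGS update of H ~ B^{-1}:
        # H_{k+1} = (I - s y^T / y^T s) H_k (I - y s^T / y^T s) + s s^T / (y^T s)
        rho = 1.0 / y.dot(s)
        I = np.eye(len(s))
        V = I - rho * np.outer(s, y)
        return V @ H @ V.T + rho * np.outer(s, s)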


In statistical estimation problems (such as maximum likelihood or Bayesian inference), credible intervals or confidence intervals for the solution can be obtained from the inverse of the final Hessian matrix.
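For example, SciPy's BFGS implementation exposes its final inverse-Hessian approximation, which is sometimes used as a rough covariance estimate for a negative log-likelihood objective. The toy data and model below are placeholders invented for illustration, and BFGS's internal approximation can differ from the exact inverse Hessian at the optimum, so careful interval estimates often recompute the Hessian exactly.

    import numpy as np
    from scipy.optimize import minimize

    # Toy maximum-likelihood problem: estimate the mean of Gaussian data
    # with known unit variance.
    data = np.array([1.2, 0.8, 1.1, 0.9, 1.0])
    def nll(theta):                       # negative log-likelihood (up to a constant)
        return 0.5 * np.sum((data - theta[0]) ** 2)

    res = minimize(nll, x0=[0.0], method='BFGS')
    print(res.x)                          # estimated mean
    print(res.hess_inv)                   # approximate covariance of the estimate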

Implementations

In the MATLAB Optimization Toolbox, the fminunc function uses BFGS with a cubic line search when the problem size is set to "medium scale."
The GNU Scientific Library (GSL) implements BFGS as gsl_multimin_fdfminimizer_vector_bfgs2.
In SciPy, the scipy.optimize.fmin_bfgs function implements BFGS.
It is also possible to run BFGS using any of the L-BFGS algorithms by setting the parameter L to a very large number.
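For instance, the SciPy routine mentioned above can be called on SciPy's built-in Rosenbrock test function as follows; depending on the SciPy version, the newer scipy.optimize.minimize interface with method='BFGS' may be preferred.

    from scipy.optimize import fmin_bfgs, rosen, rosen_der

    x0 = [-1.2, 1.0]                       # standard starting point for the Rosenbrock function
    x_min = fmin_bfgs(rosen, x0, fprime=rosen_der)
    print(x_min)                           # close to [1.0, 1.0], the global minimum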