Quasi-Newton method
In optimization, quasi-Newton methods (also known as variable metric methods) are algorithms for finding local maxima and minima of functions. Quasi-Newton methods are based on Newton's method to find the stationary point of a function, where the gradient is 0. Newton's method assumes that the function can be locally approximated as a quadratic in the region around the optimum, and uses the first and second derivatives (gradient and Hessian) to find the stationary point.
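For reference, the Newton iteration that quasi-Newton methods approximate can be written as

x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k),

where \nabla f is the gradient and \nabla^2 f the Hessian; quasi-Newton methods replace the exact Hessian \nabla^2 f(x_k) with an approximation B_k that is cheap to update.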

In quasi-Newton methods the Hessian matrix of second derivatives of the function to be minimized does not need to be computed. The Hessian is updated by analyzing successive gradient vectors instead.
Quasi-Newton methods are a generalization of the secant method to find the root of the first derivative for multidimensional problems. In more than one dimension the secant equation is under-determined, and quasi-Newton methods differ in how they constrain the solution, typically by adding a simple low-rank update to the current estimate of the Hessian.
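Concretely, writing \Delta x_k = x_{k+1} - x_k and y_k = \nabla f(x_{k+1}) - \nabla f(x_k), every quasi-Newton update must satisfy the secant equation

B_{k+1} \Delta x_k = y_k.

In n dimensions this gives only n equations for the n(n+1)/2 independent entries of a symmetric approximation B_{k+1}, which is why an additional condition (such as a minimal low-rank change from B_k) is needed to pin the update down.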

The first quasi-Newton algorithm was proposed by W.C. Davidon, a physicist working at Argonne National Laboratory. He developed the first quasi-Newton algorithm in 1959: the DFP updating formula, which was later popularized by Fletcher and Powell in 1963, but is rarely used today. The most common quasi-Newton algorithms are currently the SR1 formula (for symmetric rank one), the BHHH method, the widespread BFGS method (suggested independently by Broyden, Fletcher, Goldfarb, and Shanno in 1970), and its low-memory extension, L-BFGS. The Broyden class is a linear combination of the DFP and BFGS methods.

The SR1 formula does not guarantee the update matrix to maintain positive-definiteness and can be used for indefinite problems. Broyden's method does not require the update matrix to be symmetric and is used to find the root of a general system of equations (rather than the gradient) by updating the Jacobian (rather than the Hessian).
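For root finding of a system F(x) = 0, the standard ("good") Broyden update of the Jacobian approximation J_k takes the form

J_{k+1} = J_k + \frac{(\Delta F_k - J_k \Delta x_k)\,\Delta x_k^T}{\Delta x_k^T \Delta x_k}, \qquad \Delta F_k = F(x_{k+1}) - F(x_k),

which is the rank-one correction that satisfies the secant condition J_{k+1} \Delta x_k = \Delta F_k while changing J_k as little as possible in the Frobenius norm.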

Description of the method

As in Newton's method, one uses a second-order approximation to find the minimum of a function f(x). The Taylor series of f(x) around an iterate x_k is

f(x_k + \Delta x) \approx f(x_k) + \nabla f(x_k)^T \Delta x + \tfrac{1}{2} \Delta x^T B \,\Delta x,

where \nabla f is the gradient and B an approximation to the Hessian matrix.
The gradient of this approximation (with respect to \Delta x) is

\nabla f(x_k + \Delta x) \approx \nabla f(x_k) + B \,\Delta x,

and setting this gradient to zero provides the Newton step:

\Delta x = -B^{-1} \nabla f(x_k).

The Hessian approximation B is chosen to satisfy

\nabla f(x_k + \Delta x) = \nabla f(x_k) + B \,\Delta x,

which is called the secant equation (the Taylor series of the gradient itself). In more than one dimension B is under-determined. In one dimension, solving for B and applying the Newton step with the updated value is equivalent to the secant method.
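To see the one-dimensional equivalence, note that the secant equation then determines B uniquely as a scalar,

B_{k+1} = \frac{f'(x_{k+1}) - f'(x_k)}{x_{k+1} - x_k},

so the Newton step x_{k+2} = x_{k+1} - f'(x_{k+1})/B_{k+1} is exactly the secant method applied to the first derivative f'.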
Various methods are used to find the solution to the secant equation that is symmetric (B^T = B) and closest to the current approximate value B_k according to some metric, i.e. the minimizer of \|B - B_k\| subject to the secant equation. An approximate initial value B_0 = \beta I (a multiple of the identity) is often sufficient to achieve rapid convergence. The unknown x_k is updated by applying the Newton step calculated using the current approximate Hessian matrix B_k:
  • \Delta x_k = -\alpha_k B_k^{-1} \nabla f(x_k), with \alpha_k chosen to satisfy the Wolfe conditions;
  • x_{k+1} = x_k + \Delta x_k;
  • The gradient computed at the new point, \nabla f(x_{k+1}), and
    y_k = \nabla f(x_{k+1}) - \nabla f(x_k)
    is used to update the approximate Hessian B_{k+1}, or directly its inverse H_{k+1} = B_{k+1}^{-1} using the Sherman–Morrison formula (a common form of this inverse update is sketched after this list).
  • A key property of the BFGS and DFP updates is that if B_k is positive definite and \alpha_k is chosen to satisfy the Wolfe conditions, then B_{k+1} is also positive definite.
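As an example of updating the inverse directly, the BFGS update of H_k = B_k^{-1} (obtained by applying the Sherman–Morrison formula to the BFGS formula in the table below) can be written as

H_{k+1} = \left(I - \frac{\Delta x_k y_k^T}{y_k^T \Delta x_k}\right) H_k \left(I - \frac{y_k \Delta x_k^T}{y_k^T \Delta x_k}\right) + \frac{\Delta x_k \Delta x_k^T}{y_k^T \Delta x_k},

so each iteration only needs matrix–vector products and avoids solving a linear system.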


The most popular update formulas for the Hessian approximation are:

Method            Update B_{k+1} =
DFP               \left(I - \frac{y_k \Delta x_k^T}{y_k^T \Delta x_k}\right) B_k \left(I - \frac{\Delta x_k y_k^T}{y_k^T \Delta x_k}\right) + \frac{y_k y_k^T}{y_k^T \Delta x_k}
BFGS              B_k + \frac{y_k y_k^T}{y_k^T \Delta x_k} - \frac{B_k \Delta x_k \Delta x_k^T B_k}{\Delta x_k^T B_k \Delta x_k}
Broyden           B_k + \frac{(y_k - B_k \Delta x_k)\,\Delta x_k^T}{\Delta x_k^T \Delta x_k}
Broyden family    (1 - \varphi_k)\, B_{k+1}^{BFGS} + \varphi_k\, B_{k+1}^{DFP}, \quad \varphi \in [0,1]
SR1               B_k + \frac{(y_k - B_k \Delta x_k)(y_k - B_k \Delta x_k)^T}{(y_k - B_k \Delta x_k)^T \Delta x_k}
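As a quick sanity check, the short MATLAB sketch below (illustrative variable names, separate from the example in the next section) applies the SR1 and BFGS formulas to random data and verifies numerically that both satisfy the secant equation B_{k+1} \Delta x_k = y_k.

% Numerical check (illustrative sketch): SR1 and BFGS updates satisfy B*s = y
n  = 4;
Bk = eye(n);                 % current Hessian approximation
s  = randn(n,1);             % step Delta x_k (column vector)
y  = randn(n,1);             % gradient change y_k (column vector)

% SR1 update
r     = y - Bk*s;
B_sr1 = Bk + (r*r')/(r'*s);

% BFGS update
B_bfgs = Bk + (y*y')/(y'*s) - (Bk*(s*s')*Bk)/(s'*Bk*s);

disp(norm(B_sr1*s  - y))     % both should print values near zero
disp(norm(B_bfgs*s - y))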


Example algorithm using MATLAB



%***********************************************************************%
% Usage: [x,Iter,FunEval,EF] = Quasi_Newton (fun,x0,MaxIter,epsg,epsx)
% fun: name of the multidimensional scalar objective function
% (string). This function takes a vector argument of length
% n and returns a scalar.
% x0: starting point (row vector of length n).
% MaxIter: maximum number of iterations to find a solution.
% epsg: maximum acceptable Euclidean norm of the gradient of the
% objective function at the solution found.
% epsx: minimum relative change in the optimization variables x.
% x: solution found (row vector of length n).
% Iter: number of iterations needed to find the solution.
% FunEval: number of function evaluations needed.
% EF: exit flag,
% EF=1: successful optimization (gradient is small enough).
% EF=2: algorithm converged (relative change in x is small
% enough).
% EF=-1: maximum number of iterations exceeded.

% Quasi-Newton optimization algorithm using the BFGS update %

function [x,Iter,FunEval,EF] = Quasi_Newton (fun, x0, MaxIter, epsg, epsx)
% Variable declaration
xi = zeros(MaxIter+1,size(x0,2));
xi(1,:) = x0;
Bi = eye(size(x0,2));    % initial Hessian approximation: identity matrix

% BFGS quasi-Newton iteration
FunEval = 0;
EF = 0;

for i = 1:MaxIter

    % Calculate gradient at the current point
    [GradOfU,Eval] = Grad (fun, xi(i,:));
    FunEval = FunEval + Eval;    % update function evaluation count

    % Calculate search direction
    di = -Bi\GradOfU;

    % Calculate step length alfa via line search
    [alfa, Eval] = LineSearchAlfa(fun,xi(i,:),di);
    FunEval = FunEval + Eval;    % update function evaluation count

    % Calculate next iterate x
    xi(i+1,:) = xi(i,:) + (alfa*di)';

    % Calculate gradient at iterate i+1
    [Grad_Next, Eval] = Grad (fun, xi(i+1,:));
    FunEval = FunEval + Eval;    % update function evaluation count

    % Update the Hessian approximation using the BFGS formula
    Bi = BFGS(fun,GradOfU,Grad_Next,xi(i+1,:),xi(i,:), Bi);

    % Stop if the Euclidean norm of the gradient is small enough
    if norm(Grad_Next,2) < epsg
        EF = 1;
        break
    end

    % Stop if the relative change in the optimization variables is small enough
    E = xi(i+1,:) - xi(i,:);
    if norm(E,2) < epsx
        EF = 2;
        break
    end
end

% Report optimum solution
x = xi(i+1,:);

% Report exit flag -1 if the maximum number of iterations was reached
% without satisfying either convergence test
if i == MaxIter && EF == 0
    EF = -1;
end

% Report number of iterations performed
Iter = i;

end

%***********************************************************************%
% Broyden, Fletcher, Goldfarb and Shanno (BFGS) formula
%***********************************************************************%
function B = BFGS(fun,GradOfU,Grad_Next,Xi_next,Xi,Bi)
% fun is passed through for interface consistency but is not used here

    % si: step taken, as a row vector
    si = Xi_next - Xi;

    % yi: change in the gradient, as a column vector
    yi = Grad_Next - GradOfU;

    % BFGS formula (Broyden, Fletcher, Goldfarb and Shanno):
    % B = Bi - (Bi*si')*(si*Bi)/(si*Bi*si') + (yi*yi')/(yi'*si')
    B = Bi - ((Bi*si'*si*Bi)/(si*Bi*si')) + ((yi*yi')/(yi'*si'));

end
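The example calls two helper routines, Grad and LineSearchAlfa, that are not listed above. A minimal sketch of what they might look like is given below; these are illustrative stand-ins (a central-difference gradient and a simple backtracking search rather than an exact line search), not the original helpers.

%***********************************************************************%
% Illustrative stand-ins for the helper routines called above           %
%***********************************************************************%
function [g,Eval] = Grad(fun, x)
    % Central-difference approximation of the gradient (column vector)
    h = 1e-6;
    n = length(x);
    g = zeros(n,1);
    for j = 1:n
        e    = zeros(1,n);
        e(j) = h;
        g(j) = (feval(fun, x + e) - feval(fun, x - e)) / (2*h);
    end
    Eval = 2*n;    % number of objective evaluations used
end

function [alfa,Eval] = LineSearchAlfa(fun, x, d)
    % Simple backtracking line search with a sufficient-decrease test
    % (a stand-in for the line search assumed by the example)
    alfa = 1; beta = 0.5; c = 1e-4;
    f0 = feval(fun, x);
    [g0,gEval] = Grad(fun, x);
    Eval = 1 + gEval;
    while feval(fun, x + alfa*d') > f0 + c*alfa*(g0'*d) && alfa > 1e-12
        Eval = Eval + 1;
        alfa = beta*alfa;
    end
    Eval = Eval + 1;    % count the final tested evaluation
end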

See also
  • Newton's method in optimization
  • Newton's method
  • DFP updating formula
  • BFGS method
  • L-BFGS
  • OWL-QN
  • SR1 formula
  • Broyden's method