In optimization, quasi-Newton methods (also known as variable metric methods) are algorithms for finding local maxima and minima of functions. Quasi-Newton methods are based on Newton's method to find the stationary point of a function, where the gradient is 0. Newton's method assumes that the function can be locally approximated as a quadratic in the region around the optimum, and uses the first and second derivatives (gradient and Hessian) to find the stationary point. In quasi-Newton methods the Hessian matrix of second derivatives of the function to be minimized does not need to be computed. The Hessian is updated by analyzing successive gradient vectors instead.
Quasi-Newton methods are a generalization of the secant method
to find the root of the first derivative
for multidimensional problems. In multiple dimensions the secant equation is under-determined, and quasi-Newton methods
differ in how they constrain the solution, typically by adding a simple low-rank update to the current estimate of the Hessian.
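To make the relationship concrete, the classical Newton iteration and the quasi-Newton iteration that approximates it can be written side by side; the notation (iterate x_k, Hessian approximation B_k, step length \alpha_k) anticipates the description given below.

x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k) \qquad \text{(Newton's method, exact Hessian)}
x_{k+1} = x_k - \alpha_k\, B_k^{-1} \nabla f(x_k) \qquad \text{(quasi-Newton, approximate Hessian)}

In the quasi-Newton case only gradients are evaluated; B_k is built up from the gradient differences observed along the way.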
The first quasi-Newton algorithm was proposed by W. C. Davidon, a physicist working at Argonne National Laboratory. He developed the first quasi-Newton algorithm in 1959: the DFP updating formula, which was later popularized by Fletcher and Powell in 1963, but is rarely used today. The most common quasi-Newton algorithms are currently the SR1 formula (for symmetric rank one), the BHHH method, the widespread BFGS method (suggested independently by Broyden, Fletcher, Goldfarb, and Shanno in 1970), and its low-memory extension, L-BFGS. The Broyden's class is a linear combination of the DFP and BFGS methods.
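Written out, the Broyden class combines the two updates through a scalar parameter \phi_k (often restricted to [0, 1]); \phi_k = 0 recovers the BFGS update and \phi_k = 1 the DFP update:

B_{k+1} = (1 - \phi_k)\, B_{k+1}^{\mathrm{BFGS}} + \phi_k\, B_{k+1}^{\mathrm{DFP}}.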
The SR1 formula does not guarantee the update matrix to maintain positive-definiteness and can be used for indefinite problems. Broyden's method does not require the update matrix to be symmetric and is used to find the root of a general system of equations (rather than the gradient) by updating the Jacobian (rather than the Hessian).
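For reference, with s_k = x_{k+1} - x_k and y_k denoting the corresponding change in the gradient (for SR1) or in the residual function F (for Broyden's method), the two updates can be written as:

B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^T}{(y_k - B_k s_k)^T s_k} \qquad \text{(SR1)}
J_{k+1} = J_k + \frac{(y_k - J_k s_k)\, s_k^T}{s_k^T s_k}, \quad y_k = F(x_{k+1}) - F(x_k) \qquad \text{(Broyden's method)}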
Description of the method
As in Newton's method, one uses a second-order approximation to find the minimum of a function f(x). The Taylor series of f(x) around an iterate x_k is
f(x_k + \Delta x) \approx f(x_k) + \nabla f(x_k)^T \Delta x + \tfrac{1}{2} \Delta x^T B\, \Delta x,
where \nabla f is the gradient and B an approximation to the Hessian matrix.
The gradient of this approximation (with respect to \Delta x) is
\nabla f(x_k + \Delta x) \approx \nabla f(x_k) + B\, \Delta x,
and setting this gradient to zero provides the Newton step:
\Delta x = -B^{-1} \nabla f(x_k).
The Hessian approximation B is chosen to satisfy
\nabla f(x_k + \Delta x) = \nabla f(x_k) + B\, \Delta x,
which is called the secant equation (the Taylor series of the gradient itself). In more than one dimension B is under-determined. In one dimension, solving for B and applying the Newton step with the updated value is equivalent to the secant method. Various methods are used to find the solution to the secant equation that is symmetric (B^T = B) and closest to the current approximate value B_k according to some metric \min_B \|B - B_k\|. An approximate initial value B_0 = I is often sufficient to achieve rapid convergence. The unknown x_k is updated by applying the Newton step calculated using the current approximate Hessian matrix B_k:
\Delta x_k = -\alpha_k B_k^{-1} \nabla f(x_k), with \alpha_k chosen to satisfy the Wolfe conditions;
x_{k+1} = x_k + \Delta x_k;
the gradient is computed at the new point, \nabla f(x_{k+1}), and the difference
y_k = \nabla f(x_{k+1}) - \nabla f(x_k)
is used to update the approximate Hessian B_{k+1}, or directly its inverse H_{k+1} = B_{k+1}^{-1} using the Sherman–Morrison formula.
A key property of the BFGS and DFP updates is that if B_k is positive definite and \alpha_k is chosen to satisfy the Wolfe conditions, then B_{k+1} is also positive definite.
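For illustration, the BFGS update of B_k and the corresponding update of the inverse H_k = B_k^{-1}, obtained by applying the Sherman–Morrison formula to each rank-one term, read (with s_k = \Delta x_k and y_k as above):

B_{k+1} = B_k + \frac{y_k y_k^T}{y_k^T s_k} - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k}
H_{k+1} = \left(I - \frac{s_k y_k^T}{y_k^T s_k}\right) H_k \left(I - \frac{y_k s_k^T}{y_k^T s_k}\right) + \frac{s_k s_k^T}{y_k^T s_k}

Maintaining H_k directly avoids solving a linear system for the search direction at every iteration.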
Example algorithm using MATLAB
%***********************************************************************%
% Usage: [x,Iter,FunEval,EF] = Quasi_Newton (fun,x0,MaxIter,epsg,epsx)
% fun: name of the multidimensional scalar objective function
% (string). This function takes a vector argument of length
% n and returns a scalar.
% x0: starting point (row vector of length n).
% MaxIter: maximum number of iterations to find a solution.
% epsg: maximum acceptable Euclidean norm of the gradient of the
% objective function at the solution found.
% epsx: minimum relative change in the optimization variables x.
% x: solution found (row vector of length n).
% Iter: number of iterations needed to find the solution.
% FunEval: number of function evaluations needed.
% EF: exit flag,
% EF=1: successful optimization (gradient is small enough).
% EF=2: algorithm converged (relative change in x is small
% enough).
% EF=-1: maximum number of iterations exceeded.
% Quasi-Newton optimization algorithm using the BFGS update             %
%***********************************************************************%
function [x,Iter,FunEval,EF] = Quasi_Newton(fun, x0, MaxIter, epsg, epsx)
% Variable declaration
xi = zeros(MaxIter+1,size(x0,2));  % iterate history (one row per iterate)
xi(1,:) = x0;
Bi = eye(size(x0,2));              % initial Hessian approximation (identity)
% Quasi-Newton (BFGS) iteration
FunEval = 0;
EF = 0;
for i = 1:MaxIter
    % Calculate gradient around current point
    [GradOfU, Eval] = Grad(fun, xi(i,:));
    FunEval = FunEval + Eval;      % update function evaluation count
    % Calculate search direction
    di = -Bi\GradOfU;
    % Calculate alfa via exact line search
    [alfa, Eval] = LineSearchAlfa(fun, xi(i,:), di);
    FunEval = FunEval + Eval;      % update function evaluation count
    % Calculate next solution of x
    xi(i+1,:) = xi(i,:) + (alfa*di)';
    % Calculate gradient at the new point
    [Grad_Next, Eval] = Grad(fun, xi(i+1,:));
    FunEval = FunEval + Eval;      % update function evaluation count
    % Update the approximate Hessian using the BFGS formula
    Bi = BFGS(fun, GradOfU, Grad_Next, xi(i+1,:), xi(i,:), Bi);
    % Stop if the Euclidean norm of the gradient is small enough
    if norm(Grad_Next,2) < epsg
        EF = 1;
        break
    end
    % Stop if the relative change in the optimization variables is small enough
    E = xi(i+1,:) - xi(i,:);
    if norm(E,2) < epsx
        EF = 2;
        break
    end
end
% Report optimum solution
x = xi(i+1,:);
% Report exit flag if the maximum number of iterations was exceeded
if i == MaxIter && EF == 0
    EF = -1;
end
% Report number of iterations needed
Iter = i;
end
%***********************************************************************%
% Broyden, Fletcher, Goldfarb and Shanno (BFGS) formula
%***********************************************************************%
function B = BFGS(fun, GradOfU, Grad_Next, Xi_next, Xi, Bi)
% Calculate si term (step between successive iterates, row vector)
si = Xi_next - Xi;
% Calculate yi term (change in the gradient, column vector)
yi = Grad_Next - GradOfU;
%
% BFGS formula (Broyden, Fletcher, Goldfarb and Shanno)
%
B = Bi - ((Bi*si'*si*Bi)/(si*Bi*si')) + ((yi*yi')/(yi'*si'));
end
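The listing calls two helper routines, Grad and LineSearchAlfa, that are not reproduced here. The sketches below are not part of the original example; they are minimal placeholders consistent with how the helpers are called (a column-vector gradient and a scalar step length, each returned together with a function-evaluation count). The finite-difference step h and the backtracking parameters are illustrative assumptions, and the backtracking search stands in for the exact line search mentioned in the comments above.
%***********************************************************************%
% Helper sketches (assumed implementations, not from the original code) %
%***********************************************************************%
function [g, Eval] = Grad(fun, x)
% Forward-difference approximation of the gradient at the row vector x.
% Returns a column vector g and the number of function evaluations Eval.
n = size(x,2);
g = zeros(n,1);
h = 1e-6;                        % finite-difference step (assumed value)
f0 = feval(fun, x);
for k = 1:n
    xk = x;
    xk(k) = xk(k) + h;
    g(k) = (feval(fun, xk) - f0)/h;
end
Eval = n + 1;                    % one evaluation per coordinate plus f0
end

function [alfa, Eval] = LineSearchAlfa(fun, x, d)
% Simple backtracking line search along the column direction d from the
% row vector x; returns a step length alfa and the evaluation count Eval.
alfa = 1;
beta = 0.5;                      % step-reduction factor (assumed value)
f0 = feval(fun, x);
Eval = 1;
while feval(fun, x + (alfa*d)') > f0 && alfa > 1e-12
    alfa = beta*alfa;
    Eval = Eval + 1;
end
Eval = Eval + 1;                 % count the final (accepted) evaluation
end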
See also
Newton's method in optimization
Newton's method
Orthant-wise limited-memory quasi-Newton (OWL-QN)
Broyden's method
The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.