Backfitting algorithm
Encyclopedia
In statistics
Statistics
Statistics is the study of the collection, organization, analysis, and interpretation of data. It deals with all aspects of this, including the planning of data collection in terms of the design of surveys and experiments....

, the backfitting algorithm is a simple iterative procedure used to fit a generalized additive model
Generalized additive model
In statistics, the generalized additive model is a statistical model developed by Trevor Hastie and Rob Tibshirani for blending properties of generalized linear models with additive models....

. It was introduced in 1985 by Leo Breiman and Jerome Friedman along with generalized additive models. In most cases, the backfitting algorithm is equivalent to the Gauss–Seidel method algorithm for solving a certain linear system of equations

Algorithm

Additive models are a class of non-parametric regression models of the form:


where each is a variable in our -dimensional predictor , and is our outcome variable. represents our inherent error, which is assumed to have mean zero. The represent unspecified smooth functions of a single . Given the flexibility in the , we typically do not have a unique solution: is left unidentifiable as one can add any constants to any of the and subtract this value from . It is common to rectify this by constraining
for all


leaving


necessarily.

The backfitting algorithm is then:

Initialize ,
Do until converge:
For each predictor j:
(a) (backfitting step)
(b) (mean centering of estimated function)

where is our smoothing operator. This is typically chosen to be a cubic spline smoother
Smoothing spline
The smoothing spline is a method of smoothing using a spline function.-Definition:Let ;x_1...

 but can be any other appropriate fitting operation, such as:
  • local polynomial regression
    Polynomial regression
    In statistics, polynomial regression is a form of linear regression in which the relationship between the independent variable x and the dependent variable y is modeled as an nth order polynomial...

  • kernel smoothing methods
  • more complex operators, such as surface smoothers for second and higher-order interactions


In theory, step (b) in the algorithm is not needed as the function estimates are constrained to sum to zero. However, due to numerical issues this might become a problem in practice.

Motivation

If we consider the problem of minimizing the expected squared error:


There exists a unique solution by the theory of projections given by:


for i = 1, 2, ..., p.

This gives the matrix interpretation:


where . In this context we can imagine a smoother matrix, , which approximates our and gives an estimate, , of


or in abbreviated form


An exact solution of this is infeasible to calculate for large np, so the iterative technique of backfitting is used. We take initial guesses and update each in turn to be the smoothed fit for the residuals of all the others:


Looking at the abbreviated form it is easy to see the backfitting algorithm as equivalent to the Gauss–Seidel method for linear smoothing operators S.

Explicit derivation for two dimensions

For the two dimensional case, we can formulate the backfitting algorithm explicitly. We have:


If we denote as the estimate of in the ith updating step, the backfitting steps are


By induction we get


and


If we assume our constant is zero and we set then we get



This converges if .

Issues

The choice of when to stop the algorithm is arbitrary and it is hard to know a priori how long reaching a specific conversion threshold will take. Also, the final model depends on the order in which the predictor variables are fit.

As well, the solution found by the backfitting procedure is non-unique. If is a vector such that from above, then if is a solution then so is is also a solution for any . A modification of the backfitting algorithm involving projections onto the eigenspace of S can remedy this problem.

Modified algorithm

We can modify the backfitting algorithm to make it easier to provide a unique solution. Let be the space spanned by all the eigenvectors of Si that correspond to eigenvalue 1. Then any b satisfying has and Now if we take to be a matrix that projects orthogonally onto , we get the following modified backfitting algorithm:

Initialize ,,
Do until converge:
Regress onto the space , setting
For each predictor j:
Apply backfitting update to using the smoothing operator , yielding new estimates for

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK