Projection pursuit regression
In statistics, projection pursuit regression (PPR) is a statistical model developed by Jerome H. Friedman and Werner Stuetzle as an extension of additive models. It adapts the additive model in that it first projects the data matrix of explanatory variables in optimal directions before applying smoothing functions to these projections.
Model overview
The model consists of linear combinations of non-linear transformations of linear combinations of explanatory variables. The basic model takes the form

    Y = \sum_{j=1}^{r} f_j(\beta_j^{\mathsf T} x) + \varepsilon,

where \varepsilon is an error term,
x is a column vector containing a particular row of the design matrix X, which holds p explanatory variables (columns) and n observations (rows), and Y is the corresponding observed response to be predicted. Here {β_j} is a collection of r vectors (each a unit vector of length p) which contain the unknown parameters, and r is the number of smoothed non-parametric functions f_j to be used as constructed explanatory variables. The value of r is found through cross-validation or a forward stage-wise strategy which stops when the model fit cannot be significantly improved. For large values of r and an appropriate set of functions f_j, the PPR model is considered a universal approximator, as it can estimate any continuous function on R^p.
Thus this model takes the form of the basic additive model but with the additional β_j component: it fits f_j(\beta_j^{\mathsf T} x) rather than the actual inputs x. The vector \beta_j^{\mathsf T} X is the projection of X onto the unit vector β_j, where the directions β_j are chosen to optimize model fit. The functions f_j are unspecified by the model and are estimated using some flexible smoothing method, preferably one with well-defined second derivatives to simplify computation. This makes PPR very general, since it fits non-linear functions f_j of any class of linear combinations of X. Due to the flexibility and generality of this model, the fitted model is difficult to interpret, because each input variable enters the model in a complex and multifaceted way. Thus the model is far more useful for prediction than for understanding the data.
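The basic form above can be sketched directly: given direction vectors β_j and ridge functions f_j, a prediction is a sum of transformed projections. The directions and functions below are illustrative placeholders, not values from the article.

```python
import math

# Illustrative two-term PPR model on p = 3 inputs; the directions and ridge
# functions below are made-up placeholders, not values from the article.
betas = [(0.6, 0.8, 0.0), (0.0, 0.6, -0.8)]   # unit vectors beta_j of length p
ridge_fns = [math.tanh, lambda t: t * t]       # smooth ridge functions f_j

def ppr_predict(x):
    # Y-hat = sum_j f_j(beta_j^T x): project x onto each direction,
    # transform the scalar projection, and add the results.
    return sum(f(sum(b * xi for b, xi in zip(beta, x)))
               for beta, f in zip(betas, ridge_fns))
```

For x = (1.0, 2.0, 0.5) the two projections are 2.2 and 0.8, so the prediction is tanh(2.2) + 0.8².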
Model estimation
For a given set of data {(y_i, x_i)}_{i=1}^{n}, the goal is to minimize the error function

    S = \sum_{i=1}^{n} \left[ y_i - \sum_{j=1}^{r} f_j(\beta_j^{\mathsf T} x_i) \right]^2

over the functions f_j and vectors β_j. After estimating the smoothing functions f_j, one generally uses Gauss–Newton iteration to solve for the β_j, provided that the functions f_j are twice differentiable.
It has been shown that the convergence rate, the bias and the variance are all affected by the estimation of the β_j and the f_j. It has also been shown that the estimates of the β_j converge, while the estimates of the f_j converge at a slightly worse order.
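As a minimal sketch of the estimation idea, assuming synthetic data and substituting a brute-force search over directions in two dimensions for the Gauss–Newton step, with a cubic least-squares polynomial standing in for the smoother:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = f(beta^T x) with f(t) = t^2, p = 2 (illustrative only).
n = 200
X = rng.normal(size=(n, 2))
beta_true = np.array([0.6, 0.8])
y = (X @ beta_true) ** 2 + 0.01 * rng.normal(size=n)

def fit_one_term(X, y, n_angles=360, degree=3):
    """Single-term PPR sketch: for each candidate unit vector beta,
    smooth y against the projection t = X beta (here with a cubic
    least-squares polynomial) and keep the direction with the best fit."""
    best = None
    for ang in np.linspace(0.0, np.pi, n_angles, endpoint=False):
        beta = np.array([np.cos(ang), np.sin(ang)])
        t = X @ beta
        coef = np.polyfit(t, y, degree)          # stand-in for the smoother
        resid = y - np.polyval(coef, t)
        sse = float(resid @ resid)
        if best is None or sse < best[0]:
            best = (sse, beta, coef)
    return best

sse, beta_hat, coef = fit_one_term(X, y)
# beta_hat recovers the true direction up to sign (f here is symmetric).
```

A real implementation would alternate smoothing and a Gauss–Newton update of β, and add further terms to the residuals stage-wise.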
Advantages of PPR estimation
- It uses univariate regression functions instead of their multivariate form, thus effectively dealing with the curse of dimensionality.
- Univariate regression allows for simple and efficient estimation
- Relative to generalized additive models, PPR can estimate a much richer class of functions
- Unlike local averaging methods (such as k-nearest neighbors), PPR can ignore variables with low explanatory power.
Disadvantages of PPR estimation
- PPR requires examining an M-dimensional parameter space in order to estimate the direction vectors β_j.
- One must select the smoothing parameter for the f_j.
- The model is often difficult to interpret
Extensions of PPR
- Alternate smoothers, such as the radial function, harmonic function and additive function, have been suggested and their performances vary depending on the data sets used.
- Alternate optimization criteria have been used as well, such as standard absolute deviations and mean absolute deviations.
- Ordinary least squares can be used to simplify calculations, as the data often does not have strong non-linearities.
- Sliced inverse regression (SIR) has been used to choose the direction vectors for PPR.
- Generalized PPR combines regular PPR with iteratively reweighted least squares (IRLS) and a link function to estimate binary data.
PPR vs neural networks (NN)
Both projection pursuit regression and neural network models project the input vector onto a one-dimensional hyperplane and then apply a nonlinear transformation to the resulting projections, which are then added in a linear fashion. Thus both follow the same steps to overcome the curse of dimensionality. The main difference is that the functions f_j being fitted in PPR can be different for each combination of input variables and are estimated one at a time and then updated with the weights, whereas in NN these are all specified upfront and estimated simultaneously.
Thus, PPR estimation is more straightforward than NN estimation, and the transformations of the variables in PPR are data driven, whereas in NN these transformations are fixed.
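The structural contrast can be seen in a small sketch: a one-hidden-layer NN term a_j σ(w_jᵀx) applies the same fixed activation σ to every unit, whereas a PPR term f_j(β_jᵀx) has its own estimated f_j. All weights below are illustrative, not fitted values.

```python
import math

def sigmoid(t):
    # NN: every hidden unit applies this same, fixed activation.
    return 1.0 / (1.0 + math.exp(-t))

# Illustrative (not fitted) weights for a one-hidden-layer NN on p = 2 inputs.
w = [(0.5, -0.5), (1.0, 0.25)]   # input-to-hidden weights, one row per unit
a = [2.0, -1.0]                  # hidden-to-output weights

def nn_predict(x):
    # sum_j a_j * sigma(w_j^T x): projections, then a FIXED nonlinearity.
    return sum(a_j * sigmoid(sum(w_k * x_k for w_k, x_k in zip(w_j, x)))
               for a_j, w_j in zip(a, w))

# PPR analogue: the same projection-then-transform shape, but each f_j would
# be an arbitrary smooth function estimated from the data, not sigmoid.
```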