Shrinkage (statistics)
In statistics, shrinkage has two meanings:
- In relation to the general observation that, in regression analysis, a fitted relationship appears to perform less well on a new data set than on the data set used for fitting. In particular, the value of the coefficient of determination 'shrinks'. This idea is complementary to overfitting and, separately, to the standard adjustment made in the coefficient of determination to compensate for the hypothetical effects of further sampling, such as controlling for the potential of new explanatory terms improving the model by chance: that is, the adjustment formula itself provides "shrinkage". But the adjustment formula yields an artificial shrinkage, in contrast to the drop in performance actually observed on a new data set.
- To describe general types of estimators, or the effects of some types of estimation, whereby a naive or raw estimate is improved by combining it with other information (see shrinkage estimator). The term relates to the notion that the improved estimate is closer to the value supplied by the 'other information' than the raw estimate is. In this sense, shrinkage is used to regularize ill-posed inference problems.
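The first meaning can be illustrated with a minimal sketch. It assumes a simple one-predictor least-squares fit on simulated data; the helper names (`r_squared`, `adjusted_r_squared`, `fit_line`, `simulate`) and the data-generating model are hypothetical choices made for this illustration only. The adjustment shown is the usual adjusted coefficient of determination, 1 - (1 - R²)(n - 1)/(n - p - 1), for n observations and p explanatory terms.

```python
import random


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p explanatory terms (requires n > p + 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def simulate(rng, n):
    """Hypothetical data: y = 1 + 0.5 * x + Gaussian noise (an assumption)."""
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [1.0 + 0.5 * xi + rng.gauss(0, 2.0) for xi in x]
    return x, y


rng = random.Random(0)
x_fit, y_fit = simulate(rng, 15)   # data set used for fitting
x_new, y_new = simulate(rng, 15)   # new data set from the same model

a, b = fit_line(x_fit, y_fit)
r2_fit = r_squared(y_fit, [a + b * xi for xi in x_fit])
r2_new = r_squared(y_new, [a + b * xi for xi in x_new])
r2_adj = adjusted_r_squared(r2_fit, n=15, p=1)

print("R^2 on fitting data:", round(r2_fit, 3))
print("R^2 on new data:    ", round(r2_new, 3))
print("Adjusted R^2:       ", round(r2_adj, 3))
```

In any single simulated run the value on new data may be higher or lower than the value on the fitting data; the shrinkage described above is a tendency over repeated samples. The adjusted value, by contrast, is never larger than the unadjusted one whenever p is at least 1 and R² is at most 1, which is why it is an artificial shrinkage.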
A common idea underlying both of these meanings is the reduction in the effects of sampling variation.
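The second meaning can be sketched in the same spirit. The example below assumes the simplest possible form of shrinkage, a convex combination of a raw estimate with a target value supplied by other information; the function name `shrink`, the fixed weight, and the sample numbers are hypothetical, and practical shrinkage estimators usually choose the weight from the data rather than fixing it in advance.

```python
def shrink(raw, target, weight):
    """Pull a raw estimate toward a target value.

    weight = 0 returns the raw estimate unchanged; weight = 1 returns the target.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * raw + weight * target


# Hypothetical raw group means; the 'other information' is their grand mean.
raw_means = [2.0, 5.0, 11.0]
grand_mean = sum(raw_means) / len(raw_means)  # 6.0

shrunk = [shrink(m, grand_mean, weight=0.5) for m in raw_means]
print(shrunk)  # [4.0, 5.5, 8.5]
```

Each shrunken value lies between the raw estimate and the target, so it is closer to the target than the raw estimate is, which is the sense in which the estimate has "shrunk".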