Unit-weighted regression
Encyclopedia
In statistics
Statistics
Statistics is the study of the collection, organization, analysis, and interpretation of data. It deals with all aspects of this, including the planning of data collection in terms of the design of surveys and experiments....

, unit-weighted regression is perhaps the easiest form of multiple regression analysis, a method in which two or more variables are used to predict the value of an outcome.

At a conceptual level, the example of weight loss can illustrate the idea of multiple regression. If a group of people join a weight loss program, we might wish to predict who would lose weight. The outcome is weight loss. We might find that those who lost weight were likely to increase their fruit intake, to exercise more, and to substitute low-calorie drinks for sugary drinks. The point is that several variables can be considered at the same time for their effect on an outcome of interest.

Beta weights

In the standard form of multiple regression, each predictor is multiplied by a number that is called the beta weight. The prediction is obtained by adding these products (and usually by adding a constant, as well). In the weight loss example above, suppose that reducing sugary drinks led to twice as much weight loss as did the other variables. If that were the case, then the beta weight for weight loss would be twice as big as the weights for the other variables.

When the weights are chosen to give the best prediction by some criterion, the model is called a proper linear model
Proper linear model
In statistics, a proper linear model is a linear regression model in which the weights given to the predictor variables are chosen in such a way as to optimize the relationship between the prediction and the criterion. Simple regression analysis is the most common example of a proper linear model...

. Therefore, multiple regression is a proper linear model. By contrast, unit-weighted regression is called an improper linear model.

Model specification

Standard multiple regression has a major assumption: it assumes that all the important predictors are in the equation. This assumption is called model specification. A model is specified when all the predictors are in the equation, and no irrelevant predictors are in the equation.

However, in the social sciences, it is rare for a study to be able to know all the important predictors of a behavioral outcome. Therefore, most models are not specified. When the model is not specified, the estimates for the beta weights are not accurate. Because the inclusion of one variable can cause the beta weights to fluctuate wildly, this fluctuation is sometimes called the problem of the bouncing betas. It is this problem with bouncing betas that makes unit-weighted regression a useful method.

Unit weights

Unit-weighted regression proceeds in three steps. First, predictors for the outcome of interest are selected; ideally, there should be good empirical or theoretical reasons for the selection. Second, continuous predictor variables are changed to Z scores
Standard score
In statistics, a standard score indicates how many standard deviations an observation or datum is above or below the mean. It is a dimensionless quantity derived by subtracting the population mean from an individual raw score and then dividing the difference by the population standard deviation...

. Third, the predictors are added together; the sum is called the variate. This variate is used as the predictor of the outcome, also expressed in z scores. The relationship of this variate to the outcome is assessed with the Pearson R correlation
Pearson product-moment correlation coefficient
In statistics, the Pearson product-moment correlation coefficient is a measure of the correlation between two variables X and Y, giving a value between +1 and −1 inclusive...

.

One small variation on unit-weighted regression is to make the weights not one, but one divided by the number of predictors. Thus, with three predictors, the weight of each variable is 1/3; with four predictors, the weight is 1/4; and so on. The value of this variation is that the variate is already in z score form.

A second variation occurs when predictors are binary. In this case, the predictors are scored as one (present) or zero (absent).

Literature review

The idea of unit-weighted regression was introduced in 1938 by Samuel Stanley Wilks, a leading statistician who had a special interest in multivariate analysis
Multivariate analysis
Multivariate analysis is based on the statistical principle of multivariate statistics, which involves observation and analysis of more than one statistical variable at a time...

. Wilks described how unit weights could be used in practical settings, when data were not available to estimate beta weights. For example, a small college may want to select good students for admission. But the school may have no money to gather data and conduct a standard multiple regression analysis. In this case, the school could use several predictors—high school grades, SAT scores, teacher ratings. Wilks showed mathematically why unit weights should work well in practice.

Frank Schmidt in 1971 conducted a simulation study of unit weights. His results showed that Wilks was indeed correct and that unit weights tend to perform well in simulations of practical studies.

Robyn Dawes
Robyn Dawes
Robyn Mason Dawes was an American psychologist who specialized in the field of human judgment. His research interests included human irrationality, human cooperation, intuitive expertise, and the United States AIDS policy...

 in 1979 discussed the use of unit weights in applied studies, referring to the robust beauty of unit weighted models. Jacob Cohen in 1990 also discussed the value of unit weights and noted their practical utility. Indeed, he wrote, "As a practical matter, most of the time, we are better off using unit weights" (p. 1306).

Dave Kerby in 2003 showed that unit weights compare well with standard regression, doing so with a cross validation study—that is, he derived beta weights in one sample and applied them to a second sample. The outcome of interest was suicidal thinking, and the predictor variables were broad personality traits. Kerby also showed how Regression Tree analysis could be combined with unit weights to further simplify unit-weighted regression. In this approach, the variate consists of merely the weighted counts of significant predictors.

Example

An example may clarify how unit weights can be useful in practice.

Brenna Bry and colleagues (1982) addressed the question of what causes drug use in adolescents. Previous research had made use of multiple regression; with this method, it is natural to look for the best predictor, the one with the highest beta weight. One previous study had found that early use of alcohol was the best predictor. Another study had found that alienation from parents was the best predictor. Still another study had found that a low grades in school was the best predictor. The failure to replicate was clearly a problem, a problem that could be caused by bouncing betas.

Bry and colleagues suggested a different approach. Instead of looking for the best predictor, they looked at the number of predictors. In other words, they gave a unit weight to each predictor. Their study had six predictors: 1) grades in school, 2) affiliation with religion, 3) age of alcohol use, 4) psychological distress, 5) self-esteem, and 6) alienation from parents. Each risk factor was scored as one (present) or zero (absent). For example, grades in school were scored as one when the grades were Ds or Fs. The results showed that the number of risk factors was a good predictor of drug use: adolescents with more risk factors were more likely to use drugs.

The model used by Bry and colleagues was that drug users do not differ in any special way from non-drug users. Rather, they differ in the number of problems they must face. "The number of factors an individual must cope with is more important than exactly what those factors are" (p. 277). Given this model, unit-weighted regression is an appropriate method of analysis.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK