Listwise deletion
In statistics, listwise deletion is a method for handling missing data. In this method, an entire record is excluded from analysis if any single value is missing.
Example
For example, consider the following questionnaire, as answered by 10 subjects:

Subject | Age | Gender | Income |
---|---|---|---|
1 | 29 | M | $40,000 |
2 | 45 | M | $36,000 |
3 | 81 | M | --missing-- |
4 | 22 | --missing-- | $16,000 |
5 | 41 | M | $98,000 |
6 | 33 | F | $60,000 |
7 | 22 | F | $24,000 |
8 | --missing-- | F | $81,000 |
9 | 33 | F | $55,000 |
10 | 45 | F | $80,000 |
Suppose a researcher hopes to model income (the dependent variable) based on age and gender (the independent variables). Using listwise deletion, the researcher would remove subjects 3, 4 and 8 from the sample before performing any further analysis, because each of them is missing one of the three values.
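As a minimal sketch, the deletion step for this table can be written in plain Python. The list `records`, the helper `listwise_delete`, and the use of `None` to mark a missing value are illustrative choices made here, not part of any particular statistics package.

```python
# Questionnaire data from the table above; None marks a missing value.
records = [
    {"subject": 1, "age": 29, "gender": "M", "income": 40000},
    {"subject": 2, "age": 45, "gender": "M", "income": 36000},
    {"subject": 3, "age": 81, "gender": "M", "income": None},
    {"subject": 4, "age": 22, "gender": None, "income": 16000},
    {"subject": 5, "age": 41, "gender": "M", "income": 98000},
    {"subject": 6, "age": 33, "gender": "F", "income": 60000},
    {"subject": 7, "age": 22, "gender": "F", "income": 24000},
    {"subject": 8, "age": None, "gender": "F", "income": 81000},
    {"subject": 9, "age": 33, "gender": "F", "income": 55000},
    {"subject": 10, "age": 45, "gender": "F", "income": 80000},
]


def listwise_delete(rows, fields):
    """Keep only the rows in which every listed field is present."""
    return [row for row in rows if all(row[f] is not None for f in fields)]


# Every variable used by the model must be present for a record to survive.
complete = listwise_delete(records, ["age", "gender", "income"])
dropped = [row["subject"] for row in records if row not in complete]

print(dropped)        # [3, 4, 8]
print(len(complete))  # 7
```

Only the seven complete records would then be passed on to the model.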
Problems with listwise deletion
Listwise deletion affects the statistical power of the tests conducted. Statistical power relies in part on a large sample size. Because listwise deletion excludes every record that has a missing value, it reduces the sample that is analysed.
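The effect of a smaller sample on power can be illustrated with a rough calculation. The sketch below assumes a two-sided one-sample z-test, a true effect of 0.8 standard deviations, and a 5% significance level. All three are illustrative assumptions rather than values from the example, and `z_test_power` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test.

    effect_size is the assumed true mean shift in standard-deviation units.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)
    # Probability that the test statistic falls in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


power_full = z_test_power(0.8, 10)    # all 10 subjects
power_reduced = z_test_power(0.8, 7)  # 7 subjects left after listwise deletion

print(f"n=10: {power_full:.2f}")
print(f"n=7:  {power_reduced:.2f}")
```

Under these assumptions the power for 7 subjects is lower than for 10. The exact figures depend entirely on the chosen effect size and significance level.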
Listwise deletion is also problematic when data are not missing at random, for example when a questionnaire asks for sensitive information. In that case the method excludes much of the subjects' data, which can bias the findings. For instance, a questionnaire may ask about respondents' current earnings and sexual orientation as well as their views on a certain subject. Many subjects may skip these questions because they are intrusive, yet answer all the others. Listwise deletion excludes these respondents from the analysis entirely. This may create a bias, because participants who divulge this information may have different characteristics than participants who do not.
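A small simulation can make this bias concrete. The sketch below is built entirely on assumptions chosen for illustration: a simulated population of 10,000 incomes, and a non-response rule (`responds`, a hypothetical helper) under which above-median earners answer the income question far less often than others.

```python
import random
from statistics import mean, median

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated incomes (not real data).
incomes = [random.lognormvariate(10.5, 0.5) for _ in range(10_000)]
cutoff = median(incomes)


def responds(income):
    """Assumed non-response mechanism: higher earners answer less often."""
    p_answer = 0.2 if income > cutoff else 0.9
    return random.random() < p_answer


# Listwise deletion keeps only the subjects who answered.
observed = [x for x in incomes if responds(x)]

true_mean = mean(incomes)
complete_case_mean = mean(observed)

print(f"all subjects:   {true_mean:,.0f}")
print(f"complete cases: {complete_case_mean:,.0f}")
```

Because the retained subjects are mostly lower earners, the mean computed from complete cases understates the mean of the whole simulated population.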
Compared to other methods
While listwise deletion does have its problems, it is preferable to many other methods for handling missing data. In some cases, it may even be the least problematic method. The following table compares listwise deletion with some other methods:

Method | Comparison |
---|---|
Pairwise deletion | Ambiguous definition of sample size causes bias in estimated standard errors and test statistics. |
Dummy variable adjustment | Produces biased estimates of coefficients. |
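
The ambiguity in sample size under pairwise deletion can be seen by counting, for the questionnaire in the Example section, which subjects are available for each pair of variables. This is only a counting sketch. The tuple layout of `rows` and the helper `subjects_with` are illustrative choices, and no statistics are estimated.

```python
from itertools import combinations

# Same questionnaire as in the Example section; None marks a missing value.
# Columns: subject, age, gender, income.
rows = [
    (1, 29, "M", 40000),
    (2, 45, "M", 36000),
    (3, 81, "M", None),
    (4, 22, None, 16000),
    (5, 41, "M", 98000),
    (6, 33, "F", 60000),
    (7, 22, "F", 24000),
    (8, None, "F", 81000),
    (9, 33, "F", 55000),
    (10, 45, "F", 80000),
]
fields = {"age": 1, "gender": 2, "income": 3}


def subjects_with(names):
    """Subjects for which every named field is present."""
    return {
        row[0]
        for row in rows
        if all(row[fields[name]] is not None for name in names)
    }


# Pairwise deletion: each pair of variables uses its own set of subjects.
pairwise = {pair: subjects_with(pair) for pair in combinations(fields, 2)}
# Listwise deletion: one set of subjects shared by every computation.
listwise = subjects_with(fields)

for pair, subjects in pairwise.items():
    print(pair, len(subjects), sorted(subjects))
print("listwise", len(listwise), sorted(listwise))
```

Each pair of variables ends up with eight subjects, but a different eight each time, whereas listwise deletion uses the same seven subjects throughout. No single sample size describes the pairwise analysis as a whole.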