Truncated mean
A truncated mean or trimmed mean is a statistical measure of central tendency, much like the mean and median. It involves calculating the mean after discarding given parts of a probability distribution or sample at the high and low ends, typically an equal amount from each end. For most statistical applications, 5 to 25 percent of the ends are discarded. In some regions of Central Europe it is also known as a Windsor mean, but this name should not be confused with the Winsorized mean: in the latter, the observations that the trimmed mean would discard are instead replaced by the largest/smallest of the remaining values.
Notation
The index of the mean indicates the percentage of entries removed from each end of the sorted sample. For example, truncating a sample of 8 entries by 12.5% discards the smallest and the largest entry before the mean is calculated.
Interpolation
When the requested trimming percentage does not correspond to a whole number of entries, the trimmed mean cannot be computed directly. The usual approach is to calculate the two nearest trimmed means and interpolate between them (usually linearly). For example, to calculate the 15% trimmed mean of a sample containing 10 entries, calculate the 10% trimmed mean (removing 1 entry from each end of the sample) and the 20% trimmed mean (removing 2 entries from each end), then interpolate between the two to obtain the 15% trimmed mean.
Advantages
The truncated mean is a useful estimator because it is less sensitive to outliers than the mean but still gives a reasonable estimate of central tendency or mean for many statistical models. In this regard it is referred to as a robust estimator.
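A minimal sketch of this robustness, in Python. The function name `trimmed_mean` and the `readings` data are made up for illustration, and the function assumes the trimming fraction corresponds to a whole number of entries.

```python
import math

def trimmed_mean(values, fraction):
    """Mean after dropping `fraction` of the sorted values from each end.

    Illustrative helper: assumes fraction * len(values) is a whole number.
    """
    data = sorted(values)
    k = round(fraction * len(data))  # entries dropped from each end
    if k < 0 or 2 * k >= len(data):
        raise ValueError("fraction must leave at least one entry")
    kept = data[k:len(data) - k]
    return math.fsum(kept) / len(kept)

# Hypothetical measurements with a single gross outlier (55.0).
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 55.0]

plain = math.fsum(readings) / len(readings)  # pulled upward by the outlier
robust = trimmed_mean(readings, 0.10)        # drops 9.7 and 55.0

print(f"mean = {plain:.2f}, 10% trimmed mean = {robust:.2f}")
```

The single outlier moves the ordinary mean well away from the bulk of the readings, while the 10% trimmed mean stays close to 10.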
One situation in which it can be advantageous to use a truncated mean is when estimating the location parameter of a Cauchy distribution, a bell-shaped probability distribution with fatter tails than a normal distribution. It can be shown that the truncated mean of the middle 24% of the sample order statistics (i.e., truncating the sample by 38% at each end) produces an estimate of the population location parameter that is more efficient than either the sample median or the full sample mean. However, due to the fat tails of the Cauchy distribution, the efficiency of the estimator decreases as more of the sample is used in the estimate. Note that for the Cauchy distribution, none of the truncated mean, the full sample mean, or the sample median is a maximum likelihood estimator, nor is any of them as asymptotically efficient as the maximum likelihood estimator; however, the maximum likelihood estimate is difficult to compute, leaving the truncated mean as a useful alternative.
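The comparison can be explored with a small Monte Carlo sketch in Python. The sample size, trial count, seed, and helper names below are all assumptions chosen for illustration. The theoretical gap between the 38% trimmed mean and the median is small, so a simulation of this size reliably shows only how much both improve on the full sample mean.

```python
import math
import random
from statistics import median

def trimmed_mean(values, fraction):
    """Mean after dropping `fraction` of the sorted values from each end."""
    data = sorted(values)
    k = round(fraction * len(data))
    kept = data[k:len(data) - k]
    return math.fsum(kept) / len(kept)

def cauchy_sample(rng, n, location=0.0):
    """Draw n standard Cauchy values shifted by `location` (inverse CDF method)."""
    return [location + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

def mean_squared_errors(trials=2000, n=100, seed=0):
    """Estimate the MSE of three location estimators; true location is 0."""
    rng = random.Random(seed)
    totals = {"mean": 0.0, "median": 0.0, "trimmed 38%": 0.0}
    for _ in range(trials):
        sample = cauchy_sample(rng, n)
        totals["mean"] += (math.fsum(sample) / n) ** 2
        totals["median"] += median(sample) ** 2
        totals["trimmed 38%"] += trimmed_mean(sample, 0.38) ** 2
    return {name: total / trials for name, total in totals.items()}

errors = mean_squared_errors()
for name, mse in errors.items():
    print(f"{name:12s} MSE ~ {mse:.4f}")
```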
Drawbacks
The truncated mean uses more information from the distribution or sample than the median, but unless the underlying distribution is symmetric, the truncated mean of a sample is unlikely to produce an unbiased estimator for either the mean or the median.
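As a rough illustration of this drawback, the Python sketch below uses an evenly spaced grid of quantiles from an exponential distribution with rate 1 as a stand-in for a skewed population. The choice of distribution, the grid size, and the helper name are assumptions made for the example.

```python
import math
from statistics import median

def trimmed_mean(values, fraction):
    """Mean after dropping `fraction` of the sorted values from each end."""
    data = sorted(values)
    k = round(fraction * len(data))
    kept = data[k:len(data) - k]
    return math.fsum(kept) / len(kept)

# Quantile grid of Exp(1): a deterministic, right-skewed "population".
n = 1000
skewed = [-math.log(1 - (i + 0.5) / n) for i in range(n)]

full_mean = math.fsum(skewed) / n    # close to the true mean, 1
mid = median(skewed)                 # close to the true median, ln 2
iqm = trimmed_mean(skewed, 0.25)     # 25% trimmed mean

print(f"mean = {full_mean:.3f}, median = {mid:.3f}, 25% trimmed mean = {iqm:.3f}")
```

For this skewed case the trimmed mean lands between the median and the mean and matches neither, which is the bias described above.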
Examples
The scoring method used in many sports that are evaluated by a panel of judges is a truncated mean: discard the lowest and the highest scores, then calculate the mean value of the remaining scores. The interquartile mean is another example: the lowest 25% and the highest 25% are discarded, and the mean of the remaining scores is calculated.
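The calculations described in the Notation, Interpolation, and Examples sections can be sketched in Python as follows. The function names and all data values are hypothetical. Interpolation here is the linear blend of the two nearest whole-count trimmed means, as described above.

```python
import math

def trim_count(values, k):
    """Mean of the sorted values after dropping k entries from each end."""
    data = sorted(values)
    if k < 0 or 2 * k >= len(data):
        raise ValueError("k must leave at least one entry")
    kept = data[k:len(data) - k]
    return math.fsum(kept) / len(kept)

def trimmed_mean(values, fraction):
    """Trimmed mean; interpolates linearly when fraction * n is not whole."""
    exact = round(fraction * len(values), 9)  # guard against float noise
    lower, upper = math.floor(exact), math.ceil(exact)
    if lower == upper:
        return trim_count(values, lower)
    weight = exact - lower
    return (1 - weight) * trim_count(values, lower) + weight * trim_count(values, upper)

# Notation: 12.5% of 8 entries removes one entry from each end.
eight = [2, 4, 5, 6, 7, 8, 10, 30]
notation_example = trimmed_mean(eight, 0.125)

# Interpolation: 15% of 10 entries lies halfway between the 10% and 20% trims.
ten = [1, 2, 3, 4, 5, 6, 7, 8, 12, 40]
interpolated = trimmed_mean(ten, 0.15)

# Judged sports: drop the lowest and the highest score.
judge_scores = [9.0, 8.5, 9.5, 7.0, 9.0]
judged = trim_count(judge_scores, 1)

# Interquartile mean: drop the lowest 25% and the highest 25%.
iqm = trimmed_mean(eight, 0.25)

print(notation_example, interpolated, judged, iqm)
```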