Chauvenet's criterion - AbsoluteAstronomy.com

In statistical theory, Chauvenet's criterion (named for William Chauvenet) is a means of assessing whether one piece of experimental data in a set of observations, an outlier, is likely to be spurious.

To apply Chauvenet's criterion, first calculate the mean and standard deviation of the observed data. Based on how much the suspect datum differs from the mean, use the normal distribution function (or a table thereof) to determine the probability that a given data point will deviate from the mean by at least as much as the suspect data point. Multiply this probability by the number of data points taken. If the result is less than 0.5, the suspicious data point may be discarded; that is, a reading may be rejected if the probability of obtaining its deviation from the mean is less than 1/(2n), where n is the number of data points.
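The rule can be written compactly as follows. This is a sketch under stated assumptions: the data are taken to be normally distributed, x_s is a label introduced here for the suspect value, Z is a standard normal variable, erfc is the complementary error function, and s is the sample standard deviation (with n - 1 in the denominator), which is the convention that reproduces the value 16.34 in the example.

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$

$$z = \frac{\lvert x_s - \bar{x}\rvert}{s}, \qquad P = \Pr\left(\lvert Z\rvert \ge z\right) = \operatorname{erfc}\!\left(\frac{z}{\sqrt{2}}\right)$$

$$\text{reject } x_s \iff nP < \tfrac{1}{2} \iff P < \frac{1}{2n}$$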

Example

For instance, suppose a value is measured experimentally in several trials as 9, 10, 10, 10, 11, and 50. The mean is 16.7 and the (sample) standard deviation is 16.34. The value 50 differs from 16.7 by 33.3, slightly more than two standard deviations. For a normal distribution, the probability of a measurement falling more than two standard deviations from the mean is roughly 0.05. Six measurements were taken, so the statistic (number of measurements multiplied by the probability) is 0.05 × 6 = 0.3. Because 0.3 < 0.5, according to Chauvenet's criterion, the measured value of 50 should be discarded (leaving a new mean of 10, with standard deviation 0.7).
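The example can be checked numerically. The short Python sketch that follows is illustrative only: the function name `chauvenet` is hypothetical, it assumes roughly normal data, takes the sample standard deviation from the standard-library `statistics` module, and computes the two-tailed probability exactly with `math.erfc` rather than rounding it to 0.05.

```python
import math
import statistics


def chauvenet(data, suspect):
    """Apply Chauvenet's criterion to one suspect value in `data`.

    Hypothetical helper. Returns (reject, mean, sd, z, expected).
    Assumes roughly normal data and uses the sample standard deviation
    (n - 1 denominator).
    """
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)          # sample standard deviation
    z = abs(suspect - mean) / sd         # deviation in units of sd
    prob = math.erfc(z / math.sqrt(2))   # two-tailed P(|Z| >= z) for a normal
    expected = n * prob                  # expected count of such deviations
    return expected < 0.5, mean, sd, z, expected


data = [9, 10, 10, 10, 11, 50]
suspect = 50
reject, mean, sd, z, expected = chauvenet(data, suspect)
print(f"mean={mean:.1f} sd={sd:.2f} z={z:.2f} n*P={expected:.2f} reject={reject}")
# mean=16.7 sd=16.34 z=2.04 n*P=0.25 reject=True

if reject:
    kept = [x for x in data if x != suspect]   # drop the rejected reading
    print(f"new mean={statistics.mean(kept):.1f} new sd={statistics.stdev(kept):.1f}")
    # new mean=10.0 new sd=0.7
```

Because the exact two-tailed probability for z ≈ 2.04 is about 0.041 rather than 0.05, the product n·P comes out near 0.25 instead of 0.3; either way it is below 0.5, so the value 50 is rejected.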

Peirce's criterion

Another method for eliminating spurious data is called Peirce's criterion. It was developed a few years before Chauvenet's criterion was published, and it is a more rigorous approach to the rational deletion of outlier data (see the S. Ross reference). Other methods, such as Grubbs' test for outliers, are mentioned under the listing for Outlier.

Criticism

Deletion of outlier data is a controversial practice frowned on by many scientists and science instructors; while Chauvenet's criterion provides an objective and quantitative method for data rejection, it does not make the practice more scientifically or methodologically sound, especially in small sets or where a normal distribution cannot be assumed. Rejection of outliers is more acceptable in areas of practice where the underlying model of the process being measured and the usual distribution of measurement error are confidently known.

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.