Median - AbsoluteAstronomy.com

In probability theory and statistics, a median is described as the numerical value separating the higher half of a sample, a population, or a probability distribution, from the lower half. The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one. If there is an even number of observations, then there is no single middle value; the median is then usually defined to be the mean of the two middle values.

In a sample of data, or a finite population, there may be no member of the sample whose value is identical to the median (in the case of an even sample size), and, if there is such a member, there may be more than one, so that the median may not uniquely identify a sample member. Nonetheless, the value of the median is uniquely determined with the usual definition. A related concept, in which the outcome is forced to correspond to a member of the sample, is the medoid.

At most, half the population have values less than the median, and, at most, half have values greater than the median. If both groups contain less than half the population, then some of the population is exactly equal to the median. For example, if a < b < c, then the median of the list {a, b, c} is b, and, if a < b < c < d, then the median of the list {a, b, c, d} is the mean of b and c; i.e., it is (b + c)/2.
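The counting property above can be checked on small lists. The following is a minimal sketch using Python's standard `statistics.median`; the helper name `split_counts` and the sample lists are illustrative assumptions, not part of the original article.

```python
from statistics import median

def split_counts(values):
    """Return (median, count of values below it, count of values above it)."""
    m = median(values)
    below = sum(1 for v in values if v < m)
    above = sum(1 for v in values if v > m)
    return m, below, above

odd = [3, 1, 2]       # a < b < c: the median is b
even = [4, 1, 3, 2]   # a < b < c < d: the median is (b + c) / 2

m_odd, below_odd, above_odd = split_counts(odd)
m_even, below_even, above_even = split_counts(even)

# In both cases neither side holds more than half of the values.
print(m_odd, below_odd, above_odd)
print(m_even, below_even, above_even)
```

In the odd case both groups contain less than half of the values, so one value (b) is exactly equal to the median.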

The median can be used as a measure of location when a distribution is skewed, when end-values are not known, or when one requires reduced importance to be attached to outliers, e.g., because they may be measurement errors.

Notation

The median of some variable x is denoted either as x̃ or as μ₁/₂(x).

Measures of statistical dispersion

When the median is used as a location parameter in descriptive statistics, there are several choices for a measure of variability: the range, the interquartile range, the mean absolute deviation, and the median absolute deviation. Since the median is the same as the second quartile, its calculation is illustrated in the article on quartiles.
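As a rough illustration, the four dispersion measures named above can be computed with Python's standard library. The sample data are an assumption chosen for illustration, and the interquartile range depends on the quartile convention; here it uses the default method of `statistics.quantiles`, which is only one of several conventions in use.

```python
from statistics import mean, median, quantiles

data = [1, 2, 2, 2, 3, 14]   # illustrative sample
m = median(data)

# Range: largest observation minus smallest observation.
data_range = max(data) - min(data)

# Interquartile range: upper quartile minus lower quartile
# (quartile convention: the default of statistics.quantiles).
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Mean absolute deviation, measured from the median.
mean_abs_dev = mean(abs(x - m) for x in data)

# Median absolute deviation (MAD).
mad = median(abs(x - m) for x in data)

print(data_range, iqr, mean_abs_dev, mad)
```

Note how the single large value 14 dominates the range and the mean absolute deviation, while the median absolute deviation stays small.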

Medians of probability distributions

For any probability distribution on the real line with cumulative distribution function F, regardless of whether it is any kind of continuous probability distribution, in particular an absolutely continuous distribution (and therefore has a probability density function), or a discrete probability distribution, a median m satisfies the inequalities

$$P(X \le m) \ge \frac{1}{2} \quad\text{and}\quad P(X \ge m) \ge \frac{1}{2}$$

or, equivalently,

$$\int_{(-\infty,\, m]} dF(x) \ge \frac{1}{2} \quad\text{and}\quad \int_{[m,\, \infty)} dF(x) \ge \frac{1}{2}$$

in which X is a random variable with the given distribution and a Lebesgue–Stieltjes integral is used. For an absolutely continuous probability distribution with probability density function ƒ, we have

$$P(X \le m) = P(X \ge m) = \int_{-\infty}^{m} f(x)\, dx = \frac{1}{2}.$$
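For a discrete distribution, the two defining inequalities, P(X ≤ m) ≥ 1/2 and P(X ≥ m) ≥ 1/2, can be tested directly. The sketch below is illustrative only: the function `is_median` and the fair-die example are assumptions, and exact fractions are used to avoid rounding at the boundary value 1/2.

```python
from fractions import Fraction

def is_median(m, pmf):
    """Check P(X <= m) >= 1/2 and P(X >= m) >= 1/2.

    pmf maps each possible value to its probability.
    """
    half = Fraction(1, 2)
    p_le = sum(p for x, p in pmf.items() if x <= m)
    p_ge = sum(p for x, p in pmf.items() if x >= m)
    return p_le >= half and p_ge >= half

# A fair six-sided die: every face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

die_medians = [face for face in die if is_median(face, die)]
print(die_medians)                      # faces that satisfy both inequalities
print(is_median(Fraction(7, 2), die))   # 3.5 is not a face, but is also a median
```

The die shows that a median need not be unique: every value between 3 and 4 satisfies both inequalities.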

Medians of particular distributions

The medians of certain types of distributions can be easily calculated from their parameters:

The median of a normal distribution with mean μ and variance σ² is μ. In fact, for a normal distribution, mean = median = mode.
The median of a uniform distribution in the interval [a, b] is (a + b) / 2, which is also the mean.
The median of a Cauchy distribution with location parameter x₀ and scale parameter y is x₀, the location parameter.
The median of an exponential distribution with rate parameter λ is the natural logarithm of 2 divided by the rate parameter: λ⁻¹ ln 2.
The median of a Weibull distribution with shape parameter k and scale parameter λ is λ(ln 2)^(1/k).
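These closed forms can be sanity-checked by evaluating each cumulative distribution function at the stated median and confirming that the result is 1/2. The sketch below assumes the standard CDFs of the exponential, Weibull, and Cauchy distributions; the parameter values are arbitrary choices for illustration.

```python
import math

def exponential_cdf(x, rate):
    return 1.0 - math.exp(-rate * x)

def weibull_cdf(x, shape, scale):
    return 1.0 - math.exp(-((x / scale) ** shape))

def cauchy_cdf(x, x0, scale):
    return 0.5 + math.atan((x - x0) / scale) / math.pi

# Arbitrary illustrative parameters.
rate = 2.0
shape, scale = 1.5, 3.0
x0, y = -1.0, 0.7

exp_median = math.log(2) / rate                      # (1/rate) * ln 2
weibull_median = scale * math.log(2) ** (1 / shape)  # scale * (ln 2)^(1/shape)
cauchy_median = x0                                   # the location parameter

# Each CDF evaluated at its median should be 1/2 (up to rounding).
print(exponential_cdf(exp_median, rate))
print(weibull_cdf(weibull_median, shape, scale))
print(cauchy_cdf(cauchy_median, x0, y))
```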

Medians in descriptive statistics

The median is used primarily for skewed distributions, which it summarizes differently than the arithmetic mean. Consider the multiset { 1, 2, 2, 2, 3, 14 }. The median is 2 in this case, as is the mode, and it might be seen as a better indication of central tendency than the arithmetic mean of 4.
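The figures for this multiset can be reproduced with Python's standard `statistics` module. The second data set, in which the largest value is inflated, is an added assumption meant to show how differently the two summaries react to an extreme value.

```python
from statistics import mean, median, mode

data = [1, 2, 2, 2, 3, 14]

data_median = median(data)   # mean of the two middle values, 2 and 2
data_mode = mode(data)       # most frequent value
data_mean = mean(data)       # (1 + 2 + 2 + 2 + 3 + 14) / 6

# Hypothetical variant: make the largest observation far more extreme.
extreme = [1, 2, 2, 2, 3, 1400]
extreme_median = median(extreme)   # unchanged
extreme_mean = mean(extreme)       # pulled far upward

print(data_median, data_mode, data_mean)
print(extreme_median, extreme_mean)
```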

Calculation of medians is a popular technique in summary statistics and summarizing statistical data, since it is simple to understand and easy to calculate, while also giving a measure that is more robust in the presence of outlier values than is the mean.

An optimality property

A median is also a central point that minimizes the arithmetic mean of the absolute deviations. In the above example, the median value of 2 minimizes the arithmetic mean of the absolute deviations, (1 + 0 + 0 + 0 + 1 + 12) / 6 = 2.33; in contrast, the mean value of 4 minimizes the arithmetic mean of the squared deviations, (9 + 4 + 4 + 4 + 1 + 100) / 6 = 20.33. In the language of statistics, a value of c that minimizes E(|X − c|) is a median of the probability distribution of the random variable X.
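The optimality property can be illustrated numerically by a brute-force search over candidate centres c. This is only a sketch: the grid of candidates (0.00 to 15.00 in steps of 0.01) and the function names are assumptions made for the example, and a grid search is not a proof.

```python
data = [1, 2, 2, 2, 3, 14]

def mean_abs_dev(c, values):
    """Arithmetic mean of the absolute deviations from c."""
    return sum(abs(x - c) for x in values) / len(values)

def mean_sq_dev(c, values):
    """Arithmetic mean of the squared deviations from c."""
    return sum((x - c) ** 2 for x in values) / len(values)

# Candidate centres 0.00, 0.01, ..., 15.00.
grid = [i / 100 for i in range(0, 1501)]

best_abs = min(grid, key=lambda c: mean_abs_dev(c, data))  # expected: the median
best_sq = min(grid, key=lambda c: mean_sq_dev(c, data))    # expected: the mean

print(best_abs, round(mean_abs_dev(best_abs, data), 2))
print(best_sq, round(mean_sq_dev(best_sq, data), 2))
```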

However, a median c need not be uniquely defined. Where exactly one median exists, statisticians speak of "the median" correctly; even when no unique median exists, some statisticians speak of "the median" informally.

See also k-medians clustering.

An inequality relating means and medians

For continuous probability distributions, the difference between the median and the mean is never more than one standard deviation. See an inequality on location and scale parameters.
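As a concrete instance of this bound, consider an exponential distribution with rate λ, whose mean and standard deviation are both 1/λ and whose median is (ln 2)/λ. The short check below is a sketch with an arbitrarily chosen rate; it illustrates the inequality for one distribution and does not prove it.

```python
import math

rate = 0.5   # arbitrary illustrative rate parameter

exp_mean = 1 / rate
exp_sd = 1 / rate
exp_median = math.log(2) / rate

# |mean - median| should not exceed one standard deviation.
gap = abs(exp_mean - exp_median)
print(gap, exp_sd, gap <= exp_sd)
```

For the exponential distribution the gap is always (1 − ln 2) standard deviations, roughly 0.31, whatever the rate.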

Efficient computation of the sample median

Even though sorting n items by comparisons requires O(n log n) operations, selection algorithms can compute the kth-smallest of n items (e.g., the median) with only O(n) operations.
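One common selection algorithm is quickselect, sketched below. Note the assumption: quickselect with a random pivot runs in linear time on average, not in the worst case; worst-case linear selection needs a more careful pivot rule such as median of medians. The function names are illustrative.

```python
import random

def quickselect(items, k):
    """Return the k-th smallest element of items (k = 0 is the minimum).

    Uses a random pivot, so the running time is O(n) on average.
    """
    if not 0 <= k < len(items):
        raise IndexError("k is out of range")
    pool = list(items)
    while True:
        pivot = random.choice(pool)
        lows = [x for x in pool if x < pivot]
        pivots = [x for x in pool if x == pivot]
        highs = [x for x in pool if x > pivot]
        if k < len(lows):
            pool = lows                       # answer lies among smaller items
        elif k < len(lows) + len(pivots):
            return pivot                      # answer is the pivot itself
        else:
            k -= len(lows) + len(pivots)      # answer lies among larger items
            pool = highs

def median_by_selection(items):
    """Sample median without fully sorting the data."""
    n = len(items)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    if n % 2 == 1:
        return quickselect(items, n // 2)
    # Even count: mean of the two middle order statistics.
    return (quickselect(items, n // 2 - 1) + quickselect(items, n // 2)) / 2

print(median_by_selection([1, 5, 2, 8, 7]))
print(median_by_selection([1, 6, 2, 8, 7, 2]))
```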

For an odd number of values

As an example, we will calculate the sample median for the following set of observations: 1, 5, 2, 8, 7.

Start by sorting the values: 1, 2, 5, 7, 8.

In this case, the median is 5 since it is the middle observation in the ordered list.

The median is the ((n + 1)/2)th item, where n is the number of values. For example, for the list {1, 2, 5, 7, 8}, we have n = 5, so the median is the ((5 + 1)/2)th item.

median = (6/2)th item

median = 3rd item

median = 5

For an even number of values

As an example, we will calculate the sample median for the following set of observations: 1, 6, 2, 8, 7, 2.

Start by sorting the values: 1, 2, 2, 6, 7, 8.

In this case, the arithmetic mean of the two middlemost terms is (2 + 6)/2 = 4. Therefore, the median is 4 since it is the arithmetic mean of the middle observations in the ordered list.

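The same result follows from the positional formula: the median is the ((n + 1)/2)th item, where n is the number of values. When this position is not a whole number, interpolate between the two neighbouring items.

For the example above (1, 2, 2, 6, 7, 8), we have n = 6, so the median is the ((6 + 1)/2)th item, i.e., the 3.5th item. The 3rd item is 2 and the 4th item is 6, so

median = 2 + 0.5 × (difference of the 4th and 3rd items)

median = 2 + 0.5 × (6 − 2)

median = 2 + 0.5 × 4

median = 2 + 2

median = 4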
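The positional rule used in both worked examples, taking the ((n + 1)/2)th item and interpolating when that position is fractional, can be sketched as follows. The function name `median_by_position` is an assumption for illustration.

```python
import math

def median_by_position(values):
    """Median as the ((n + 1) / 2)-th item of the sorted data (1-based).

    A fractional position is resolved by linear interpolation between
    the two neighbouring items.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    position = (n + 1) / 2            # e.g. 3.0 for n = 5, 3.5 for n = 6
    lower = math.floor(position)      # 1-based index of the lower neighbour
    fraction = position - lower       # 0.0 or 0.5
    value = ordered[lower - 1]
    if fraction > 0:
        value += fraction * (ordered[lower] - ordered[lower - 1])
    return value

print(median_by_position([1, 5, 2, 8, 7]))     # odd number of values
print(median_by_position([1, 6, 2, 8, 7, 2]))  # even number of values
```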

Other estimates of the median

If data are represented by a statistical model specifying a particular family of probability distributions, then estimates of the median can be obtained by fitting that family of probability distributions to the data and calculating the theoretical median of the fitted distribution. See, for example, Pareto interpolation.
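A minimal sketch of this model-based approach, assuming (purely for illustration) that the data follow an exponential distribution: the maximum-likelihood estimate of the rate is the reciprocal of the sample mean, and the fitted median is then ln 2 divided by that rate. The sample values and the function name are hypothetical, and this is not Pareto interpolation.

```python
import math
from statistics import mean, median

def exponential_median_estimate(sample):
    """Median of an exponential distribution fitted by maximum likelihood.

    Assumes all observations are positive.
    """
    rate_hat = 1.0 / mean(sample)     # MLE of the rate parameter
    return math.log(2) / rate_hat     # theoretical median of the fitted model

sample = [0.2, 0.5, 0.9, 1.4, 3.0]    # hypothetical observations

model_median = exponential_median_estimate(sample)
sample_median = median(sample)

# The two estimates generally differ; the model-based one uses the
# assumed distributional shape, the sample median does not.
print(model_median, sample_median)
```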

Median-unbiased estimators, and bias with respect to loss functions

Any mean-unbiased estimator minimizes the risk (expected loss) with respect to the squared-error loss function, as observed by Gauss. A median-unbiased estimator minimizes the risk with respect to the absolute-deviation loss function, as observed by Laplace. Other loss functions are used in statistical theory, particularly in robust statistics.

The theory of median-unbiased estimators was revived by George W. Brown in 1947:

An estimate of a one-dimensional parameter θ will be said to be median-unbiased, if, for fixed θ, the median of the distribution of the estimate is at the value θ; i.e., the estimate underestimates just as often as it overestimates. This requirement seems for most purposes to accomplish as much as the mean-unbiased requirement and has the additional property that it is invariant under one-to-one transformation. [page 584]

Further properties of median-unbiased estimators have been noted by Lehmann, Birnbaum, van der Vaart and Pfanzagl. In particular, median-unbiased estimators exist in cases where mean-unbiased and maximum-likelihood estimators do not exist. Besides being invariant under one-to-one transformations, median-unbiased estimators have surprising robustness.

In image processing

In monochrome raster images there is a type of noise, known as salt and pepper noise, in which each pixel independently becomes black (with some small probability) or white (with some small probability), and is unchanged otherwise (with probability close to 1). An image constructed of median values of neighborhoods (such as a 3×3 square) can effectively reduce noise in this case.
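A 3×3 median filter of this kind can be sketched in plain Python. The handling of border pixels is an assumption (the window is simply clipped to the image), as are the function name and the tiny test image, in which one white (255) and one black (0) pixel play the role of salt and pepper noise.

```python
from statistics import median

def median_filter_3x3(image):
    """Replace each pixel by the median of its 3x3 neighbourhood.

    image is a list of rows of grey levels. At the borders the window
    is clipped to the part that lies inside the image.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = []
    for r in range(height):
        row = []
        for c in range(width):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
            ]
            row.append(median(window))
        result.append(row)
    return result

# Uniform grey image with one "salt" and one "pepper" pixel.
noisy = [
    [100, 100, 100, 100],
    [100, 255, 100, 100],
    [100, 100,   0, 100],
    [100, 100, 100, 100],
]

cleaned = median_filter_3x3(noisy)
for row in cleaned:
    print(row)
```

Because each noisy pixel is a minority within every window that contains it, the median ignores it, whereas a mean filter would smear it over its neighbours.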

History

Gustav Fechner popularized the use of the median in the formal analysis of data, although it had been used previously by Laplace.

External links

A Guide to Understanding & Calculating the Median
Median as a weighted arithmetic mean of all Sample Observations
On-line calculator
Calculating the median
A problem involving the mean, the median, and the mode.
Python script for Median computations and income inequality metrics

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.