K-medians clustering
Encyclopedia
In statistics
Statistics
Statistics is the study of the collection, organization, analysis, and interpretation of data. It deals with all aspects of this, including the planning of data collection in terms of the design of surveys and experiments....

 and machine learning
Machine learning
Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases...

, k-medians clustering is a variation of k-means clustering where instead of calculating the mean
Mean
In statistics, mean has two related meanings:* the arithmetic mean .* the expected value of a random variable, which is also called the population mean....

 for each cluster to determine its centroid, one instead calculates the median
Median
In probability theory and statistics, a median is described as the numerical value separating the higher half of a sample, a population, or a probability distribution, from the lower half. The median of a finite list of numbers can be found by arranging all the observations from lowest value to...

. This has the effect of minimizing error over all clusters with respect to the 1-norm
Norm (mathematics)
In linear algebra, functional analysis and related areas of mathematics, a norm is a function that assigns a strictly positive length or size to all vectors in a vector space, other than the zero vector...

 distance metric, as opposed to the square of the 2-norm
Norm (mathematics)
In linear algebra, functional analysis and related areas of mathematics, a norm is a function that assigns a strictly positive length or size to all vectors in a vector space, other than the zero vector...

 distance metric (which
k-means does.)

The relates directly to the k
-median problem which is the problem of finding k centers such that the clusters formed by them are the most compact. Formally, given a set of data points x, the k centers ci are to be chosen so as to minimize the sum of the distances from each x to the nearest ci.

The criterion function formulated in this way is sometimes a better criterion than that used in the k-means clustering algorithm, in which the sum of the squared distances is used. The sum of distances is widely used in applications such as facility location
Facility location
Facility location, also known as location analysis, is a branch of operations research and computational geometry concerning itself with mathematical modeling and solution of problems concerning optimal placement of facilities in order to minimize transportation costs, avoid placing hazardous...

.

Note that this algorithm is often confused with k-medoids
K-medoids
The -medoids algorithm is a clustering algorithm related to the -means algorithm and the medoidshift algorithm. Both the -means and -medoids algorithms are partitional and both attempt to minimize squared error, the distance between points labeled to be in a cluster and a point designated as the...

, which finds the optimal medoid, not median, for each cluster. (A medoid is an actual point from the dataset; a median is the mathematical median calculated separately for each dimension.)
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK