Statistical dispersion
Encyclopedia
In statistics
Statistics
Statistics is the study of the collection, organization, analysis, and interpretation of data. It deals with all aspects of this, including the planning of data collection in terms of the design of surveys and experiments....

, statistical dispersion (also called statistical variability or variation) is variability or spread in a variable
Variable (mathematics)
In mathematics, a variable is a value that may change within the scope of a given problem or set of operations. In contrast, a constant is a value that remains unchanged, though often unknown or undetermined. The concepts of constants and variables are fundamental to many areas of mathematics and...

 or a probability distribution
Probability distribution
In probability theory, a probability mass, probability density, or probability distribution is a function that describes the probability of a random variable taking certain values....

. Common examples of measures of statistical dispersion are the variance
Variance
In probability theory and statistics, the variance is a measure of how far a set of numbers is spread out. It is one of several descriptors of a probability distribution, describing how far the numbers lie from the mean . In particular, the variance is one of the moments of a distribution...

, standard deviation
Standard deviation
Standard deviation is a widely used measure of variability or diversity used in statistics and probability theory. It shows how much variation or "dispersion" there is from the average...

 and interquartile range
Interquartile range
In descriptive statistics, the interquartile range , also called the midspread or middle fifty, is a measure of statistical dispersion, being equal to the difference between the upper and lower quartiles...

.

Dispersion is contrasted with location or central tendency
Central tendency
In statistics, the term central tendency relates to the way in which quantitative data is clustered around some value. A measure of central tendency is a way of specifying - central value...

, and together they are the most used properties of distributions.

Measures of statistical dispersion

A measure of statistical dispersion is a real number
Real number
In mathematics, a real number is a value that represents a quantity along a continuum, such as -5 , 4/3 , 8.6 , √2 and π...

 that is zero if all the data are identical, and increases as the data become more diverse. It cannot be less than zero.

Most measures of dispersion have the same scale as the quantity being measured. In other words, if the measurements have unit
Units of measurement
A unit of measurement is a definite magnitude of a physical quantity, defined and adopted by convention and/or by law, that is used as a standard for measurement of the same physical quantity. Any other value of the physical quantity can be expressed as a simple multiple of the unit of...

s, such as metres or seconds, the measure of dispersion has the same units. Such measures of dispersion include:
  • Standard deviation
    Standard deviation
    Standard deviation is a widely used measure of variability or diversity used in statistics and probability theory. It shows how much variation or "dispersion" there is from the average...

  • Interquartile range
    Interquartile range
    In descriptive statistics, the interquartile range , also called the midspread or middle fifty, is a measure of statistical dispersion, being equal to the difference between the upper and lower quartiles...

     or Interdecile range
    Interdecile range
    In statistics, the interdecile range is the difference between the first and the ninth deciles . The interdecile range is a measure of statistical dispersion of the values in a set of data, similar to the range and the interquartile range....

  • Range
    Range (statistics)
    In the descriptive statistics, the range is the length of the smallest interval which contains all the data. It is calculated by subtracting the smallest observation from the greatest and provides an indication of statistical dispersion.It is measured in the same units as the data...

  • Mean difference
    Mean difference
    The mean difference is a measure of statistical dispersion equal to the average absolute difference of two independent values drawn from a probability distribution. A related statistic is the relative mean difference, which is the mean difference divided by the arithmetic mean...

  • Median absolute deviation
    Median absolute deviation
    In statistics, the median absolute deviation is a robust measure of the variability of a univariate sample of quantitative data. It can also refer to the population parameter that is estimated by the MAD calculated from a sample....

  • Average absolute deviation (or simply called average deviation)
  • Distance standard deviation

These are frequently used (together with scale factor
Scale factor
A scale factor is a number which scales, or multiplies, some quantity. In the equation y=Cx, C is the scale factor for x. C is also the coefficient of x, and may be called the constant of proportionality of y to x...

s) as estimator
Estimator
In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule and its result are distinguished....

s of scale parameter
Scale parameter
In probability theory and statistics, a scale parameter is a special kind of numerical parameter of a parametric family of probability distributions...

s, in which capacity they are called estimates of scale.

All the above measures of statistical dispersion have the useful property that they are location-invariant, as well as linear in scale. So if a random variable
Random variable
In probability and statistics, a random variable or stochastic variable is, roughly speaking, a variable whose value results from a measurement on some type of random process. Formally, it is a function from a probability space, typically to the real numbers, which is measurable functionmeasurable...

 X has a dispersion of SX then a linear transformation
Linear transformation
In mathematics, a linear map, linear mapping, linear transformation, or linear operator is a function between two vector spaces that preserves the operations of vector addition and scalar multiplication. As a result, it always maps straight lines to straight lines or 0...

 Y = aX + b for real
Real number
In mathematics, a real number is a value that represents a quantity along a continuum, such as -5 , 4/3 , 8.6 , √2 and π...

 a and b should have dispersion SY = |a|SX.

Other measures of dispersion are dimensionless (scale-free). In other words, they have no units even if the variable itself has units. These include:
  • Coefficient of variation
    Coefficient of variation
    In probability theory and statistics, the coefficient of variation is a normalized measure of dispersion of a probability distribution. It is also known as unitized risk or the variation coefficient. The absolute value of the CV is sometimes known as relative standard deviation , which is...

  • Quartile coefficient of dispersion
    Quartile coefficient of dispersion
    In statistics, the quartile coefficient of dispersion is a descriptive statistic which measures dispersion and which is used to make comparisons within and between data sets....

  • Relative mean difference, equal to twice the Gini coefficient
    Gini coefficient
    The Gini coefficient is a measure of statistical dispersion developed by the Italian statistician and sociologist Corrado Gini and published in his 1912 paper "Variability and Mutability" ....



There are other measures of dispersion:
  • Variance
    Variance
    In probability theory and statistics, the variance is a measure of how far a set of numbers is spread out. It is one of several descriptors of a probability distribution, describing how far the numbers lie from the mean . In particular, the variance is one of the moments of a distribution...

     (the square of the standard deviation) — location-invariant but not linear in scale.
  • Variance-to-mean ratio — mostly used for count data when the term coefficient of dispersion
    Coefficient of dispersion
    In probability theory and statistics, the index of dispersion, dispersion index, coefficient of dispersion, or variance-to-mean ratio , like the coefficient of variation, is a normalized measure of the dispersion of a probability distribution: it is a measure used to quantify whether a set of...

     is used and when this ratio is dimensionless, as count data are themselves dimensionless: otherwise this is not scale-free.


Some measures of dispersion have specialized purposes, among them the Allan variance
Allan variance
The Allan variance , also known as two-sample variance, is a measure of frequency stability in clocks, oscillators and amplifiers. It is named after David W. Allan. It is expressed mathematically as\sigma_y^2. \,...

 and the Hadamard variance.

For categorical variables, it is less common to measure dispersion by a single number. See qualitative variation
Qualitative variation
An index of qualitative variation is a measure of statistical dispersion in nominal distributions. There are a variety of these, but they have been relatively little-studied in the statistics literature...

. One measure that does so is the discrete entropy
Information entropy
In information theory, entropy is a measure of the uncertainty associated with a random variable. In this context, the term usually refers to the Shannon entropy, which quantifies the expected value of the information contained in a message, usually in units such as bits...

.

Sources of statistical dispersion

In the physical sciences, such variability may result from random measurement errors: instrument measurements are often not perfectly precise, i.e., reproducible
Accuracy and precision
In the fields of science, engineering, industry and statistics, the accuracy of a measurement system is the degree of closeness of measurements of a quantity to that quantity's actual value. The precision of a measurement system, also called reproducibility or repeatability, is the degree to which...

, and there is additional inter-rater variability in interpreting and reporting the measured results. One may assume that the quantity being measured is stable, and that the variation between measurements is due to observational error
Observational error
Observational error is the difference between a measured value of quantity and its true value. In statistics, an error is not a "mistake". Variability is an inherent part of things being measured and of the measurement process.-Science and experiments:...

. A system of a large number of particles is characterized by the mean values of a relatively few number of macroscopic quantities such as temperature, energy, and density. The standard deviation is an important measure in Fluctuation theory, which explains many physical phenomena, including why the sky is blue .

In the biological sciences, the quantity being measured is seldom unchanging and stable, and the variation observed might additionally be intrinsic to the phenomenon: It may be due to inter-individual variability, that is, distinct members of a population differing from each other. Also, it may be due to intra-individual variability, that is, one and the same subject differing in tests taken at different times or in other differing conditions. Such types of variability are also seen in the arena of manufactured products; even there, the meticulous scientist finds variation.

In economics
Economics
Economics is the social science that analyzes the production, distribution, and consumption of goods and services. The term economics comes from the Ancient Greek from + , hence "rules of the house"...

, finance
Finance
"Finance" is often defined simply as the management of money or “funds” management Modern finance, however, is a family of business activity that includes the origination, marketing, and management of cash and money surrogates through a variety of capital accounts, instruments, and markets created...

, and other disciplines, regression analysis
Regression analysis
In statistics, regression analysis includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables...

 attempts to explain the dispersion of a dependent variable, generally measured by its variance, using one or more independent variables each of which itself has positive dispersion. The fraction of variance explained is called the coefficient of determination
Coefficient of determination
In statistics, the coefficient of determination R2 is used in the context of statistical models whose main purpose is the prediction of future outcomes on the basis of other related information. It is the proportion of variability in a data set that is accounted for by the statistical model...

.

A partial ordering of dispersion

A mean-preserving spread
Mean-preserving spread
In probability and statistics, a mean-preserving spread is a change from one probability distribution A to another probability distribution B, where B is formed by spreading out one or more portions of A's probability density function while leaving the mean unchanged...

 (MPS) is a change from one probability distribution A to another probability distribution B, where B is formed by spreading out one or more portions of A's probability density function while leaving the mean (the expected value) unchanged. The concept of a mean-preserving spread provides a partial ordering of probability distributions according to their dispersions: of two probability distributions, one may be ranked as having more dispersion than the other, or alternatively neither may be ranked as having more dispersion.

See also

  • Average
    Average
    In mathematics, an average, or central tendency of a data set is a measure of the "middle" value of the data set. Average is one form of central tendency. Not all central tendencies should be considered definitions of average....

  • Summary statistics
    Summary statistics
    In descriptive statistics, summary statistics are used to summarize a set of observations, in order to communicate the largest amount as simply as possible...

  • Qualitative variation
    Qualitative variation
    An index of qualitative variation is a measure of statistical dispersion in nominal distributions. There are a variety of these, but they have been relatively little-studied in the statistics literature...

  • Robust measures of scale
    Robust measures of scale
    In statistics, a robust measure of scale is a robust statistic that quantifies the statistical dispersion in a set of quantitative data. Robust measures of scale are used to complement or replace conventional estimates of scale such as the sample variance or sample standard deviation...

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK