Qualitative variation
An index of qualitative variation (IQV) is a measure of statistical dispersion in nominal distributions. There are a variety of these, but they have been relatively little studied in the statistics literature. The simplest is the variation ratio, while the most sophisticated is the information entropy.

Properties

There are various indices of qualitative variation; a number are summarized and devised by Wilcox, who requires the following standardization properties to be satisfied:
  • Variation varies between 0 and 1.
  • Variation is 0 if and only if all cases belong to a single category.
  • Variation is 1 if and only if cases are evenly divided across all categories.


In particular, the extreme values 0 and 1 of these standardized indices do not depend on the number of categories or the number of samples.

For any index, the closer the distribution is to uniform, the larger the variation; the larger the differences in frequencies across categories, the smaller the variation.

Indices of qualitative variation are in this sense analogous to information entropy, which is minimized when all cases belong to a single category and maximized in a uniform distribution. Indeed, information entropy can itself be used as an index of qualitative variation.
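As a minimal sketch of entropy used as an index of qualitative variation, the Shannon entropy of the observed proportions can be divided by log K, its value for a uniform distribution over K categories. Dividing by log K is an assumption made here so that the result runs from 0 to 1; the function name `normalized_entropy` and its parameters are illustrative only.

```python
import math
from collections import Counter


def normalized_entropy(labels, num_categories=None):
    """Shannon entropy of the label proportions, divided by log(K).

    `num_categories` (K) defaults to the number of distinct labels observed;
    pass it explicitly when some possible categories have no cases.
    """
    counts = Counter(labels)
    n = sum(counts.values())
    k = num_categories if num_categories is not None else len(counts)
    if k < 2:
        # With a single possible category there is no variation to measure.
        return 0.0
    # Sum of p * log(1/p) over the categories that actually occur.
    h = sum((c / n) * math.log(n / c) for c in counts.values())
    return h / math.log(k)


single = ["a"] * 10                # all cases in one category
uniform = ["a", "b", "c", "d"]     # cases evenly divided

print(normalized_entropy(single, num_categories=4))   # 0 for a single category
print(normalized_entropy(uniform, num_categories=4))  # approximately 1 for a uniform split
```

The base of the logarithm cancels in the ratio, so natural logarithms are used for convenience.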

One characterization of a particular index of qualitative variation (IQV) is as a ratio of observed differences to maximum differences.

Formulae

Wilcox gives a number of formulae for various indices of qualitative variation. The first, which he designates DM for "Deviation from the Mode", is a standardized form of the variation ratio, and is analogous to variance as deviation from the mean.
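The following sketch illustrates the idea of a standardized variation ratio. It assumes the variation ratio is 1 minus the proportion of cases in the modal category, and that standardization multiplies by K/(K-1) so that an even split across K categories gives 1. Whether this matches Wilcox's DM term for term is an assumption; the function names are hypothetical.

```python
from collections import Counter


def variation_ratio(labels):
    """Proportion of cases that are NOT in the modal (most frequent) category."""
    counts = Counter(labels)
    n = sum(counts.values())
    modal_count = max(counts.values())
    return 1 - modal_count / n


def standardized_variation_ratio(labels, num_categories):
    """Variation ratio rescaled by K/(K-1) so its maximum is 1 (assumed form)."""
    k = num_categories
    if k < 2:
        raise ValueError("need at least two categories to standardize")
    return k / (k - 1) * variation_ratio(labels)


sample = ["a", "a", "a", "b"]
print(variation_ratio(sample))                                 # 0.25
print(standardized_variation_ratio(sample, num_categories=2))  # 0.5
```

The number of categories is passed explicitly because a category with no observed cases still counts towards K.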

One formula for IQV, designated M2, is:

  $$\mathrm{M2} = \frac{K}{K-1}\left(1 - \sum_{i=1}^{K} p_i^2\right)$$

where $K$ is the number of categories, and $p_i$ is the proportion of observations that fall in a given category $i$. The factor of $\frac{K}{K-1}$ is for standardization.

The unstandardized index, $1 - \sum_{i=1}^{K} p_i^2$, denoted as M1, can be interpreted as the likelihood that a random pair of samples will belong to different categories (the sum $\sum_{i=1}^{K} p_i^2$ being the likelihood that the pair falls in the same category), so this formula for IQV is a standardized likelihood of a random pair falling in different categories. M1 and M2 can be interpreted in terms of the variance of a multinomial distribution (referred to in that literature as an "expanded binomial model").
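A short sketch of these two indices, taking M1 as 1 minus the sum of squared category proportions and M2 as M1 multiplied by K/(K-1). The function names `m1` and `m2` and the choice to pass K explicitly are illustrative assumptions.

```python
from collections import Counter


def m1(labels):
    """Unstandardized index: 1 minus the sum of squared category proportions."""
    counts = Counter(labels)
    n = sum(counts.values())
    return 1 - sum((c / n) ** 2 for c in counts.values())


def m2(labels, num_categories):
    """M1 rescaled by K/(K-1), so an even split over K categories gives 1."""
    k = num_categories
    if k < 2:
        raise ValueError("need at least two categories to standardize")
    return k / (k - 1) * m1(labels)


even = ["a", "a", "b", "b"]
skewed = ["a", "a", "a", "b"]

print(m1(even), m2(even, num_categories=2))      # 0.5 1.0
print(m1(skewed), m2(skewed, num_categories=2))  # 0.375 0.75
```

Categories with no observed cases contribute nothing to the sum, which is why only K needs to be supplied separately.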

Evaluation of indices

Different indices give different values of variation and may be used for different purposes; several are used and critiqued, especially in the sociology literature.

If one wishes to simply make ordinal comparisons between samples (is one sample more or less varied than another), the choice of IQV is relatively less important, as they will often give the same ordering.
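To illustrate such an ordinal comparison, the sketch below scores two made-up samples with three unstandardized indices (variation ratio, 1 minus the sum of squared proportions, and Shannon entropy) and reports whether each index ranks the second sample as more varied. The sample counts and all function names are hypothetical, and agreement for this one pair does not show that the indices always agree.

```python
import math


def proportions(counts):
    """Convert per-category counts into proportions."""
    n = sum(counts)
    return [c / n for c in counts]


def variation_ratio(counts):
    return 1 - max(proportions(counts))


def m1(counts):
    return 1 - sum(p * p for p in proportions(counts))


def entropy(counts):
    # Empty categories contribute nothing to the entropy.
    return sum(p * math.log(1 / p) for p in proportions(counts) if p > 0)


INDICES = {"variation_ratio": variation_ratio, "m1": m1, "entropy": entropy}


def more_varied(a, b):
    """For each index, report whether sample b scores strictly higher than a."""
    return {name: index(b) > index(a) for name, index in INDICES.items()}


sample_a = [8, 1, 1]  # counts per category, concentrated in one category
sample_b = [4, 3, 3]  # counts per category, closer to uniform

print(more_varied(sample_a, sample_b))
```

Both samples have the same number of categories, so standardizing each index would rescale both scores by the same constant and leave the ordering unchanged.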

In some cases it is useful not to standardize an index to run from 0 to 1 regardless of the number of categories or samples, but one generally does so.

See also

  • Statistical dispersion
  • Diversity index
  • Variation ratio
  • Information entropy

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 