Coefficient of dispersion
In probability theory and statistics, the index of dispersion, dispersion index, coefficient of dispersion, or variance-to-mean ratio (VMR), like the coefficient of variation, is a normalized measure of the dispersion of a probability distribution: it is a measure used to quantify whether a set of observed occurrences is clustered or dispersed compared to a standard statistical model.
It is defined as the ratio of the variance σ² to the mean μ:
$$\mathrm{VMR} = \frac{\sigma^2}{\mu}$$
It is also known as the Fano factor, though this term is sometimes reserved for windowed data (the mean and variance are computed over a subpopulation), where the index of dispersion is the special case where the window is infinite. Windowing is common: the VMR is frequently computed over various intervals in time or small regions in space, which may be called "windows", and the resulting statistic is called the Fano factor.
It is only defined when the mean μ is non-zero, and is generally only used for positive statistics, such as count data or time between events, or where the underlying distribution is assumed to be the exponential distribution or Poisson distribution.
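As a minimal sketch of the definition above, the ratio can be computed from a list of observations. The function name `variance_to_mean_ratio` and the sample data are illustrative assumptions, and the population variance is used here; a sample-variance version would be an equally reasonable choice.

```python
from statistics import mean, pvariance

def variance_to_mean_ratio(values):
    """Return variance / mean for a sequence of observations.

    Uses the population variance (an assumption of this sketch) and
    refuses a zero mean, where the ratio is undefined.
    """
    data = [float(v) for v in values]
    mu = mean(data)
    if mu == 0:
        raise ValueError("VMR is undefined when the mean is zero")
    return pvariance(data, mu) / mu

# Hypothetical counts of events per interval (made up for illustration).
counts = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance_to_mean_ratio(counts))  # variance 4, mean 5, so the ratio is 0.8
```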
Terminology
In this context, the observed dataset may consist of the times of occurrence of predefined events, such as earthquakes in a given region over a given magnitude, or of the locations in geographical space of plants of a given species. Details of such occurrences are first converted into counts of the numbers of events or occurrences in each of a set of equal-sized time- or space-regions.
The above defines a dispersion index for counts. A different definition applies for a dispersion index for intervals, where the quantities treated are the lengths of the time-intervals between the events, and where the index is equivalent to the square of the coefficient of variation of the interval lengths. Common usage is that "index of dispersion" means the dispersion index for counts.
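The two definitions can be illustrated side by side. The sketch below assumes event times on a line starting at 0, equal-width windows, and population variances; all function names are hypothetical helpers, not part of any library.

```python
from statistics import mean, pvariance

def counts_per_window(event_times, window, t_end):
    """Count events in consecutive windows [k*window, (k+1)*window) up to t_end."""
    n_windows = int(t_end // window)
    counts = [0] * n_windows
    for t in event_times:
        k = int(t // window)
        if 0 <= k < n_windows:
            counts[k] += 1
    return counts

def dispersion_index_counts(event_times, window, t_end):
    """Variance-to-mean ratio of the per-window counts."""
    counts = counts_per_window(event_times, window, t_end)
    return pvariance(counts) / mean(counts)

def dispersion_index_intervals(event_times):
    """Squared coefficient of variation of the gaps between successive events."""
    times = sorted(event_times)
    gaps = [b - a for a, b in zip(times, times[1:])]
    return pvariance(gaps) / mean(gaps) ** 2

# Perfectly periodic events: one event in the middle of each unit window.
periodic = [0.5 + i for i in range(20)]
print(dispersion_index_counts(periodic, window=1.0, t_end=20.0))
print(dispersion_index_intervals(periodic))
```

For perfectly regular events both indices are zero, since every window holds the same count and every gap has the same length. Changing `window` gives the windowed statistic for a different window size.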
Interpretation
The Poisson distribution has equal variance and mean, giving it a VMR = 1. The geometric distribution and the negative binomial distribution have VMR > 1, while the binomial distribution has VMR < 1, and the constant random variable has VMR = 0. This yields the following table:
Distribution | VMR | Dispersion |
---|---|---|
constant random variable | VMR = 0 | not dispersed |
binomial distribution | 0 < VMR < 1 | under-dispersed |
Poisson distribution | VMR = 1 | |
negative binomial distribution | VMR > 1 | over-dispersed |
This can be considered analogous to the classification of conic sections by eccentricity; see Cumulants of particular probability distributions for details.
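The classification can be checked from the standard closed-form means and variances of these distributions. The sketch below is illustrative only; the function names are hypothetical, and the negative binomial is parameterised (as an assumption) by the number of successes before `r` failures with success probability `p`.

```python
def vmr_poisson(lam):
    # Poisson(lam): mean = lam, variance = lam
    return lam / lam

def vmr_binomial(n, p):
    # Binomial(n, p): mean = n*p, variance = n*p*(1 - p), so the ratio is 1 - p
    return (n * p * (1 - p)) / (n * p)

def vmr_negative_binomial(r, p):
    # Assumed parameterisation: successes before r failures, success probability p.
    # mean = r*p/(1 - p), variance = r*p/(1 - p)**2, so the ratio is 1/(1 - p)
    mean = r * p / (1 - p)
    var = r * p / (1 - p) ** 2
    return var / mean

print(vmr_binomial(10, 0.3))           # below 1: under-dispersed
print(vmr_poisson(3.0))                # exactly 1
print(vmr_negative_binomial(5, 0.5))   # above 1: over-dispersed
```

In this parameterisation the geometric distribution is the case `r = 1`, so it has the same ratio 1/(1 - p) and is likewise over-dispersed.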
When the coefficient of dispersion is less than 1, a dataset is said to be "under-dispersed": this condition can relate to patterns of occurrence that are more regular than the randomness associated with a Poisson process. For instance, points spread uniformly in space or regular, periodic events will be under-dispersed.
If the index of dispersion is larger than 1, a dataset is said to be over-dispersed: this can correspond to the existence of clusters of occurrences. Clumped, concentrated data is over-dispersed.
In terms of the interval-counts, over-dispersion corresponds to there being more intervals with low counts and more intervals with high counts, compared to a Poisson distribution: in contrast, under-dispersion is characterised by there being more intervals having counts close to the mean count, compared to a Poisson distribution.
The relevance of the index of dispersion is that it has a value of one when the probability distribution of the number of occurrences in an interval is a Poisson distribution. Thus the measure can be used to assess whether observed data can be modeled using a Poisson process.
A sample-based estimate of the dispersion index can be used to construct a formal statistical hypothesis test for the adequacy of the model that a series of counts follow a Poisson distribution.
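One common form of such a test, given here as a sketch rather than as the specific test the text has in mind, multiplies the sample VMR by n − 1. Under the hypothesis of independent Poisson counts with a common mean, this statistic is approximately chi-squared distributed with n − 1 degrees of freedom. The function name and the data are assumptions for illustration.

```python
from statistics import mean

def poisson_dispersion_statistic(counts):
    """Return (statistic, degrees_of_freedom) for the dispersion test.

    statistic = sum((x - mean)^2) / mean, which equals
    (n - 1) * sample_variance / mean.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two counts")
    m = mean(counts)
    if m == 0:
        raise ValueError("statistic is undefined when the mean is zero")
    stat = sum((x - m) ** 2 for x in counts) / m
    return stat, n - 1

# Hypothetical counts per interval (made up for illustration).
stat, dof = poisson_dispersion_statistic([2, 4, 4, 4, 5, 5, 7, 9])
print(stat, dof)  # compare stat with a chi-squared distribution on dof degrees of freedom
```

A statistic far in the upper tail of the reference distribution suggests over-dispersion, and one far in the lower tail suggests under-dispersion; the tail probability itself would come from a chi-squared table or a statistics library.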
The VMR is a good measure of the degree of randomness of a given phenomenon. This technique is also commonly used in currency management.
Example
For randomly diffusing particles (Brownian motion), the distribution of the number of particles inside a given volume is Poissonian, i.e. VMR = 1. Therefore, to assess whether a given spatial pattern (assuming you have a way to measure it) is due purely to diffusion or whether some particle-particle interaction is involved: divide the space into patches, quadrats or sample units (SU), count the number of individuals in each patch or SU, and compute the VMR. VMRs significantly higher than 1 denote a clustered distribution, where random walk is not enough to smother the attractive inter-particle potential.
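The quadrat procedure described above can be sketched with simulated data. Everything here is an illustrative assumption: points live in the unit square, independent uniform points stand in for purely diffusing particles, and tight Gaussian clumps around random centres stand in for attracting particles.

```python
import random
from statistics import mean, pvariance

def quadrat_counts(points, grid):
    """Count points of the unit square in each cell of a grid x grid lattice."""
    counts = [0] * (grid * grid)
    for x, y in points:
        i = min(int(x * grid), grid - 1)  # guard the x == 1.0 edge
        j = min(int(y * grid), grid - 1)
        counts[i * grid + j] += 1
    return counts

def vmr(counts):
    """Variance-to-mean ratio of the quadrat counts (population variance)."""
    return pvariance(counts) / mean(counts)

def clip(v):
    """Keep a coordinate inside the unit interval."""
    return min(max(v, 0.0), 1.0)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# 2000 independent uniform points.
uniform_pts = [(rng.random(), rng.random()) for _ in range(2000)]

# 20 hypothetical cluster centres, 100 tightly scattered points around each.
centres = [(rng.random(), rng.random()) for _ in range(20)]
clustered_pts = [
    (clip(rng.gauss(cx, 0.02)), clip(rng.gauss(cy, 0.02)))
    for cx, cy in centres
    for _ in range(100)
]

vmr_uniform = vmr(quadrat_counts(uniform_pts, 10))
vmr_clustered = vmr(quadrat_counts(clustered_pts, 10))
print(vmr_uniform, vmr_clustered)
```

With these settings the uniform pattern should give a value near 1 and the clumped pattern a value well above 1. The result depends on the quadrat size, so in practice several grid resolutions are worth comparing.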
Similar ratios
- Coefficient of variation
- Standardized moment
- Fano factor (windowed VMR)
- Signal to noise ratio (in signal processing)
- Signal to noise ratio (image processing)