Multivariate Pólya distribution
The multivariate Pólya distribution, named after George Pólya and also called the Dirichlet compound multinomial distribution, is a compound probability distribution in which a probability vector p is drawn from a Dirichlet distribution with parameter vector α, and a set of discrete samples is then drawn from the categorical distribution with probability vector p. The compounding corresponds to a Pólya urn scheme. In document classification, for example, the distribution is used to represent probabilities over word counts for different document types.
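The two-stage construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `sample_dirichlet_multinomial` and its parameters are hypothetical, and the Dirichlet draw uses the standard trick of normalizing independent gamma variates.

```python
import random


def sample_dirichlet_multinomial(alpha, n_draws, rng=None):
    """Draw one count vector: p ~ Dirichlet(alpha), then n_draws categorical draws from p.

    `alpha` is a sequence of positive concentration parameters (hypothetical name).
    """
    if rng is None:
        rng = random.Random()
    # Stage 1: a Dirichlet sample is a vector of independent Gamma(alpha_k, 1)
    # variates divided by their sum.
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    p = [g / total for g in gammas]
    # Stage 2: draw category indices from the categorical distribution given by p.
    draws = rng.choices(range(len(alpha)), weights=p, k=n_draws)
    # Tally the draws into a count vector (n_1, ..., n_K).
    counts = [0] * len(alpha)
    for k in draws:
        counts[k] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so reruns are reproducible
    print(sample_dirichlet_multinomial([1.0, 2.0, 3.0], 20, rng))
```

Each call redraws p, so repeated count vectors are more dispersed than draws from a single multinomial with a fixed p would be.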
Probability mass function
Suppose we make $N$ independent draws from a categorical distribution with $K$ categories. Let $\mathbf{x} = (n_1, n_2, \ldots, n_K)$ denote the vector of counts, where $n_k$ is the number of times category $k$ was drawn. If the parameter of the categorical distribution is given as $\mathbf{p} = (p_1, p_2, \ldots, p_K)$, where $p_k$ is the probability of drawing value $k$, the probability distribution of the counts, $P(\mathbf{x} \mid \mathbf{p})$, is the associated multinomial distribution with parameter $\mathbf{p}$. Here, however, $\mathbf{p}$ is not given but is itself drawn from a Dirichlet distribution with parameter vector $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_K)$. The resulting compound distribution is obtained by integrating out $\mathbf{p}$:

$$P(\mathbf{x} \mid \boldsymbol{\alpha}) = \int P(\mathbf{x} \mid \mathbf{p})\, P(\mathbf{p} \mid \boldsymbol{\alpha})\, d\mathbf{p},$$

which results in the following explicit formula:

$$P(\mathbf{x} \mid \boldsymbol{\alpha}) = \frac{N!}{\prod_{k} n_k!}\, \frac{\Gamma(\alpha_0)}{\Gamma(N + \alpha_0)} \prod_{k} \frac{\Gamma(n_k + \alpha_k)}{\Gamma(\alpha_k)},$$

where $\Gamma$ is the gamma function, with $\alpha_0 = \sum_k \alpha_k$ and $N = \sum_k n_k$.
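As a rough sketch of how the gamma-function form of the probability mass function might be evaluated in practice, the snippet below works in log space with `math.lgamma` to avoid overflow for large counts. The function names are hypothetical, chosen only for this illustration.

```python
from math import exp, lgamma


def dirichlet_multinomial_log_pmf(counts, alpha):
    """Log-probability of a count vector (n_1, ..., n_K) given parameters (alpha_1, ..., alpha_K)."""
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n_total = sum(counts)      # N = sum of n_k
    alpha_total = sum(alpha)   # sum of alpha_k
    # log of the multinomial coefficient N! / prod_k n_k!
    log_p = lgamma(n_total + 1) - sum(lgamma(n + 1) for n in counts)
    # log of Gamma(sum alpha) / Gamma(N + sum alpha)
    log_p += lgamma(alpha_total) - lgamma(n_total + alpha_total)
    # log of prod_k Gamma(n_k + alpha_k) / Gamma(alpha_k)
    log_p += sum(lgamma(n + a) - lgamma(a) for n, a in zip(counts, alpha))
    return log_p


def dirichlet_multinomial_pmf(counts, alpha):
    """Probability of a count vector, computed by exponentiating the log form."""
    return exp(dirichlet_multinomial_log_pmf(counts, alpha))


if __name__ == "__main__":
    print(dirichlet_multinomial_pmf((1, 1, 0), (1.0, 1.0, 1.0)))
```

With all parameters equal to 1 and N = 2 draws over K = 3 categories, each of the six possible count vectors has probability 1/6, which makes a convenient sanity check.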
Another form
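The probability mass function may be written more compactly in terms of the beta function, as follows:

$$P(\mathbf{x} \mid \boldsymbol{\alpha}) = \frac{N\, B(\alpha_0, N)}{\prod_{k:\, n_k > 0} n_k\, B(\alpha_k, n_k)},$$

where $B$ is the beta function, $N = \sum_k n_k$ is the total number of draws, and $\alpha_0 = \sum_k \alpha_k$.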
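The equivalence of the beta-function and gamma-function forms can be checked with a short worked derivation. The notation here is an assumption made for the sketch: $N = \sum_k n_k$ with $N \ge 1$, and $\alpha_0 = \sum_k \alpha_k$. It uses only the identities $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a + b)$ and $n\,\Gamma(n) = n!$ for positive integers $n$.

```latex
\begin{align*}
N\, B(\alpha_0, N)
  &= N\, \frac{\Gamma(\alpha_0)\, \Gamma(N)}{\Gamma(N + \alpha_0)}
   = \frac{N!\, \Gamma(\alpha_0)}{\Gamma(N + \alpha_0)}, \\
n_k\, B(\alpha_k, n_k)
  &= \frac{n_k!\, \Gamma(\alpha_k)}{\Gamma(n_k + \alpha_k)}
  \qquad (n_k > 0), \\
\frac{N\, B(\alpha_0, N)}{\prod_{k:\, n_k > 0} n_k\, B(\alpha_k, n_k)}
  &= \frac{N!}{\prod_{k} n_k!}\,
     \frac{\Gamma(\alpha_0)}{\Gamma(N + \alpha_0)}
     \prod_{k} \frac{\Gamma(n_k + \alpha_k)}{\Gamma(\alpha_k)}.
\end{align*}
```

Categories with $n_k = 0$ contribute a factor of 1 on the gamma-function side, since $0! = 1$ and $\Gamma(0 + \alpha_k)/\Gamma(\alpha_k) = 1$, which is why the product in the beta-function form can skip them.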
Related distributions
The one-dimensional version of the multivariate Pólya distribution (the case of K = 2 categories) is known as the beta-binomial distribution.
Uses
The multivariate Pólya distribution is used in automated document classification and clustering, genetics, economics, combat modeling, and quantitative marketing.
See also
- Beta-binomial distribution
- Chinese restaurant process
- Dirichlet process
- Generalized Dirichlet distribution