Rényi entropy
In information theory, the Rényi entropy, a generalisation of Shannon entropy, is one of a family of functionals for quantifying the diversity, uncertainty or randomness of a system. It is named after Alfréd Rényi.
The Rényi entropy of order α, where α ≥ 0 and α ≠ 1, is defined as
$$H_\alpha(X) = \frac{1}{1-\alpha}\log\left(\sum_{i=1}^{n} p_i^{\alpha}\right),$$
where p_i are the probabilities of {x_1, x_2, ..., x_n} and the logarithm is in base 2. If the probabilities are all the same then all the Rényi entropies of the distribution are equal, with H_α(X) = log n. Otherwise the entropies are weakly decreasing as a function of α.
Higher values of α, approaching infinity, give a Rényi entropy which is increasingly determined by the highest-probability events. Lower values of α, approaching zero, give a Rényi entropy which increasingly weights all possible events more equally, regardless of their probabilities. The intermediate case α = 1 gives the Shannon entropy, which has special properties. When α = 0, the Rényi entropy equals log(n), the maximum possible value of the Shannon entropy.
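As an illustration of the definition, here is a short, self-contained Python sketch (my own; the function name renyi_entropy and the probabilities are arbitrary choices, not from the article) that evaluates H_α at several orders with base-2 logarithms, showing the weakly decreasing behaviour described above.

import numpy as np

def renyi_entropy(p, alpha):
    """Rényi entropy of order alpha (base-2 logs); requires alpha != 1."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                               # zero-probability terms contribute nothing for alpha > 0
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

p = [0.5, 0.25, 0.125, 0.125]                  # an arbitrary example distribution
for alpha in [0.5, 2.0, 5.0, 50.0]:
    print(alpha, renyi_entropy(p, alpha))
# The printed values are weakly decreasing in alpha, approaching -log2(max p_i) = 1 for large alpha.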
The Rényi entropies are important in ecology and statistics as indices of diversity. The Rényi entropy is also important in quantum information, where it can be used as a measure of entanglement. For the Heisenberg spin chain the Rényi entropy has been calculated explicitly in terms of a modular function of α. Rényi entropies also lead to a spectrum of indices of fractal dimension.
Hα for some particular values of α
Some particular cases:
$$H_0(X) = \log n = \log |X|,$$
which is the logarithm of the cardinality of X, sometimes called the Hartley entropy of X.
In the limit as α approaches 1, it can be shown using L'Hôpital's rule that H_α converges to
$$H_1(X) = -\sum_{i=1}^{n} p_i \log p_i,$$
which is the Shannon entropy.
Collision entropy, sometimes just called "Rényi entropy", refers to the case α = 2,
$$H_2(X) = -\log\left(\sum_{i=1}^{n} p_i^2\right) = -\log P(X = Y),$$
where Y is a random variable independent of X but identically distributed to X. As α → ∞, the limit exists as
$$H_\infty(X) = -\log \max_i p_i = \min_i\left(-\log p_i\right),$$
and this is called the min-entropy, because it is the smallest value of H_α.
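The particular cases above can be checked numerically. The self-contained Python sketch below (my own illustration, base-2 logarithms, arbitrary distribution) compares H0, the Shannon entropy, the collision entropy and the min-entropy against the general formula evaluated near α = 0, near α = 1, at α = 2 and at a large α.

import numpy as np

p = np.array([0.5, 0.25, 0.125, 0.125])          # arbitrary example distribution

H = lambda a: np.log2(np.sum(p ** a)) / (1 - a)  # general Rényi entropy, a != 1

H0      = np.log2(np.count_nonzero(p))           # Hartley entropy: log of the support size
shannon = -np.sum(p * np.log2(p))                # H_1, the Shannon entropy
H2      = -np.log2(np.sum(p ** 2))               # collision entropy
Hinf    = -np.log2(np.max(p))                    # min-entropy

print(H0,      H(1e-9))     # H_alpha -> H_0 as alpha -> 0
print(shannon, H(1 + 1e-9)) # H_alpha -> H_1 as alpha -> 1 (the L'Hôpital limit)
print(H2,      H(2))        # alpha = 2
print(Hinf,    H(100))      # H_alpha -> H_inf as alpha -> infinity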
Inequalities between different values of α
The two latter cases are related by H∞ ≤ H2 ≤ 2H∞. On the other hand, the Shannon entropy H1 can be arbitrarily high for a random variable X with fixed min-entropy.

H∞ ≤ H2 is because $\sum_i p_i^2 \le \max_i p_i$, so that $-\log \sum_i p_i^2 \ge -\log \max_i p_i$.

H2 ≤ 2H∞ is because $\sum_i p_i^2 \ge \left(\max_i p_i\right)^2$, so that $-\log \sum_i p_i^2 \le -2\log \max_i p_i$.

H2 ≤ H1 since, according to Jensen's inequality, $\sum_i p_i \log p_i \le \log \sum_i p_i^2$, so that $-\sum_i p_i \log p_i \ge -\log \sum_i p_i^2$.
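To make the inequalities concrete, the small sketch below (my own illustration, not from the article; base-2 logs) checks the sandwich H∞ ≤ H2 ≤ 2H∞ and exhibits a family of distributions whose Shannon entropy grows without bound while the min-entropy stays fixed at 1 bit.

import numpy as np

def H2(p):   return -np.log2(np.sum(np.asarray(p) ** 2))       # collision entropy
def Hinf(p): return -np.log2(np.max(p))                        # min-entropy
def H1(p):   p = np.asarray(p); return -np.sum(p * np.log2(p)) # Shannon entropy

# One heavy atom of mass 1/2 plus k light atoms sharing the remaining mass:
for k in [2, 100, 10_000]:
    p = np.array([0.5] + [0.5 / k] * k)
    assert Hinf(p) <= H2(p) <= 2 * Hinf(p)   # the sandwich inequality above
    print(k, Hinf(p), H1(p))                 # min-entropy stays 1 bit while H1 grows with k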
Rényi divergence
As well as the absolute Rényi entropies, Rényi also defined a spectrum of divergence measures generalising the Kullback–Leibler divergence.
The Rényi divergence of order α, where α > 0, from a distribution P to a distribution Q is defined to be:
$$D_\alpha(P\|Q) = \frac{1}{\alpha-1}\log\left(\sum_{i=1}^{n} \frac{p_i^\alpha}{q_i^{\alpha-1}}\right).$$
Like the Kullback–Leibler divergence, the Rényi divergences are non-negative for α > 0. This divergence is also known as the alpha-divergence (α-divergence).
Some special cases:
$D_0(P\|Q) = -\log Q(p_i > 0)$: minus the log probability under Q that p_i > 0;
$D_{1/2}(P\|Q) = -2\log\sum_i \sqrt{p_i q_i}$: minus twice the logarithm of the Bhattacharyya coefficient;
$D_1(P\|Q) = \sum_i p_i \log\frac{p_i}{q_i}$: the Kullback–Leibler divergence;
$D_2(P\|Q) = \log\sum_i \frac{p_i^2}{q_i}$: the log of the expected ratio of the probabilities;
$D_\infty(P\|Q) = \log\max_i \frac{p_i}{q_i}$: the log of the maximum ratio of the probabilities.
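These special cases can be verified numerically. The sketch below (my own illustration of the definitions; natural logs are used, and changing the base only rescales every value by the same constant) evaluates the general formula near α = 0, at α = 1/2, near α = 1, at α = 2 and at a large α, and compares against the direct expressions listed above.

import numpy as np

p = np.array([0.4, 0.4, 0.2, 0.0])       # arbitrary P (note one zero entry)
q = np.array([0.25, 0.25, 0.25, 0.25])   # arbitrary Q with full support

def D(alpha):
    """Rényi divergence of order alpha (natural logs), alpha != 1."""
    mask = p > 0                                     # sum runs over the support of P
    return np.log(np.sum(p[mask] ** alpha * q[mask] ** (1 - alpha))) / (alpha - 1)

print(D(1e-9),     -np.log(q[p > 0].sum()))                          # alpha -> 0
print(D(0.5),      -2 * np.log(np.sum(np.sqrt(p * q))))              # Bhattacharyya form
print(D(1 + 1e-9), np.sum(p[p > 0] * np.log(p[p > 0] / q[p > 0])))   # Kullback–Leibler
print(D(2),        np.log(np.sum(p ** 2 / q)))                       # expected ratio under P
print(D(200),      np.log(np.max(p / q)))                            # max ratio (alpha -> infinity)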
Why α = 1 is special
The value α = 1, which gives the Shannon entropy and the Kullback–Leibler divergence, is special because it is only when α = 1 that one can separate out variables A and X from a joint probability distribution, and write
$$H(A,X) = H(A) + H(X|A)$$
for the absolute entropies, and
$$D_{\mathrm{KL}}\big(p(x,a)\,\|\,m(x,a)\big) = D_{\mathrm{KL}}\big(p(a)\,\|\,m(a)\big) + \mathbb{E}_{p(a)}\!\left[D_{\mathrm{KL}}\big(p(x|a)\,\|\,m(x|a)\big)\right]$$
for the relative entropies.
The latter in particular means that if we seek a distribution p(x,a) which minimizes its divergence from some underlying prior measure m(x,a), and we acquire new information which only affects the distribution of a, then the distribution of p(x|a) remains m(x|a), unchanged.
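A small numerical illustration of why α = 1 is special (my own sketch, base-2 logs, arbitrary joint distribution): with the conditional entropy taken as the p(a)-weighted average of H_α(X | A = a), which is one natural choice of definition, the decomposition H(A,X) = H(A) + H(X|A) holds exactly at α = 1 but generally fails at other orders, here checked at α = 2.

import numpy as np

# Arbitrary joint distribution p(a, x) over a 2 x 2 alphabet.
p_ax = np.array([[0.49, 0.01],
                 [0.25, 0.25]])
p_a   = p_ax.sum(axis=1)             # marginal of A
p_x_a = p_ax / p_a[:, None]          # conditionals p(x | a)

def H(p, alpha):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    if alpha == 1:
        return -np.sum(p * np.log2(p))                 # Shannon entropy
    return np.log2(np.sum(p ** alpha)) / (1 - alpha)   # general Rényi entropy

for alpha in [1, 2]:
    joint = H(p_ax, alpha)
    cond  = sum(p_a[a] * H(p_x_a[a], alpha) for a in range(len(p_a)))  # weighted-average conditional entropy
    print(alpha, joint, H(p_a, alpha) + cond)   # the two numbers agree only for alpha = 1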
The other Rényi divergences satisfy the criteria of being positive and continuous; being invariant under 1-to-1 co-ordinate transformations; and of combining additively when A and X are independent, so that if p(A,X) = p(A)p(X), then
$$H_\alpha(A,X) = H_\alpha(A) + H_\alpha(X)$$
and
$$D_\alpha\big(P(A)P(X)\,\|\,Q(A)Q(X)\big) = D_\alpha\big(P(A)\,\|\,Q(A)\big) + D_\alpha\big(P(X)\,\|\,Q(X)\big).$$
The stronger properties of the α = 1 quantities, which allow the definition of conditional information and mutual information from communication theory, may be very important in other applications, or entirely unimportant, depending on those applications' requirements.
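The additivity of H_α for independent variables can be confirmed directly. In the sketch below (my own illustration, base-2 logs) the joint distribution is the outer product of two arbitrary marginals, and H_α of the product equals the sum of the marginal entropies for every order tried.

import numpy as np

def H(p, alpha):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return np.log2(np.sum(p ** alpha)) / (1 - alpha)   # Rényi entropy, alpha != 1

p_a   = np.array([0.7, 0.2, 0.1])     # arbitrary marginal of A
p_x   = np.array([0.6, 0.4])          # arbitrary marginal of X
joint = np.outer(p_a, p_x)            # independence: p(A, X) = p(A) p(X)

for alpha in [0.5, 2.0, 3.0, 10.0]:
    print(alpha, H(joint, alpha), H(p_a, alpha) + H(p_x, alpha))  # equal for every alpha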
Exponential families
The Rényi entropies and divergences for an exponential family admit simple expressions (Nielsen & Nock, 2011). In particular, for two members P and Q of the same exponential family with natural parameters θ_P and θ_Q and log-normalizer F, the divergence is
$$D_\alpha(P\|Q) = \frac{J_{F,\alpha}(\theta_P : \theta_Q)}{1-\alpha},$$
where
$$J_{F,\alpha}(\theta : \theta') = \alpha F(\theta) + (1-\alpha)F(\theta') - F\big(\alpha\theta + (1-\alpha)\theta'\big)$$
is a Jensen difference divergence.
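As a concrete check of the closed form above, the sketch below (my own example, natural logs) treats the Bernoulli family as an exponential family with natural parameter θ = log(p/(1−p)) and log-normalizer F(θ) = log(1+e^θ), and compares the Jensen-difference expression with the divergence computed directly from the definition; the function names are mine.

import numpy as np

def F(theta):
    """Log-normalizer of the Bernoulli family."""
    return np.log1p(np.exp(theta))

def renyi_div_direct(p, q, alpha):
    """Rényi divergence from the definition, summing over the two outcomes {0, 1}."""
    P = np.array([1 - p, p]); Q = np.array([1 - q, q])
    return np.log(np.sum(P ** alpha * Q ** (1 - alpha))) / (alpha - 1)

def renyi_div_expfam(p, q, alpha):
    """Closed form via the Jensen difference J_{F,alpha} of the log-normalizer."""
    tp, tq = np.log(p / (1 - p)), np.log(q / (1 - q))
    J = alpha * F(tp) + (1 - alpha) * F(tq) - F(alpha * tp + (1 - alpha) * tq)
    return J / (1 - alpha)

for alpha in [0.3, 0.5, 2.0, 4.0]:
    print(alpha, renyi_div_direct(0.8, 0.5, alpha), renyi_div_expfam(0.8, 0.5, alpha))  # the pairs agree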
See also
- Diversity indices
- Tsallis entropy
- Generalized entropy index