Divergence (statistics)
In statistics and information geometry, a divergence or contrast function is a function that establishes the “distance” of one probability distribution to another on a statistical manifold. A divergence is a weaker notion than that of a distance (metric) in mathematics: in particular, a divergence need not be symmetric (that is, in general the divergence from p to q is not equal to the divergence from q to p), and it need not satisfy the triangle inequality.
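Both points can be seen numerically. The following is a minimal Python sketch using the Kullback-Leibler divergence between Bernoulli distributions; the choice of KL, of the Bernoulli family and of the three parameter values is ours, for illustration only.

```python
import math


def kl_bernoulli(a: float, b: float) -> float:
    """Kullback-Leibler divergence D(Bern(a) || Bern(b)) in nats, for 0 < a, b < 1."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))


p, r, q = 0.1, 0.5, 0.9  # success probabilities of three Bernoulli distributions

# Asymmetry: the divergence from p to r differs from the divergence from r to p.
print(kl_bernoulli(p, r), kl_bernoulli(r, p))

# Triangle inequality fails: the direct divergence from p to q exceeds
# the sum of the divergences along the detour p -> r -> q.
direct = kl_bernoulli(p, q)
detour = kl_bernoulli(p, r) + kl_bernoulli(r, q)
print(direct > detour)  # True
```

With these values D(p || r) ≈ 0.368 while D(r || p) ≈ 0.511, and the direct divergence (≈ 1.758) is about twice the detour (≈ 0.879).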
Definition
Suppose S is a space of all probability distributions with common support. Then a divergence on S is a function D(· || ·): S × S → R satisfying
- D(p || q) ≥ 0 for all p, q ∈ S,
- D(p || q) = 0 if and only if p = q,
- The matrix g(D) (see definition in the “geometrical properties” section) is strictly positive-definite everywhere on S.
The dual divergence D* is defined as
$$D^*(p \,\|\, q) = D(q \,\|\, p).$$
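For discrete distributions, the first two conditions and the dual can be spot-checked numerically. The Python sketch below assumes the Kullback-Leibler divergence as the example D and uses hypothetical helper names (`kl`, `dual`, `random_dist`); it builds D* from D by swapping the arguments. It does not test the third condition, which needs the matrix g(D) from the geometrical properties section.

```python
import math
import random


def kl(p, q):
    """KL divergence D(p || q) = sum_i p_i ln(p_i / q_i) for strictly positive
    discrete distributions given as equal-length sequences."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def dual(D):
    """Return the dual divergence D*(p || q) = D(q || p)."""
    return lambda p, q: D(q, p)


def random_dist(n, rng):
    """Draw a random strictly positive probability vector of length n."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]


rng = random.Random(0)
kl_star = dual(kl)  # the reverse KL divergence
for _ in range(1000):
    p, q = random_dist(4, rng), random_dist(4, rng)
    assert kl(p, q) >= 0 and kl_star(p, q) >= 0  # first condition: non-negativity
    assert abs(kl(p, p)) < 1e-12                 # second condition: D(p || p) = 0
    assert kl_star(p, q) == kl(q, p)             # definition of the dual
```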
Geometrical properties
Many properties of divergences can be derived if we restrict S to be a statistical manifold, meaning that it can be parametrized with a finite-dimensional coordinate system θ, so that for a distribution p ∈ S we can write p = p(θ).
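For a pair of points p, q ∈ S with coordinates θp and θq, denote the partial derivatives of D(p || q) as
$$D((\partial_i)_p \,\|\, q) \stackrel{\mathrm{def}}{=} \frac{\partial}{\partial\theta_p^i} D(p \,\|\, q),$$
$$D((\partial_i\partial_j)_p \,\|\, (\partial_k)_q) \stackrel{\mathrm{def}}{=} \frac{\partial}{\partial\theta_p^i}\frac{\partial}{\partial\theta_p^j}\frac{\partial}{\partial\theta_q^k} D(p \,\|\, q), \quad \text{etc.}$$
Now we restrict these functions to the diagonal p = q, and denote
$$D[\partial_i \,\|\, \cdot] : p \mapsto D((\partial_i)_p \,\|\, p),$$
$$D[\partial_i \,\|\, \partial_j] : p \mapsto D((\partial_i)_p \,\|\, (\partial_j)_p), \quad \text{etc.}$$
By definition, the function D(p || q) is minimized at p = q, and therefore
$$D[\partial_i \,\|\, \cdot] = D[\cdot \,\|\, \partial_i] = 0,$$
$$D[\partial_i\partial_j \,\|\, \cdot] = D[\cdot \,\|\, \partial_i\partial_j] = -D[\partial_i \,\|\, \partial_j] \equiv g_{ij}^{(D)},$$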
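where the matrix g(D) is positive semi-definite (it is the Hessian of D(p || q) at its minimum p = q) and, by the third condition of the definition, strictly positive-definite; it therefore defines a unique Riemannian metric on the manifold S.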
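The metric can be approximated straight from these definitions by finite differences. The sketch below assumes a one-parameter Bernoulli family with coordinate θ = P(X = 1) and takes D to be the Kullback-Leibler divergence. It evaluates g = −D[∂ || ∂] (minus the mixed second derivative on the diagonal) and g = D[∂∂ || ·] (the second derivative in the first argument), which should agree with each other and, for this divergence, with the Fisher information 1/(θ(1 − θ)). Function names and step sizes are illustrative choices.

```python
import math


def kl_bernoulli(tp: float, tq: float) -> float:
    """D(p || q) for p = Bern(tp), q = Bern(tq); the coordinate theta is P(X = 1)."""
    return tp * math.log(tp / tq) + (1 - tp) * math.log((1 - tp) / (1 - tq))


def metric_mixed(D, theta: float, h: float = 1e-4) -> float:
    """g(theta) = -D[d || d]: minus the mixed derivative d^2 D / (d theta_p d theta_q)
    on the diagonal theta_p = theta_q = theta, by a central finite difference."""
    mixed = (D(theta + h, theta + h) - D(theta + h, theta - h)
             - D(theta - h, theta + h) + D(theta - h, theta - h)) / (4 * h * h)
    return -mixed


def metric_first_arg(D, theta: float, h: float = 1e-4) -> float:
    """g(theta) = D[dd || .]: second derivative of D(p || q) in theta_p at p = q."""
    return (D(theta + h, theta) - 2 * D(theta, theta) + D(theta - h, theta)) / (h * h)


theta = 0.3
print(metric_mixed(kl_bernoulli, theta))      # approx. 4.7619
print(metric_first_arg(kl_bernoulli, theta))  # approx. 4.7619
print(1 / (theta * (1 - theta)))              # Fisher information, 4.7619...
```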
Divergence D(· || ·) also defines a unique torsion-free affine connection ∇(D) with coefficients
$$\Gamma_{ij,k}^{(D)} = -D[\partial_i\partial_j \,\|\, \partial_k],$$
and the dual to this connection ∇* is generated by the dual divergence D*.
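The coefficients and the role of the dual divergence can be illustrated in a hypothetical Bernoulli setting (θ = P(X = 1), D the Kullback-Leibler divergence, finite differences with illustrative step sizes). The sketch approximates Γ = −D[∂∂ || ∂], the third derivative taken twice in θp and once in θq on the diagonal. For D itself the coefficient should vanish, because p(x; θ) is linear in θ (in the usual convention this is the mixture, α = −1, connection); for the dual D*(p || q) = D(q || p) it should equal −1/θ² + 1/(1 − θ)², the exponential (α = +1) connection coefficient for this family.

```python
import math


def kl_bernoulli(tp: float, tq: float) -> float:
    """D(p || q) for p = Bern(tp), q = Bern(tq); the coordinate theta is P(X = 1)."""
    return tp * math.log(tp / tq) + (1 - tp) * math.log((1 - tp) / (1 - tq))


def reverse_kl_bernoulli(tp: float, tq: float) -> float:
    """Dual divergence D*(p || q) = D(q || p)."""
    return kl_bernoulli(tq, tp)


def connection_coefficient(D, theta: float, h: float = 1e-3) -> float:
    """Gamma_{11,1}(theta) = -D[dd || d]: minus the third derivative of D(p || q),
    twice in theta_p and once in theta_q, on the diagonal (central differences)."""

    def second_in_first_arg(tq: float) -> float:
        # d^2 D / d theta_p^2 at theta_p = theta, for a fixed theta_q = tq
        return (D(theta + h, tq) - 2 * D(theta, tq) + D(theta - h, tq)) / (h * h)

    return -(second_in_first_arg(theta + h) - second_in_first_arg(theta - h)) / (2 * h)


theta = 0.3
print(connection_coefficient(kl_bernoulli, theta))          # approx. 0
print(connection_coefficient(reverse_kl_bernoulli, theta))  # approx. -9.07
print(-1 / theta**2 + 1 / (1 - theta) ** 2)                 # -9.0702...
```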
Thus, a divergence D(· || ·) generates on a statistical manifold a unique dualistic structure (g(D), ∇(D), ∇(D*)). The converse is also true: every torsion-free dualistic structure on a statistical manifold is induced from some globally defined divergence function (which however need not be unique).
For example, when D is an f-divergence for some function ƒ(·), then it generates the metric $g^{(D_f)} = c \cdot g$ and the connection $\nabla^{(D_f)} = \nabla^{(\alpha)}$, where g is the canonical Fisher information metric, ∇(α) is the α-connection, $c = f''(1)$, and $\alpha = 3 + 2f'''(1)/f''(1)$.
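The constants c = ƒ′′(1) and α = 3 + 2ƒ′′′(1)/ƒ′′(1) are easy to evaluate for concrete generators. The Python sketch below approximates the derivatives by central finite differences; the generator functions are our own choices, written for the convention D_f(p || q) = ∫ p(x) f(q(x)/p(x)) dx, under which f(u) = −ln u gives the Kullback-Leibler divergence.

```python
import math


def metric_scale_and_alpha(f, h: float = 1e-3) -> tuple[float, float]:
    """Return (c, alpha) with c = f''(1) and alpha = 3 + 2 f'''(1) / f''(1),
    using central finite differences of the generator f around u = 1."""
    f2 = (f(1 + h) - 2 * f(1) + f(1 - h)) / h**2
    f3 = (f(1 + 2 * h) - 2 * f(1 + h) + 2 * f(1 - h) - f(1 - 2 * h)) / (2 * h**3)
    return f2, 3 + 2 * f3 / f2


generators = {
    "Kullback-Leibler": lambda u: -math.log(u),
    "squared Hellinger": lambda u: 2 * (math.sqrt(u) - 1) ** 2,
    "Chernoff alpha = 0.5": lambda u: 4 / (1 - 0.5**2) * (1 - u ** ((1 + 0.5) / 2)),
    "Kagan": lambda u: 0.5 * (u - 1) ** 2,
}

for name, f in generators.items():
    c, alpha = metric_scale_and_alpha(f)
    print(f"{name}: c = {c:.3f}, alpha = {alpha:.3f}")
```

All four generators have c = 1, so each induces exactly the Fisher metric; the induced α is −1, 0, 0.5 and 3 respectively.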
Examples
The largest and most frequently used class of divergences is formed by the so-called f-divergences; however, other types of divergence functions are also encountered in the literature.
f-divergences
This family of divergences is generated through functions f(u), convex on u > 0 and such that f(1) = 0. Then an f-divergence is defined as
$$D_f(p \,\|\, q) = \int p(x)\, f\!\left(\frac{q(x)}{p(x)}\right) dx.$$
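For distributions on a finite support the integral becomes a sum, which gives a direct implementation of the definition. The sketch below is a minimal Python version under that assumption (strictly positive probabilities, so q(x)/p(x) is always defined); the helper name `f_divergence` and the sample distributions are illustrative.

```python
import math
from typing import Callable, Sequence


def f_divergence(f: Callable[[float], float], p: Sequence[float], q: Sequence[float]) -> float:
    """D_f(p || q) = sum_x p(x) f(q(x) / p(x)) for discrete distributions with
    common, strictly positive support."""
    return sum(px * f(qx / px) for px, qx in zip(p, q))


def kl_generator(u: float) -> float:
    """f(u) = -ln u: convex on u > 0 with f(1) = 0; yields the Kullback-Leibler divergence."""
    return -math.log(u)


p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]
print(f_divergence(kl_generator, p, q))
print(sum(px * math.log(px / qx) for px, qx in zip(p, q)))  # direct KL formula, same value
print(f_divergence(kl_generator, p, p))                      # 0.0, since f(1) = 0
```

The condition f(1) = 0 is what makes D_f(p || p) = 0, and convexity of f (through Jensen's inequality) is what makes D_f(p || q) ≥ 0.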
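- Kullback-Leibler divergence: $D_{\mathrm{KL}}(p \,\|\, q) = \int p(x) \ln\frac{p(x)}{q(x)}\, dx$
- squared Hellinger distance: $H^2(p, q) = 2\int \left(\sqrt{p(x)} - \sqrt{q(x)}\right)^2 dx$
- Jeffrey’s divergence: $D_J(p \,\|\, q) = \int \left(p(x) - q(x)\right)\left(\ln p(x) - \ln q(x)\right) dx$
- Chernoff’s α-divergence: $D^{(\alpha)}(p \,\|\, q) = \frac{4}{1 - \alpha^2}\left(1 - \int p(x)^{\frac{1-\alpha}{2}}\, q(x)^{\frac{1+\alpha}{2}}\, dx\right)$
- exponential divergence: $D_e(p \,\|\, q) = \int p(x) \left(\ln p(x) - \ln q(x)\right)^2 dx$
- Kagan’s divergence: $D_{\chi^2}(p \,\|\, q) = \frac{1}{2}\int \frac{\left(p(x) - q(x)\right)^2}{p(x)}\, dx$
- (α,β)-product divergence: $D_{\alpha,\beta}(p \,\|\, q) = \frac{2}{(1-\alpha)(1-\beta)} \int \left(1 - \left(\frac{q(x)}{p(x)}\right)^{\frac{1-\alpha}{2}}\right)\left(1 - \left(\frac{q(x)}{p(x)}\right)^{\frac{1-\beta}{2}}\right) p(x)\, dx$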
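Each named example is an f-divergence for a suitable generator. The Python sketch below checks this numerically for discrete distributions (integrals replaced by sums): it evaluates each example in closed form and also through the generic sum Σ p(x) f(q(x)/p(x)) with a generator that we chose to match it. The generators, the parameter values α = 0.5 and β = −0.5, and the sample distributions are assumptions made for illustration.

```python
import math

ALPHA, BETA = 0.5, -0.5  # illustrative parameters for the parametric families


def f_divergence(f, p, q):
    """Generic D_f(p || q) = sum_x p(x) f(q(x) / p(x)) on a finite, positive support."""
    return sum(a * f(b / a) for a, b in zip(p, q))


# Closed forms of the named examples, with integrals written as sums.
closed_forms = {
    "Kullback-Leibler": lambda p, q: sum(a * math.log(a / b) for a, b in zip(p, q)),
    "squared Hellinger": lambda p, q: 2 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)),
    "Jeffrey": lambda p, q: sum((a - b) * (math.log(a) - math.log(b)) for a, b in zip(p, q)),
    "Chernoff alpha": lambda p, q: 4 / (1 - ALPHA**2) * (
        1 - sum(a ** ((1 - ALPHA) / 2) * b ** ((1 + ALPHA) / 2) for a, b in zip(p, q))),
    "exponential": lambda p, q: sum(a * (math.log(a) - math.log(b)) ** 2 for a, b in zip(p, q)),
    "Kagan": lambda p, q: 0.5 * sum((a - b) ** 2 / a for a, b in zip(p, q)),
    "(alpha,beta)-product": lambda p, q: 2 / ((1 - ALPHA) * (1 - BETA)) * sum(
        (1 - (b / a) ** ((1 - ALPHA) / 2)) * (1 - (b / a) ** ((1 - BETA) / 2)) * a
        for a, b in zip(p, q)),
}

# Generators f(u) chosen (by us) so that D_f reproduces each closed form.
generators = {
    "Kullback-Leibler": lambda u: -math.log(u),
    "squared Hellinger": lambda u: 2 * (math.sqrt(u) - 1) ** 2,
    "Jeffrey": lambda u: (u - 1) * math.log(u),
    "Chernoff alpha": lambda u: 4 / (1 - ALPHA**2) * (1 - u ** ((1 + ALPHA) / 2)),
    "exponential": lambda u: math.log(u) ** 2,
    "Kagan": lambda u: 0.5 * (u - 1) ** 2,
    "(alpha,beta)-product": lambda u: (2 / ((1 - ALPHA) * (1 - BETA))
                                       * (1 - u ** ((1 - ALPHA) / 2))
                                       * (1 - u ** ((1 - BETA) / 2))),
}

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]
for name, closed in closed_forms.items():
    via_f = f_divergence(generators[name], p, q)
    print(f"{name}: closed form {closed(p, q):.6f}, f-divergence {via_f:.6f}")
```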