Total correlation
In probability theory and in particular in information theory, total correlation (Watanabe 1960) is one of several generalizations of the mutual information. It is also known as the multivariate constraint (Garner 1962) or multiinformation (Studený & Vejnarová 1999). It quantifies the redundancy or dependency among a set of n random variables.

Definition

For a given set of n random variables $\{X_1, X_2, \ldots, X_n\}$, the total correlation $C(X_1, X_2, \ldots, X_n)$ is defined as the Kullback–Leibler divergence from the independent distribution $p(X_1)\,p(X_2)\cdots p(X_n)$ to the joint distribution $p(X_1, X_2, \ldots, X_n)$,

$$C(X_1, X_2, \ldots, X_n) = D_{\mathrm{KL}}\left[ p(X_1, X_2, \ldots, X_n) \,\|\, p(X_1)\,p(X_2)\cdots p(X_n) \right].$$
This divergence reduces to the simpler difference of entropies,

$$C(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i) - H(X_1, X_2, \ldots, X_n),$$

where $H(X_i)$ is the information entropy of variable $X_i$, and $H(X_1, X_2, \ldots, X_n)$ is the joint entropy of the variable set $\{X_1, X_2, \ldots, X_n\}$. In terms of the discrete probability distributions on variables $\{X_1, X_2, \ldots, X_n\}$, the total correlation is given by

$$C(X_1, X_2, \ldots, X_n) = \sum_{x_1 \in \mathcal{X}_1} \sum_{x_2 \in \mathcal{X}_2} \cdots \sum_{x_n \in \mathcal{X}_n} p(x_1, x_2, \ldots, x_n) \, \log_2 \frac{p(x_1, x_2, \ldots, x_n)}{p(x_1)\,p(x_2)\cdots p(x_n)}.$$
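As a concrete illustration, the discrete formula can be evaluated directly from a tabulated joint distribution. The following Python sketch is illustrative only; the function name total_correlation and the dict-of-outcome-tuples representation are conventions chosen here, not notation from the literature:

    from math import log2

    def total_correlation(joint):
        """Total correlation C(X_1, ..., X_n), in bits, of a discrete
        joint distribution given as a dict mapping outcome tuples
        (x_1, ..., x_n) to probabilities."""
        n = len(next(iter(joint)))
        # Marginal distribution p(x_i) of each variable X_i.
        marginals = [{} for _ in range(n)]
        for outcome, p in joint.items():
            for i, x in enumerate(outcome):
                marginals[i][x] = marginals[i].get(x, 0.0) + p
        # C = sum_x p(x) * log2( p(x) / (p(x_1) * ... * p(x_n)) ).
        c = 0.0
        for outcome, p in joint.items():
            if p > 0:
                prod = 1.0
                for i, x in enumerate(outcome):
                    prod *= marginals[i][x]
                c += p * log2(p / prod)
        return c

    # Two perfectly correlated fair bits:
    # C = H(X1) + H(X2) - H(X1, X2) = 1 + 1 - 1 = 1 bit.
    print(total_correlation({(0, 0): 0.5, (1, 1): 0.5}))  # 1.0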


The total correlation is the amount of information shared among the variables in the set. The sum $\sum_{i=1}^{n} H(X_i)$ represents the amount of information in bits (assuming base-2 logs) that the variables would possess if they were totally independent of one another (non-redundant), or, equivalently, the average code length to transmit the values of all variables if each variable were (optimally) coded independently. The term $H(X_1, X_2, \ldots, X_n)$ is the actual amount of information that the variable set contains, or, equivalently, the average code length to transmit the values of all variables if the set of variables were (optimally) coded together. The difference between these terms therefore represents the absolute redundancy (in bits) present in the given set of variables, and thus provides a general quantitative measure of the structure or organization embodied in the set of variables (Rothstein 1952). The total correlation is also the Kullback–Leibler divergence between the actual distribution $p(X_1, X_2, \ldots, X_n)$ and its maximum-entropy product approximation $\prod_{i=1}^{n} p(X_i)$.

Total correlation tells us in the most general sense how cohesive or related a group of variables is. A near-zero total correlation indicates that the variables in the group are essentially statistically independent; they are completely unrelated, in the sense that knowing the value of one variable does not provide any clue as to the values of the other variables. On the other hand, the maximum total correlation, given by

$$C_{\max} = \sum_{i=1}^{n} H(X_i) - \max_{i} H(X_i),$$

occurs when one of the variables is completely redundant with all of the other variables. The variables are then maximally related in the sense that knowing the value of one variable provides complete information about the values of all the other variables, and the variables can be figuratively regarded as cogs, in which the position of one cog determines the positions of all the others (Rothstein 1952).
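The "cogs" picture can be checked numerically with the illustrative total_correlation function from the earlier sketch: for n identical fair bits, each marginal entropy is 1 bit and the joint entropy is also 1 bit, so the total correlation attains the maximum of $n - 1$ bits.

    # Three identical fair bits ("cogs"): each H(X_i) = 1 bit and the
    # joint entropy is 1 bit, so C attains its maximum 3 - 1 = 2 bits.
    print(total_correlation({(0, 0, 0): 0.5, (1, 1, 1): 0.5}))  # 2.0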

It is important to note that the total correlation counts up all the redundancies among a set of variables, but these redundancies may be distributed throughout the variable set in a variety of complicated ways (Garner 1962). For example, some variables in the set may be totally inter-redundant while others in the set are completely independent. Perhaps more significantly, redundancy may be carried in interactions of various degrees: a group of variables may not possess any pairwise redundancies, but may possess higher-order interaction redundancies of the kind exemplified by the parity function, as illustrated in the sketch below. The decomposition of total correlation into its constituent redundancies is explored in a number of sources (McGill 1954, Watanabe 1960, Garner 1962, Studený & Vejnarová 1999, Jakulin & Bratko 2003a, Jakulin & Bratko 2003b, Nemenman 2004, Han 1978, Han 1980).
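The parity case can be made concrete with the same illustrative function: for two independent fair bits and their exclusive-or, every pair of variables is independent, yet the triple as a whole carries one bit of redundancy.

    # X1, X2 independent fair bits; X3 = X1 XOR X2 (parity function).
    parity = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}

    # Every pair of variables is independent: all pairwise total
    # correlations (i.e. pairwise mutual informations) are zero.
    for i, j in [(0, 1), (0, 2), (1, 2)]:
        pair = {}
        for outcome, p in parity.items():
            key = (outcome[i], outcome[j])
            pair[key] = pair.get(key, 0.0) + p
        print(total_correlation(pair))  # 0.0 for each pair

    # Yet the three variables together are redundant:
    # C = 1 + 1 + 1 - 2 = 1 bit, a purely third-order interaction.
    print(total_correlation(parity))  # 1.0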

Conditional total correlation

Conditional total correlation is defined analogously to the total correlation, but with a condition added to each term. It is a Kullback–Leibler divergence between two conditional probability distributions,

$$C(X_1, X_2, \ldots, X_n \mid Y = y) = D_{\mathrm{KL}}\left[ p(X_1, \ldots, X_n \mid y) \,\|\, p(X_1 \mid y)\,p(X_2 \mid y)\cdots p(X_n \mid y) \right].$$

Analogous to the above, conditional total correlation reduces to a difference of conditional entropies,

$$C(X_1, X_2, \ldots, X_n \mid Y) = \sum_{i=1}^{n} H(X_i \mid Y) - H(X_1, X_2, \ldots, X_n \mid Y).$$
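A sketch of the conditional quantity, again hypothetical in its names and data layout: it averages the per-condition total correlation $C(X_1, \ldots, X_n \mid Y = y)$ over $p(y)$, which is consistent with the entropy-difference form above, and reuses the total_correlation function from the earlier sketch.

    def conditional_total_correlation(joint_xy):
        """Conditional total correlation C(X_1, ..., X_n | Y), in bits,
        of a joint distribution given as a dict mapping pairs
        ((x_1, ..., x_n), y) to probabilities."""
        py = {}    # p(y)
        cond = {}  # per-y joint over (x_1, ..., x_n), unnormalized
        for (xs, y), p in joint_xy.items():
            py[y] = py.get(y, 0.0) + p
            d = cond.setdefault(y, {})
            d[xs] = d.get(xs, 0.0) + p
        # C(X | Y) = sum_y p(y) * C(X | Y = y).
        return sum(
            py[y] * total_correlation({xs: p / py[y] for xs, p in d.items()})
            for y, d in cond.items()
        )

    # X1, X2 are perfectly correlated when Y = 0 and independent when
    # Y = 1, each regime with probability 1/2, so
    # C(X1, X2 | Y) = 0.5 * 1 + 0.5 * 0 = 0.5 bit.
    joint_xy = {((0, 0), 0): 0.25, ((1, 1), 0): 0.25,
                ((0, 0), 1): 0.125, ((0, 1), 1): 0.125,
                ((1, 0), 1): 0.125, ((1, 1), 1): 0.125}
    print(conditional_total_correlation(joint_xy))  # 0.5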

Uses of total correlation

Clustering and feature selection algorithms based on total correlation have been explored by Watanabe. Alfonso et al. (2010) applied the concept of total correlation to the optimisation of water monitoring networks.