Self-information
In information theory, self-information is a measure of the information content associated with the outcome of a random variable. It is expressed in a unit of information, for example bits, nats, or hartleys, depending on the base of the logarithm used in its calculation. The term self-information is also sometimes used as a synonym of entropy, i.e. the expected value of self-information in the first sense, because $H(X) = I(X; X)$, where $I(X; X)$ is the mutual information of X with itself. These two meanings are not equivalent, and this article covers the first sense only. For the other sense, see entropy.
By definition, the amount of self-information contained in a probabilistic event depends only on the probability of that event: the smaller its probability, the larger the self-information associated with learning that the event indeed occurred.
Further, by definition, the measure of self-information is non-negative and additive. If an event C is the intersection of two independent events A and B, then the amount of information gained when C is announced equals the sum of the amounts of information gained when A and B are announced separately: I(A ∩ B) = I(A) + I(B).
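Taking these properties into account, the self-information $I(\omega_n)$ associated with an outcome $\omega_n$ with probability $P(\omega_n)$ is:
$$I(\omega_n) = \log\left(\frac{1}{P(\omega_n)}\right) = -\log P(\omega_n)$$
This definition complies with the above conditions.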
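As a minimal sketch of this definition, the Python snippet below computes self-information from a probability and checks additivity for two independent events. The helper name `self_information` and the example probabilities are illustrative assumptions, not part of any library.

```python
import math

def self_information(p, base=2.0):
    """Self-information -log_base(p) of an event with probability p.

    Hypothetical helper for illustration; base 2 gives the result in bits.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # Adding 0.0 turns the -0.0 produced for p = 1 into a plain 0.0.
    return -math.log(p, base) + 0.0

# Two independent events A and B (probabilities chosen for illustration).
p_a, p_b = 0.5, 0.25
i_joint = self_information(p_a * p_b)                  # I(A ∩ B)
i_sum = self_information(p_a) + self_information(p_b)  # I(A) + I(B)
print(i_joint, i_sum)
```

Up to floating-point rounding, both printed values agree, which is the additivity property stated above. A certain event (p = 1) yields zero information.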
In the above definition, the base of the logarithm is not specified: with base 2, the unit of self-information is the bit. With base $e$ (the natural logarithm), the unit is the nat. With base 10, the unit is the hartley.
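The choice of base only rescales the result by a constant factor. The following sketch, which again uses a hypothetical `self_information` helper and an arbitrary example probability of 1/16, expresses the same quantity in bits, nats, and hartleys.

```python
import math

def self_information(p, base):
    """Self-information of probability p in the unit set by the log base."""
    return -math.log(p) / math.log(base)

p = 1 / 16  # example probability, chosen for illustration
bits = self_information(p, 2)
nats = self_information(p, math.e)
hartleys = self_information(p, 10)

# Units differ by constant factors: 1 bit = ln(2) nats = log10(2) hartleys.
assert math.isclose(nats, bits * math.log(2))
assert math.isclose(hartleys, bits * math.log10(2))

print(f"{bits:.4f} bits = {nats:.4f} nats = {hartleys:.4f} hartleys")
# prints: 4.0000 bits = 2.7726 nats = 1.2041 hartleys
```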
As a quick illustration, the information content associated with an outcome of 4 heads (or any other specific sequence) in 4 consecutive tosses of a fair coin is 4 bits (probability 1/16), and the information content associated with getting any result other than the one specified is about 0.09 bits (probability 15/16). See below for detailed examples.
This measure has also been called surprisal, as it represents the "surprise" of seeing the outcome (a highly improbable outcome is very surprising). This term was coined by Myron Tribus in his 1961 book Thermostatics and Thermodynamics.
The information entropy of a random variable is the expected value of its self-information.
Self-information is an example of a proper scoring rule.
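The relationship between entropy and self-information can be sketched in a few lines of Python. The function names and the example distributions are assumptions made for illustration, and the input is assumed to be a valid probability distribution (non-negative values summing to 1).

```python
import math

def self_information(p):
    """Self-information in bits of an outcome with probability p > 0."""
    return -math.log2(p)

def entropy(probs):
    """Expected self-information in bits.

    Outcomes with probability zero are skipped: they never occur,
    so they contribute nothing to the expectation.
    """
    return sum(p * self_information(p) for p in probs if p > 0)

fair_coin = [0.5, 0.5]
biased_coin = [0.9, 0.1]
fair_die = [1 / 6] * 6

for name, dist in [("fair coin", fair_coin),
                   ("biased coin", biased_coin),
                   ("fair die", fair_die)]:
    print(f"{name}: {entropy(dist):.3f} bits")
```

For a uniform distribution every outcome has the same self-information, so the entropy equals that common value; for the biased coin the rare outcome is more surprising but is weighted by its small probability, giving a lower average.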
Examples
- On tossing a fair coin, the chance of 'tail' is 0.5. When it is proclaimed that 'tail' indeed occurred, this amounts to
  I('tail') = log2 (1/0.5) = log2 2 = 1 bit of information.
- When throwing a fair die, the probability of 'four' is 1/6. When it is proclaimed that 'four' has been thrown, the amount of self-information is
  I('four') = log2 (1/(1/6)) = log2 (6) = 2.585 bits.
- When, independently, two dice are thrown, the amount of information associated with {throw 1 = 'two' & throw 2 = 'four'} equals
  I('throw 1 is two & throw 2 is four') = log2 (1/P(throw 1 = 'two' & throw 2 = 'four')) = log2 (1/(1/36)) = log2 (36) = 5.170 bits.
  This outcome equals the sum of the individual amounts of self-information associated with {throw 1 = 'two'} and {throw 2 = 'four'}; namely 2.585 + 2.585 = 5.170 bits.
- In the same two-dice situation we can also consider the information present in the statement "The sum of the two dice is five":
  I('The sum of throws 1 and 2 is five') = log2 (1/P('throw 1 and 2 sum to five')) = log2 (1/(4/36)) = 3.17 bits.
  The 4/36 arises because four of the 36 equally likely outcomes sum to 5. This shows that more complex or ambiguous events still carry information.
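The dice figures above can be reproduced by enumerating the 36 equally likely outcomes. This is only a sketch: `probability` and `self_information_bits` are hypothetical helper names, and events are represented as predicates over a pair of throws.

```python
from fractions import Fraction
from itertools import product
import math

def self_information_bits(p):
    """Self-information in bits of an event with probability p."""
    return -math.log2(float(p))

# All 36 equally likely outcomes of two independent fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def probability(event):
    """Probability of an event given as a predicate over (throw1, throw2)."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

p_pair = probability(lambda o: o == (2, 4))       # throw 1 = two, throw 2 = four
p_sum5 = probability(lambda o: o[0] + o[1] == 5)  # the two throws sum to five

print(p_pair, round(self_information_bits(p_pair), 3))
print(p_sum5, round(self_information_bits(p_sum5), 3))
```

Using exact fractions for the probabilities keeps the counting step free of rounding; only the final logarithm is computed in floating point.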
Self-Information of a Partitioning
The self-information of a partitioning of elements within a set (or clustering) is the expectation of the information of a test object; that is, if we select an element at random and observe which partition/cluster it belongs to, it is the quantity of information we expect to obtain. The information of a partitioning $C$, with $P(k)$ denoting the fraction of elements within partition $k$, is
$$I(C) = -\sum_{k} P(k) \log P(k)$$
where the sum runs over all partitions.
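A small sketch of this computation, assuming the partitioning is given as one cluster label per element; the function name `partition_information` and the example labels are illustrative only.

```python
from collections import Counter
import math

def partition_information(labels, base=2.0):
    """Expected information from observing the cluster of a random element.

    `labels` holds one cluster label per element; the fraction of elements
    in each cluster plays the role of its probability.
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    n = len(labels)
    fractions = [count / n for count in Counter(labels).values()]
    # Adding 0.0 normalises the -0.0 that arises for a single cluster.
    return -sum(f * math.log(f, base) for f in fractions) + 0.0

# Four elements split into clusters of sizes 2, 1 and 1.
labels = ["a", "a", "b", "c"]
print(partition_information(labels))
```

With fractions 1/2, 1/4 and 1/4 the expected information is 1.5 bits, up to floating-point rounding. A partitioning that places every element in the same cluster yields zero, since observing the cluster tells us nothing.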