Coupling from the past
Encyclopedia
Among Markov chain Monte Carlo
(MCMC) algorithms, coupling from the past is a method for sampling
from the stationary distribution
of a Markov chain. Contrary to many MCMC algorithms, coupling from the past gives in principle a perfect sample from the stationary distribution. It was invented by James Propp and David Wilson in 1996.
aperiodic
Markov chain with state space and (unique) stationary distribution ( is a probability vector). Suppose that we come up with a probability distribution on the set of maps with the property that for every fixed , its image is distributed according to the transition probability of from state . An example of such a probability distribution is the one where is independent from whenever , but it is often worthwhile to consider other distributions. Now let for be independent samples from .
Suppose that is chosen randomly according to and is independent from the sequence . (We do not worry for now where this is coming from.) Then is also distributed according to , because is -stationary and our assumption on the law of . Define
Then it follows by induction that is also distributed according to for every . Now here is the main point. It may happen that for some the image of the map is a single element of .
In other words, for each . Therefore, we do not need to have access to in order to compute . The algorithm then involves finding some such that is a singleton, and outputing the element of that singleton. The design of a good distribution for which the task of finding such an and computing is not too costly is not always obvious, but has been accomplished successfully in several important instances.
for and a tool for determining if . (Here denotes cardinality
.) Suppose that is a partially ordered set
with order , which has a unique minimal element and a unique maximal element ; that is, every satisfies . Also, suppose that may be chosen to be supported on the set of monotone maps . Then it is easy to see that if and only if , since is monotone. Thus, checking this becomes rather easy. The algorithm can proceed by choosing for some constant , sampling the maps , and outputing if . If the algorithm proceeds by doubling and repeating as necessary until an output is obtained. (But the algorithm does not resample the maps which were already sampled; it uses the previously sampled maps when needed.)
Markov chain Monte Carlo
Markov chain Monte Carlo methods are a class of algorithms for sampling from probability distributions based on constructing a Markov chain that has the desired distribution as its equilibrium distribution. The state of the chain after a large number of steps is then used as a sample of the...
(MCMC) algorithms, coupling from the past is a method for sampling
Sampling (statistics)
In statistics and survey methodology, sampling is concerned with the selection of a subset of individuals from within a population to estimate characteristics of the whole population....
from the stationary distribution
Markov chain
A Markov chain, named after Andrey Markov, is a mathematical system that undergoes transitions from one state to another, between a finite or countable number of possible states. It is a random process characterized as memoryless: the next state depends only on the current state and not on the...
of a Markov chain. Contrary to many MCMC algorithms, coupling from the past gives in principle a perfect sample from the stationary distribution. It was invented by James Propp and David Wilson in 1996.
The basic idea
Consider a finite state irreducibleMarkov chain
A Markov chain, named after Andrey Markov, is a mathematical system that undergoes transitions from one state to another, between a finite or countable number of possible states. It is a random process characterized as memoryless: the next state depends only on the current state and not on the...
aperiodic
Markov chain
A Markov chain, named after Andrey Markov, is a mathematical system that undergoes transitions from one state to another, between a finite or countable number of possible states. It is a random process characterized as memoryless: the next state depends only on the current state and not on the...
Markov chain with state space and (unique) stationary distribution ( is a probability vector). Suppose that we come up with a probability distribution on the set of maps with the property that for every fixed , its image is distributed according to the transition probability of from state . An example of such a probability distribution is the one where is independent from whenever , but it is often worthwhile to consider other distributions. Now let for be independent samples from .
Suppose that is chosen randomly according to and is independent from the sequence . (We do not worry for now where this is coming from.) Then is also distributed according to , because is -stationary and our assumption on the law of . Define
Then it follows by induction that is also distributed according to for every . Now here is the main point. It may happen that for some the image of the map is a single element of .
In other words, for each . Therefore, we do not need to have access to in order to compute . The algorithm then involves finding some such that is a singleton, and outputing the element of that singleton. The design of a good distribution for which the task of finding such an and computing is not too costly is not always obvious, but has been accomplished successfully in several important instances.
The monotone case
There is a special class of Markov chains in which there are particularly good choicesfor and a tool for determining if . (Here denotes cardinality
Cardinal number
In mathematics, cardinal numbers, or cardinals for short, are a generalization of the natural numbers used to measure the cardinality of sets. The cardinality of a finite set is a natural number – the number of elements in the set. The transfinite cardinal numbers describe the sizes of infinite...
.) Suppose that is a partially ordered set
Partially ordered set
In mathematics, especially order theory, a partially ordered set formalizes and generalizes the intuitive concept of an ordering, sequencing, or arrangement of the elements of a set. A poset consists of a set together with a binary relation that indicates that, for certain pairs of elements in the...
with order , which has a unique minimal element and a unique maximal element ; that is, every satisfies . Also, suppose that may be chosen to be supported on the set of monotone maps . Then it is easy to see that if and only if , since is monotone. Thus, checking this becomes rather easy. The algorithm can proceed by choosing for some constant , sampling the maps , and outputing if . If the algorithm proceeds by doubling and repeating as necessary until an output is obtained. (But the algorithm does not resample the maps which were already sampled; it uses the previously sampled maps when needed.)