Markov chain Monte Carlo
Encyclopedia
Markov chain
Markov chain
A Markov chain, named after Andrey Markov, is a mathematical system that undergoes transitions from one state to another, between a finite or countable number of possible states. It is a random process characterized as memoryless: the next state depends only on the current state and not on the...

Monte Carlo
Monte Carlo method
Monte Carlo methods are a class of computational algorithms that rely on repeated random sampling to compute their results. Monte Carlo methods are often used in computer simulations of physical and mathematical systems...

(MCMC) methods (which include random walk
Random walk
A random walk, sometimes denoted RW, is a mathematical formalisation of a trajectory that consists of taking successive random steps. For example, the path traced by a molecule as it travels in a liquid or a gas, the search path of a foraging animal, the price of a fluctuating stock and the...

Monte Carlo
methods) are a class of algorithm
Algorithm
In mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning...

s for sampling from probability distribution
Probability distribution
In probability theory, a probability mass, probability density, or probability distribution is a function that describes the probability of a random variable taking certain values....

s based on constructing a Markov chain
Markov chain
A Markov chain, named after Andrey Markov, is a mathematical system that undergoes transitions from one state to another, between a finite or countable number of possible states. It is a random process characterized as memoryless: the next state depends only on the current state and not on the...

that has the desired distribution as its equilibrium distribution. The state of the chain after a large number of steps is then used as a sample of the desired distribution. The quality of the sample improves as a function of the number of steps.

Usually it is not hard to construct a Markov chain with the desired properties. The more difficult problem is to determine how many steps are needed to converge to the stationary distribution within an acceptable error. A good chain will have rapid mixing—the stationary distribution is reached quickly starting from an arbitrary position—described further under Markov chain mixing time
Markov chain mixing time
In probability theory, the mixing time of a Markov chain is the time until the Markov chain is "close" to its steady state distribution.More precisely, a fundamental result about Markov chains is that a finite state irreducible aperiodic chain has a unique stationary distribution π and,...

.

Typical use of MCMC sampling can only approximate the target distribution, as there is always some residual effect of the starting position. More sophisticated MCMC-based algorithms such as coupling from the past
Coupling from the past
Among Markov chain Monte Carlo algorithms, coupling from the past is a method for sampling from the stationary distribution of a Markov chain. Contrary to many MCMC algorithms, coupling from the past gives in principle a perfect sample from the stationary distribution...

can produce exact samples, at the cost of additional computation and an unbounded (though finite in expectation) running time
Running Time
Running Time may refer to:* Running Time * see Analysis of algorithms...

.

The most common application of these algorithms is numerically calculating multi-dimensional integral
Integral
Integration is an important concept in mathematics and, together with its inverse, differentiation, is one of the two main operations in calculus...

s. In these methods, an ensemble of "walkers" moves around randomly. At each point where the walker steps, the integrand value at that point is counted towards the integral. The walker then may make a number of tentative steps around the area, looking for a place with reasonably high contribution to the integral to move into next. Random walk methods are a kind of random simulation or Monte Carlo method
Monte Carlo method
Monte Carlo methods are a class of computational algorithms that rely on repeated random sampling to compute their results. Monte Carlo methods are often used in computer simulations of physical and mathematical systems...

. However, whereas the random samples of the integrand used in a conventional Monte Carlo integration are statistically independent, those used in MCMC are correlated
Correlation
In statistics, dependence refers to any statistical relationship between two random variables or two sets of data. Correlation refers to any of a broad class of statistical relationships involving dependence....

. A Markov chain
Markov chain
A Markov chain, named after Andrey Markov, is a mathematical system that undergoes transitions from one state to another, between a finite or countable number of possible states. It is a random process characterized as memoryless: the next state depends only on the current state and not on the...

is constructed in such a way as to have the integrand as its equilibrium distribution. Surprisingly, this is often easy to do.

Multi-dimensional integrals often arise in Bayesian statistics
Bayesian statistics
Bayesian statistics is that subset of the entire field of statistics in which the evidence about the true state of the world is expressed in terms of degrees of belief or, more specifically, Bayesian probabilities...

, computational physics
Computational physics
Computational physics is the study and implementation of numerical algorithms to solve problems in physics for which a quantitative theory already exists...

, computational biology
Computational biology
Computational biology involves the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems...

and computational linguistics
Computational linguistics
Computational linguistics is an interdisciplinary field dealing with the statistical or rule-based modeling of natural language from a computational perspective....

, so Markov chain Monte Carlo methods are widely used in those fields. For example, see Gill and Robert & Casella.

## Random walk algorithms

Many Markov chain Monte Carlo methods move around the equilibrium distribution in relatively small steps, with no tendency for the steps to proceed in the same direction. These methods are easy to implement and analyze, but unfortunately it can take a long time for the walker to explore all of the space. The walker will often double back and cover ground already covered. Here are some random walk MCMC methods:
• Metropolis–Hastings algorithm: Generates a random walk
Random walk
A random walk, sometimes denoted RW, is a mathematical formalisation of a trajectory that consists of taking successive random steps. For example, the path traced by a molecule as it travels in a liquid or a gas, the search path of a foraging animal, the price of a fluctuating stock and the...

using a proposal density and a method for rejecting proposed moves.
• Gibbs sampling
Gibbs sampling
In statistics and in statistical physics, Gibbs sampling or a Gibbs sampler is an algorithm to generate a sequence of samples from the joint probability distribution of two or more random variables...

: Requires that all the conditional distribution
Conditional distribution
Given two jointly distributed random variables X and Y, the conditional probability distribution of Y given X is the probability distribution of Y when X is known to be a particular value...

s of the target distribution can be sampled exactly. Popular partly because when this is so, the method does not require any 'tuning'.
• Slice sampling
Slice sampling
Slice sampling is a type of Markov chain Monte Carlo algorithm for pseudo-random number sampling, i.e. for drawing random samples from a statistical distribution...

: Depends on the principle that one can sample from a distribution by sampling uniformly from the region under the plot of its density function. This method alternates uniform sampling in the vertical direction with uniform sampling from the horizontal 'slice' defined by the current vertical position.
• Multiple-try Metropolis
Multiple-try Metropolis
In Markov chain Monte Carlo, the Metropolis–Hastings algorithm can be used to sample from a probability distribution which is difficult to sample from directly. However, the MH algorithm requires the user to supply a proposal distribution, which can be relatively arbitrary...

: A variation of the Metropolis–Hastings algorithm that allows multiple trials at each point. This allows the algorithm to generally take larger steps at each iteration, which helps combat problems intrinsic to large dimensional problems.

## Avoiding random walks

More sophisticated algorithms use some method of preventing the walker from doubling back. These algorithms may be harder to implement, but may exhibit faster convergence (i.e. fewer steps for an accurate result).
• Successive over-relaxation
Successive over-relaxation
In numerical linear algebra, the method of successive over-relaxation is a variant of the Gauss–Seidel method for solving a linear system of equations, resulting in faster convergence. A similar method can be used for any slowly converging iterative process.It was devised simultaneously by David...

: A Monte Carlo version of this technique can be seen as a variation on Gibbs sampling
Gibbs sampling
In statistics and in statistical physics, Gibbs sampling or a Gibbs sampler is an algorithm to generate a sequence of samples from the joint probability distribution of two or more random variables...

; it sometimes avoids random walks.
• Hybrid Monte Carlo
Hybrid Monte Carlo
In mathematics and physics, the Hybrid Monte Carlo algorithm, also known as Hamiltonian Monte Carlo, is a Markov chain Monte Carlo method for obtaining a sequence of random samples from a probability distribution for which direct sampling is difficult...

(HMC): Tries to avoid random walk behaviour by introducing an auxiliary momentum
Momentum
In classical mechanics, linear momentum or translational momentum is the product of the mass and velocity of an object...

vector and implementing Hamiltonian dynamics where the potential function is the target density. The momentum samples are discarded after sampling. The end result of Hybrid MCMC is that proposals move across the sample space in larger steps and are therefore less correlated and converge to the target distribution more rapidly.
• Some variations on slice sampling also avoid random walks.
• Langevin MCMC and other methods that rely on the gradient (and possibly second derivative) of the log posterior avoid random walks by making proposals that are more likely to be in the direction of higher probability density.

## Changing dimension

The reversible-jump method is a variant of Metropolis–Hastings that allows proposals that change the dimensionality of the space. This method was proposed in 1995 by Peter Green
Peter Green (statistician)
Peter Green FRS is a British Bayesian statistician. He holds a chair in statistics at Bristol University. He is distinguished for his contributions to computational statistics, in particular his contributions to spatial statistics and semi-parametric regression models and also his development of...

of Bristol University. Markov chain Monte Carlo methods that change dimensionality have also long been used in statistical physics
Statistical physics
Statistical physics is the branch of physics that uses methods of probability theory and statistics, and particularly the mathematical tools for dealing with large populations and approximations, in solving physical problems. It can describe a wide variety of fields with an inherently stochastic...

applications, where for some problems a distribution that is a grand canonical ensemble
Grand canonical ensemble
In statistical mechanics, a grand canonical ensemble is a theoretical collection of model systems put together to mirror the calculated probability distribution of microscopic states of a given physical system which is being maintained in a given macroscopic state...

is used (e.g., when the number of molecules in a box is variable).

• Diaconis, Persi
Persi Diaconis
Persi Warren Diaconis is an American mathematician and former professional magician. He is the Mary V. Sunseri Professor of Statistics and Mathematics at Stanford University....

, "The Markov chain Monte Carlo revolution", Bull. Amer. Math. Soc. (2009)
• Richey, Matthew
Matthew Richey
Matthew Richey, came to Canada from County Donegal in the Republic of Ireland. He was a Wesleyan Methodist minister, an educator, and an important leader in the Methodist community in Nova Scotia. He earned an M.A...

, "The Evolution of Markov Chain Monte Carlo Methods", The American Mathematical Monthly, May 2010, 383-413