# Markov chain Monte Carlo

Markov chain Monte Carlo (MCMC) methods (which include random walk Monte Carlo methods) are a class of algorithms for sampling from probability distributions based on constructing a Markov chain that has the desired distribution as its equilibrium distribution. The state of the chain after a large number of steps is then used as a sample of the desired distribution. The quality of the sample improves as a function of the number of steps.
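As a minimal illustration of the underlying idea (a made-up example, not a method discussed in this article), a two-state Markov chain can be simulated to show that long-run state frequencies approach the chain's stationary distribution; the transition matrix below is an arbitrary choice whose stationary distribution works out to (0.6, 0.4).

```python
import random

# Hypothetical two-state Markov chain with transition matrix P.
# The stationary distribution pi solves pi = pi P; for this P, pi = (0.6, 0.4).
P = [[0.90, 0.10],   # transition probabilities from state 0
     [0.15, 0.85]]   # transition probabilities from state 1

random.seed(0)
state = 0
counts = [0, 0]
steps = 200_000
for _ in range(steps):
    state = 0 if random.random() < P[state][0] else 1
    counts[state] += 1

freq = [c / steps for c in counts]
# Long-run visit frequencies approximate the stationary distribution (0.6, 0.4).
```

Running the chain longer makes `freq` a better approximation of the stationary distribution, mirroring the statement above that sample quality improves with the number of steps.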

Usually it is not hard to construct a Markov chain with the desired properties. The more difficult problem is to determine how many steps are needed to converge to the stationary distribution within an acceptable error. A good chain will have

*rapid mixing*: the stationary distribution is reached quickly starting from an arbitrary position (described further under Markov chain mixing time).

Typical use of MCMC sampling can only approximate the target distribution, as there is always some residual effect of the starting position. More sophisticated MCMC-based algorithms such as coupling from the past can produce exact samples, at the cost of additional computation and an unbounded (though finite in expectation) running time.

The most common application of these algorithms is numerically calculating multi-dimensional integrals. In these methods, an ensemble of "walkers" moves around randomly. At each point where a walker steps, the integrand value at that point is counted towards the integral. The walker then may make a number of tentative steps around the area, looking for a place with a reasonably high contribution to the integral to move into next. Random walk methods are a kind of random simulation or Monte Carlo method. However, whereas the random samples of the integrand used in a conventional Monte Carlo integration are statistically independent, those used in MCMC are *correlated*. A Markov chain is constructed in such a way as to have the integrand as its equilibrium distribution. Surprisingly, this is often easy to do.
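As a sketch of this use (an illustrative example, not taken from the article), a random-walk Metropolis chain whose equilibrium distribution is the standard normal can estimate the integral E[x²] = ∫ x² φ(x) dx = 1. The proposal width and sample count here are arbitrary tuning choices; note that only density *ratios* are needed, so the normalizing constant never appears.

```python
import math
import random

def target(x):
    # Unnormalized standard normal density; Metropolis acceptance
    # ratios only ever use ratios of this, so normalization is unneeded.
    return math.exp(-0.5 * x * x)

random.seed(1)
x = 0.0
total, n = 0.0, 100_000
for _ in range(n):
    proposal = x + random.uniform(-1.0, 1.0)  # symmetric random-walk proposal
    if random.random() < target(proposal) / target(x):
        x = proposal          # accept; otherwise stay put
    total += x * x            # integrand value at the current (possibly repeated) state

estimate = total / n          # approximates E[x^2] = 1 under N(0, 1)
```

Rejected proposals still contribute the current state's integrand value, which is what makes the correlated samples average out to the correct integral.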

Multi-dimensional integrals often arise in Bayesian statistics, computational physics, computational biology and computational linguistics, so Markov chain Monte Carlo methods are widely used in those fields. For example, see Gill and Robert & Casella.

## Random walk algorithms

Many Markov chain Monte Carlo methods move around the equilibrium distribution in relatively small steps, with no tendency for the steps to proceed in the same direction. These methods are easy to implement and analyze, but unfortunately it can take a long time for the walker to explore all of the space. The walker will often double back and cover ground already covered. Here are some random walk MCMC methods:

- Metropolis–Hastings algorithm: Generates a random walk using a proposal density and a method for rejecting proposed moves.
- Gibbs sampling: Requires that all the conditional distributions of the target distribution can be sampled exactly. Popular partly because when this is so, the method does not require any 'tuning'.
- Slice sampling: Depends on the principle that one can sample from a distribution by sampling uniformly from the region under the plot of its density function. This method alternates uniform sampling in the vertical direction with uniform sampling from the horizontal 'slice' defined by the current vertical position.
- Multiple-try Metropolis: A variation of the Metropolis–Hastings algorithm that allows multiple trials at each point. This allows the algorithm to generally take larger steps at each iteration, which helps combat problems intrinsic to high-dimensional problems.
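To make the Gibbs sampling entry above concrete, here is a hedged sketch for a bivariate standard normal target with correlation rho = 0.8, a standard textbook case whose full conditionals are the known N(rho·y, 1 − rho²) form; the correlation value and sample count are arbitrary choices for the example.

```python
import math
import random

rho = 0.8                        # correlation of the bivariate normal target
cond_sd = math.sqrt(1 - rho**2)  # standard deviation of each full conditional

random.seed(2)
x, y = 0.0, 0.0
xs, ys = [], []
for _ in range(100_000):
    # Alternately sample each coordinate from its exact full conditional.
    x = random.gauss(rho * y, cond_sd)  # X | Y=y ~ N(rho*y, 1 - rho^2)
    y = random.gauss(rho * x, cond_sd)  # Y | X=x ~ N(rho*x, 1 - rho^2)
    xs.append(x)
    ys.append(y)

n = len(xs)
mean_x = sum(xs) / n                            # should be near 0
corr = sum(a * b for a, b in zip(xs, ys)) / n   # sample E[XY], near rho
```

Because each conditional is sampled exactly, no acceptance step or proposal tuning is needed, which is the "no tuning" advantage mentioned above.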

## Avoiding random walks

More sophisticated algorithms use some method of preventing the walker from doubling back. These algorithms may be harder to implement, but may exhibit faster convergence (i.e. fewer steps for an accurate result).

- Successive over-relaxation: A Monte Carlo version of this technique can be seen as a variation on Gibbs sampling; it sometimes avoids random walks.
- Hybrid Monte Carlo (HMC): Tries to avoid random walk behaviour by introducing an auxiliary momentum vector and implementing Hamiltonian dynamics where the potential function is the target density. The momentum samples are discarded after sampling. The end result of Hybrid Monte Carlo is that proposals move across the sample space in larger steps and are therefore less correlated and converge to the target distribution more rapidly.
- Some variations on slice sampling also avoid random walks.
- Langevin MCMC and other methods that rely on the gradient (and possibly the second derivative) of the log posterior avoid random walks by making proposals that are more likely to be in the direction of higher probability density.
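A hedged sketch of the gradient-based idea (Metropolis-adjusted Langevin, one form of Langevin MCMC) for a standard normal target, where the log-density gradient is simply −x. The drift term pushes proposals toward higher density, and an asymmetric Metropolis–Hastings correction keeps the target exact; the step size is an arbitrary tuning choice.

```python
import math
import random

def log_target(x):
    return -0.5 * x * x      # standard normal, up to an additive constant

def grad_log_target(x):
    return -x

def log_q(x_to, x_from, eps):
    # Log-density (up to a constant) of the Langevin proposal
    # x_to ~ N(x_from + (eps^2/2) * grad_log_target(x_from), eps^2).
    mean = x_from + 0.5 * eps**2 * grad_log_target(x_from)
    return -0.5 * ((x_to - mean) / eps) ** 2

random.seed(3)
eps, x = 0.9, 0.0
total, n = 0.0, 50_000
for _ in range(n):
    mean = x + 0.5 * eps**2 * grad_log_target(x)  # drift toward higher density
    prop = random.gauss(mean, eps)
    # Metropolis-Hastings correction for the asymmetric proposal.
    log_alpha = (log_target(prop) + log_q(x, prop, eps)
                 - log_target(x) - log_q(prop, x, eps))
    if random.random() < math.exp(min(0.0, log_alpha)):
        x = prop
    total += x

mean_est = total / n  # approximates E[X] = 0 under N(0, 1)
```

Because proposals lean in the gradient direction rather than stepping symmetrically, successive states double back less often than in plain random-walk Metropolis.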

## Changing dimension

The reversible-jump method is a variant of Metropolis–Hastings that allows proposals that change the dimensionality of the space. This method was proposed in 1995 by Peter Green of Bristol University. Markov chain Monte Carlo methods that change dimensionality have also long been used in statistical physics applications, where for some problems a distribution that is a grand canonical ensemble is used (e.g., when the number of molecules in a box is variable).

## Further reading

- Diaconis, Persi, "The Markov chain Monte Carlo revolution", Bull. Amer. Math. Soc. (2009)
- Richey, Matthew, "The Evolution of Markov Chain Monte Carlo Methods", The American Mathematical Monthly, May 2010, 383–413

## External links

- MCMC sampling and other methods in a basic overview, by Alexander Mantzaris
- Visual demonstration of MCMC sampling methods (Java applet), by Laird Breyer
- A Toy Example of MCMC sampling, by Zhiyuan Weng