Cross-entropy method - AbsoluteAstronomy.com

The cross-entropy (CE) method, attributed to Reuven Rubinstein, is a general Monte Carlo approach to combinatorial and continuous multi-extremal optimization and importance sampling.
The method originated from the field of rare event simulation, where very small probabilities need to be accurately estimated, for example in network reliability analysis, queueing models, or performance analysis of telecommunication systems.
The CE method can be applied to static and noisy combinatorial optimization problems such as the traveling salesman problem, the quadratic assignment problem, DNA sequence alignment, the max-cut problem and the buffer allocation problem, as well as to continuous global optimization problems with many local extrema.

In a nutshell, the CE method consists of two phases:

Generate a random data sample (trajectories, vectors, etc.) according to a specified mechanism.
Update the parameters of the random mechanism based on the data to produce a "better" sample in the next iteration. This step involves minimizing the cross-entropy or Kullback-Leibler divergence.
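To make the two quantities concrete, the following Python sketch computes the cross-entropy $H(p, q)$ and the Kullback-Leibler divergence $D_{\mathrm{KL}}(p \| q)$ in bits for two small discrete distributions. The distributions `p` and `q` and the helper names are illustrative assumptions. The sketch also checks the identity $D_{\mathrm{KL}}(p \| q) = H(p, q) - H(p)$: because $H(p)$ does not depend on $q$, minimizing the cross-entropy over $q$ is equivalent to minimizing the KL divergence, which is why the two are used interchangeably in the update step.

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i * log2(q_i), in bits; terms with p_i = 0 contribute 0."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    """H(p) is the cross-entropy of p with itself."""
    return cross_entropy(p, p)

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log2(p_i / q_i), in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]   # "true" distribution (illustrative)
q = [0.25, 0.25, 0.5]   # coding distribution (illustrative)

print(cross_entropy(p, q))   # 1.75
print(entropy(p))            # 1.5
print(kl_divergence(p, q))   # 0.25 = 1.75 - 1.5
```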

Estimation via importance sampling

Consider the general problem of estimating the quantity

$\ell = \mathbb{E}_{\mathbf{u}}[H(\mathbf{X})] = \int H(\mathbf{x})\, f(\mathbf{x}; \mathbf{u})\, \mathrm{d}\mathbf{x}$,

where $H$ is some performance function and $f(\mathbf{x}; \mathbf{u})$ is a member of some parametric family of distributions. Using importance sampling, this quantity can be estimated as

$\hat{\ell} = \frac{1}{N} \sum_{i=1}^{N} H(\mathbf{X}_i) \frac{f(\mathbf{X}_i; \mathbf{u})}{g(\mathbf{X}_i)}$,

where $\mathbf{X}_1, \dots, \mathbf{X}_N$ is a random sample from $g$. For positive $H$, the theoretically optimal importance sampling probability density function (pdf) is given by

$g^*(\mathbf{x}) = H(\mathbf{x})\, f(\mathbf{x}; \mathbf{u}) / \ell$.

This, however, depends on the unknown $\ell$. The CE method aims to approximate the optimal pdf by adaptively selecting members of the parametric family that are closest (in the Kullback-Leibler sense) to the optimal pdf $g^*$.
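A small numerical illustration of the importance sampling estimator $\hat{\ell} = \frac{1}{N}\sum_i H(X_i) f(X_i; u)/g(X_i)$, assuming a rare event chosen here for illustration: $X \sim N(0, 1)$ (so $u = 0$), $H(x) = \mathrm{I}_{\{x \geq 4\}}$, and a hand-picked sampling density $g = N(4, 1)$. The names `normal_pdf` and `is_estimate` are hypothetical. With $10^5$ plain Monte Carlo draws one would expect only about three hits, whereas sampling from $g$ and reweighting by $f/g$ gives an accurate estimate of $\ell = P(X \geq 4) \approx 3.17 \times 10^{-5}$.

```python
import math
import random

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def is_estimate(gamma, shift, n, rng):
    """Estimate ell = P(X >= gamma) for X ~ N(0, 1) by sampling from g = N(shift, 1)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(shift, 1.0)                        # X_i ~ g
        h = 1.0 if x >= gamma else 0.0                   # H(X_i)
        w = normal_pdf(x) / normal_pdf(x, mean=shift)    # likelihood ratio f(X_i; u) / g(X_i)
        total += h * w
    return total / n

rng = random.Random(0)
gamma = 4.0
exact = 0.5 * math.erfc(gamma / math.sqrt(2.0))          # exact P(X >= 4)
estimate = is_estimate(gamma, shift=gamma, n=100_000, rng=rng)
print(exact, estimate)
```

Here $g$ is chosen by hand; the point of the CE method is to select such a density automatically from the parametric family.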

Generic CE algorithm

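1. Choose an initial parameter vector $\mathbf{v}^{(0)}$; set t = 1.
2. Generate a random sample $\mathbf{X}_1, \dots, \mathbf{X}_N$ from $f(\cdot; \mathbf{v}^{(t-1)})$.
3. Solve for $\mathbf{v}^{(t)}$, where $\mathbf{v}^{(t)} = \mathop{\mathrm{argmax}}_{\mathbf{v}} \frac{1}{N} \sum_{i=1}^{N} H(\mathbf{X}_i) \frac{f(\mathbf{X}_i; \mathbf{u})}{f(\mathbf{X}_i; \mathbf{v}^{(t-1)})} \log f(\mathbf{X}_i; \mathbf{v})$.
4. If convergence is reached, stop; otherwise, increase t by 1 and reiterate from step 2.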
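A minimal sketch of the four steps, assuming a one-dimensional rare-event problem chosen for illustration: $X \sim N(u, 1)$ with $u = 0$, $H(x) = \mathrm{I}_{\{x \geq \gamma\}}$, and the family $f(\cdot; v) = N(v, 1)$. The function name `ce_gaussian_mean`, its defaults, and the stopping tolerance are assumptions. Because this family is a natural exponential family, step 3 has a closed form: maximizing the weighted log-likelihood gives the weighted sample mean $\sum_i H(X_i) W_i X_i / \sum_i H(X_i) W_i$ with likelihood ratio $W_i = f(X_i; u)/f(X_i; v^{(t-1)})$.

```python
import math
import random

def ce_gaussian_mean(gamma, u=0.0, n=10_000, iters=10, tol=1e-3, seed=1):
    """Generic CE loop for ell = P(X >= gamma), X ~ N(u, 1), with family N(v, 1).
    Returns the final parameter v and an importance-sampling estimate of ell."""
    rng = random.Random(seed)

    def weight(x, v):
        # Likelihood ratio W = f(x; u) / f(x; v) for unit-variance normals.
        return math.exp(-((x - u) ** 2) / 2 + ((x - v) ** 2) / 2)

    v = u  # Step 1: start from the nominal parameter, v^(0) = u.
    for _ in range(iters):
        # Step 2: sample from f(.; v^(t-1)).
        xs = [rng.gauss(v, 1.0) for _ in range(n)]
        hits = [x for x in xs if x >= gamma]  # samples with H(x) = 1
        if not hits:
            raise RuntimeError("no sample reached gamma; increase n")
        # Step 3 in closed form: likelihood-ratio-weighted mean of the hits.
        ws = [weight(x, v) for x in hits]
        v_new = sum(w * x for w, x in zip(ws, hits)) / sum(ws)
        # Step 4: stop once the parameter barely moves.
        done = abs(v_new - v) < tol
        v = v_new
        if done:
            break

    # Importance-sampling estimate of ell using g = f(.; v).
    xs = [rng.gauss(v, 1.0) for _ in range(n)]
    ell_hat = sum(weight(x, v) for x in xs if x >= gamma) / n
    return v, ell_hat


v, ell_hat = ce_gaussian_mean(gamma=2.0)
exact = 0.5 * math.erfc(2.0 / math.sqrt(2.0))  # exact P(X >= 2) for N(0, 1)
print(v, ell_hat, exact)
```

For $\gamma = 2$ the parameter should settle near $\mathbb{E}[X \mid X \geq 2] \approx 2.37$, the mean of the optimal density $g^*$, and the final estimate should be close to $\ell \approx 0.0228$.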

In several cases, the solution to step 3 can be found analytically. Situations in which this occurs are:

When $f$ belongs to the natural exponential family.
When $f$ is discrete with finite support.
When $H(\mathbf{X}) = \mathrm{I}_{\{\mathbf{X} \in A\}}$ and $f(\mathbf{X}_i; \mathbf{u}) = f(\mathbf{X}_i; \mathbf{v}^{(t-1)})$, then $\mathbf{v}^{(t)}$ corresponds to the maximum likelihood estimator based on those $\mathbf{X}_k \in A$.
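The last two cases combine naturally in combinatorial optimization. A minimal sketch, assuming bit strings sampled from independent Bernoulli components (a discrete family with finite support): taking $H$ as the indicator that a sample is among the best `n_elite` and sampling from the current parameters (so the likelihood ratio is 1), step 3 reduces to the maximum likelihood estimate, the fraction of elite strings with a 1 in each position. The toy objective (agreement with a hidden string `target`), the helper names, and the sample sizes are hypothetical.

```python
import random

def elite_bernoulli_update(elite):
    """Maximum likelihood estimate of independent Bernoulli parameters:
    p_j = fraction of elite vectors whose j-th bit is 1."""
    n_elite = len(elite)
    return [sum(x[j] for x in elite) / n_elite for j in range(len(elite[0]))]

def ce_binary(score, n_bits, n_samples=100, n_elite=10, iters=30, seed=0):
    """CE maximization over bit strings with an independent-Bernoulli sampler."""
    rng = random.Random(seed)
    p = [0.5] * n_bits  # initial parameter vector
    for _ in range(iters):
        # Draw bit strings from the current Bernoulli parameters.
        xs = [[1 if rng.random() < pj else 0 for pj in p] for _ in range(n_samples)]
        # Elite set = the n_elite best samples; H is the indicator of this set.
        xs.sort(key=score, reverse=True)
        # Closed-form update: MLE from the elite samples.
        p = elite_bernoulli_update(xs[:n_elite])
        if all(pj in (0.0, 1.0) for pj in p):  # sampler has become deterministic
            break
    return p


target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]  # hypothetical hidden bit string

def matches(x):
    """Toy objective: number of positions where x agrees with target."""
    return sum(int(a == b) for a, b in zip(x, target))

p = ce_binary(matches, len(target))
print([round(pj) for pj in p])
```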

Continuous optimization—example

The same CE algorithm can be used for optimization, rather than estimation.
Suppose the problem is to maximize some function $S(x)$, for example,
$S(x) = \mathrm{e}^{-(x-2)^2} + 0.8\,\mathrm{e}^{-(x+2)^2}$.
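To apply CE, one considers first the associated stochastic problem of estimating $\mathbb{P}_{\boldsymbol{\theta}}(S(X) \geq \gamma)$ for a given level $\gamma$ and parametric family $\{f(\cdot; \boldsymbol{\theta})\}$, for example the 1-dimensional Gaussian distribution, parameterized by its mean $\mu$ and variance $\sigma^2$ (so $\boldsymbol{\theta} = (\mu, \sigma^2)$ here).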
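Hence, for a given $\gamma$, the goal is to find $\boldsymbol{\theta}$ so that the Kullback-Leibler divergence $D_{\mathrm{KL}}(g^*_\gamma \,\|\, f(\cdot; \boldsymbol{\theta}))$ is minimized, where $g^*_\gamma(x) \propto \mathrm{I}_{\{S(x) \geq \gamma\}}\, f(x; \boldsymbol{\theta}_{t-1})$ is the current sampling density (with parameters $\boldsymbol{\theta}_{t-1}$) restricted to the level set $\{S(x) \geq \gamma\}$. This is done by solving the sample version (stochastic counterpart) of the KL divergence minimization problem, as in step 3 above.
It turns out that the parameters that minimize the stochastic counterpart for this choice of target distribution and parametric family are the sample mean and sample variance of the elite samples, which are those samples that have objective function value $\geq \gamma$.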
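The claim about the elite samples can be checked directly. Writing $\mathcal{E}$ for the set of $N_e$ elite samples (notation introduced here for illustration), the stochastic counterpart with $H = \mathrm{I}_{\{S(x) \geq \gamma\}}$ and sampling from the current Gaussian has unit likelihood ratios, so it is an ordinary Gaussian maximum likelihood problem over the elite samples:

```latex
\max_{\mu,\,\sigma^2} \sum_{i \in \mathcal{E}} \log f(X_i; \mu, \sigma^2),
\qquad
\log f(x; \mu, \sigma^2) = -\tfrac{1}{2}\log(2\pi\sigma^2) - \frac{(x - \mu)^2}{2\sigma^2}

\frac{\partial}{\partial \mu}:\;
\sum_{i \in \mathcal{E}} \frac{X_i - \mu}{\sigma^2} = 0
\;\Longrightarrow\;
\hat{\mu} = \frac{1}{N_e} \sum_{i \in \mathcal{E}} X_i

\frac{\partial}{\partial \sigma^2}:\;
\sum_{i \in \mathcal{E}} \left( -\frac{1}{2\sigma^2} + \frac{(X_i - \mu)^2}{2\sigma^4} \right) = 0
\;\Longrightarrow\;
\hat{\sigma}^2 = \frac{1}{N_e} \sum_{i \in \mathcal{E}} (X_i - \hat{\mu})^2
```

The maximum likelihood variance divides by $N_e$; a `var` routine that divides by $N_e - 1$ gives a slightly larger value, which makes little practical difference to the iteration.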
The worst of the elite samples is then used as the level parameter for the next iteration.
This yields the following randomized algorithm that happens to coincide with the so-called Estimation of Multivariate Normal Algorithm (EMNA), an estimation of distribution algorithm.

Pseudo-code

mu := -6; sigma2 := 100; t := 0; maxits := 100     // Initialize parameters
N := 100; Ne := 10                                   // Number of samples and number of elite samples
while t < maxits and sigma2 > epsilon                // While maxits not exceeded and not converged (epsilon: small tolerance on the variance)
    X := SampleGaussian(mu, sigma2, N)               // Obtain N samples from current sampling distribution
    S := exp(-(X-2)^2) + 0.8 exp(-(X+2)^2)           // Evaluate objective function at sampled points
    X := sort(X, S)                                  // Sort X by objective function values (in descending order)
    mu := mean(X(1:Ne)); sigma2 := var(X(1:Ne))      // Update parameters of sampling distribution
    t := t + 1                                       // Increment iteration counter
end while
return mu                                            // Return mean of final sampling distribution as solution
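A direct Python translation of the pseudo-code, offered as a sketch: `objective` and `ce_maximize` are names chosen here, and `epsilon = 1e-6` is an assumed tolerance since the pseudo-code leaves it unspecified. Note that `random.gauss` takes a standard deviation, so the variance is square-rooted before sampling. The function also records the level $\gamma$ as the objective value of the worst elite sample, matching the description above, although the mean and variance update does not need it.

```python
import math
import random
import statistics

def objective(x):
    """S(x) = exp(-(x-2)^2) + 0.8 * exp(-(x+2)^2)."""
    return math.exp(-(x - 2) ** 2) + 0.8 * math.exp(-(x + 2) ** 2)

def ce_maximize(mu=-6.0, sigma2=100.0, n=100, n_elite=10,
                maxits=100, epsilon=1e-6, seed=0):
    """CE maximization of objective() with a 1-D Gaussian sampling distribution."""
    rng = random.Random(seed)
    t = 0
    gamma = None
    while t < maxits and sigma2 > epsilon:
        # random.gauss expects a standard deviation, not a variance.
        xs = [rng.gauss(mu, math.sqrt(sigma2)) for _ in range(n)]
        xs.sort(key=objective, reverse=True)   # descending objective values
        elite = xs[:n_elite]
        gamma = objective(elite[-1])            # worst elite value = level for next iteration
        mu = statistics.mean(elite)             # update mean
        sigma2 = statistics.variance(elite)     # sample variance (divisor n_elite - 1)
        t += 1
    return mu, sigma2, gamma, t


mu, sigma2, gamma, t = ce_maximize()
print(mu, sigma2, t)
```

The global maximizer of this $S$ lies near $x = 2$, with a lower local maximum near $x = -2$. Because a 10% elite fraction shrinks the variance quickly, an individual run can in principle settle on the lower peak; rerunning from different seeds is a simple safeguard.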

Related methods

Simulated annealing
Genetic algorithms
Harmony search
Estimation of distribution algorithm
Tabu search
