Approximation algorithm
In computer science and operations research, approximation algorithms are algorithms used to find approximate solutions to optimization problems. Approximation algorithms are often associated with NP-hard problems; since it is unlikely that there can ever be efficient polynomial-time exact algorithms solving NP-hard problems, one settles for polynomial-time sub-optimal solutions. Unlike heuristics, which usually only find reasonably good solutions reasonably fast, approximation algorithms come with provable bounds on both solution quality and running time. Ideally, the approximation is optimal up to a small constant factor (for instance, within 5% of the optimal solution). Approximation algorithms are increasingly being used for problems where exact polynomial-time algorithms are known but are too expensive due to the input size.
A typical example of an approximation algorithm is the one for vertex cover in graphs: repeatedly find an uncovered edge and add both of its endpoints to the vertex cover, until no uncovered edges remain. The resulting cover is at most twice as large as an optimal one: the chosen edges form a matching, and any vertex cover must contain at least one endpoint of each of them, so the optimum contains at least half as many vertices as the algorithm selects. This is a constant-factor approximation algorithm with a factor of 2.
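The following is a minimal sketch of this algorithm in Python; the edge-list representation of the graph is an assumption chosen for illustration:

```python
# A minimal sketch of the matching-based 2-approximation for vertex cover.
# The graph is given as a list of edges (u, v).
def vertex_cover_2approx(edges):
    cover = set()
    for u, v in edges:
        # The edge is uncovered if neither endpoint is in the cover yet;
        # taking both endpoints adds one edge to an implicit matching.
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    return cover

# Example: a path on four vertices. The optimal cover {1, 2} has size 2,
# and the algorithm returns a cover of size at most 4.
print(vertex_cover_2approx([(0, 1), (1, 2), (2, 3)]))
```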
NP-hard problems vary greatly in their approximability; some, such as the bin packing problem, can be approximated within any factor greater than 1 (such a family of approximation algorithms is often called a polynomial-time approximation scheme, or PTAS). Others are impossible to approximate within any constant, or even polynomial, factor unless P = NP, such as the maximum clique problem.
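A full PTAS for bin packing is too involved to sketch here, but even the simple first-fit heuristic illustrates what a provable constant-factor guarantee looks like for this problem. The sketch below assumes unit bin capacity and item sizes at most 1:

```python
# First-fit for bin packing (unit capacity), sketched for illustration.
# At most one of the resulting bins can be at most half full (otherwise the
# later bin's items would have fit into the earlier half-empty bin), so
# bins_used - 1 < 2 * total_volume <= 2 * OPT, i.e. bins_used <= 2 * OPT.
def first_fit(items):
    bins = []  # each entry is the remaining capacity of one open bin
    for size in items:
        for i, remaining in enumerate(bins):
            if size <= remaining:
                bins[i] -= size
                break
        else:
            bins.append(1.0 - size)  # no open bin fits: open a new bin
    return len(bins)

print(first_fit([0.5, 0.7, 0.5, 0.2, 0.4, 0.2, 0.5, 0.1]))
```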
NP-hard problems can often be expressed as integer programs (IP) and solved exactly in exponential time. Many approximation algorithms emerge from the linear programming relaxation of the integer program.
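As one concrete illustration of this LP-relaxation approach, the sketch below solves the relaxation of the standard weighted vertex cover integer program and rounds the fractional solution, recovering the classic factor-2 guarantee. The use of scipy's linprog and the edge-list representation are assumptions for illustration:

```python
# LP rounding for weighted vertex cover, sketched with scipy.
# Relaxation: minimize sum(w_v * x_v) subject to x_u + x_v >= 1 per edge,
# 0 <= x_v <= 1.  Rounding x_v >= 1/2 yields a 2-approximation.
import numpy as np
from scipy.optimize import linprog

def lp_rounding_vertex_cover(num_vertices, edges, weights):
    # linprog expects constraints as A_ub @ x <= b_ub, so
    # x_u + x_v >= 1 becomes -x_u - x_v <= -1.
    A_ub = np.zeros((len(edges), num_vertices))
    for i, (u, v) in enumerate(edges):
        A_ub[i, u] = -1.0
        A_ub[i, v] = -1.0
    b_ub = -np.ones(len(edges))
    res = linprog(c=weights, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0.0, 1.0)] * num_vertices, method="highs")
    # Each edge constraint forces max(x_u, x_v) >= 1/2, so the rounded set
    # covers every edge; doubling each kept x_v shows its weight is at most
    # twice the LP optimum, hence at most twice the IP optimum.
    return [v for v in range(num_vertices) if res.x[v] >= 0.5]

print(lp_rounding_vertex_cover(4, [(0, 1), (1, 2), (2, 3)],
                               [1.0, 1.0, 1.0, 1.0]))
```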
Not all approximation algorithms are suitable for all practical applications. They often use IP/LP/semidefinite solvers, complex data structures, or sophisticated algorithmic techniques, which lead to difficult implementation problems. Also, some approximation algorithms have impractical running times even though they are polynomial time, for example O(n^2000). Yet the study of even very expensive algorithms is not a completely theoretical pursuit, as they can yield valuable insights. A classic example is the initial PTAS for Euclidean TSP due to Sanjeev Arora, which had prohibitive running time; yet within a year, Arora refined the ideas into a linear-time algorithm. Such algorithms are also worthwhile in some applications where the running times and cost can be justified, e.g. computational biology, financial engineering, transportation planning, and inventory management. In such scenarios, they must compete with the corresponding direct IP formulations.
Another limitation of the approach is that it applies only to optimization problems and not to "pure" decision problems like satisfiability, although it is often possible to conceive optimization versions of such problems, such as the maximum satisfiability problem (Max SAT).
Inapproximability has been a fruitful area of research in computational complexity theory since the 1990 result of Feige, Goldwasser, Lovász, Safra and Szegedy on the inapproximability of Independent Set. After Arora et al. proved the PCP theorem a year later, it has since been shown that Johnson's 1974 approximation algorithms for Max SAT, Set Cover, Independent Set and Coloring all achieve the optimal approximation ratio, assuming P ≠ NP.
Performance guarantees
For some approximation algorithms it is possible to prove certain properties about the approximation of the optimum result. For example, in the case of a ρ-approximation algorithm A it has been proven that the value/cost, f(x), of the approximate solution A(x) to an instance x will not be more (or less, depending on the situation) than a factor ρ times the value, OPT, of an optimum solution:

$$\mathrm{OPT} \leq f(x) \leq \rho\,\mathrm{OPT} \quad \text{if } \rho > 1; \qquad \rho\,\mathrm{OPT} \leq f(x) \leq \mathrm{OPT} \quad \text{if } \rho < 1.$$

The factor ρ is called the relative performance guarantee. An approximation algorithm has an absolute performance guarantee or bounded error c, if it has been proven for every instance x that

$$(\mathrm{OPT} - c) \leq f(x) \leq (\mathrm{OPT} + c).$$
Similarly, the performance guarantee, R(x,y), of a solution y to an instance x is defined as

$$R(x,y) = \max\left( \frac{\mathrm{OPT}}{f(y)}, \frac{f(y)}{\mathrm{OPT}} \right),$$
where f(y) is the value/cost of the solution y for the instance x. Clearly, the performance guarantee is greater than or equal to 1 and equal to 1 if and only if y is an optimal solution. If an algorithm A guarantees to return solutions with a performance guarantee of at most r(n), then A is said to be an r(n)-approximation algorithm and has an approximation ratio of r(n). Likewise, a problem with an r(n)-approximation algorithm is said to be r(n)-approximable or have an approximation ratio of r(n).
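As a quick worked example of this definition (the numbers are hypothetical), suppose a maximization instance has OPT = 100 and an algorithm returns a solution of value f(y) = 80; then

$$R(x,y) = \max\left( \tfrac{100}{80}, \tfrac{80}{100} \right) = 1.25,$$

so the solution is within a factor 1.25 of optimal.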
One may note that for minimization problems, the two different guarantees provide the same result and that for maximization problems, a relative performance guarantee of ρ is equivalent to a performance guarantee of $r = \rho^{-1}$. In the literature, both definitions are common, but it is clear which definition is used since, for maximization problems, ρ ≤ 1 while r ≥ 1.
The absolute performance guarantee $\rho_A$ of some approximation algorithm A, where x refers to an instance of a problem, and where $R_A(x)$ is the performance guarantee of A on x (i.e. ρ for problem instance x), is:

$$\rho_A = \inf \{ r \geq 1 \mid R_A(x) \leq r \text{ for all instances } x \}.$$

That is to say that $\rho_A$ is the largest bound on the approximation ratio, r, that one sees over all possible instances of the problem. Likewise, the asymptotic performance ratio $R_A^\infty$ is:

$$R_A^\infty = \inf \{ r \geq 1 \mid \exists n \in \mathbb{Z}^+ \text{ such that } R_A(x) \leq r \text{ for all instances } x \text{ with } |x| \geq n \}.$$
That is to say that it is the same as the absolute performance ratio, with a lower bound n on the size of problem instances. These two types of ratios are used because there exist algorithms where the difference between these two is significant.
|      | r-approx | ρ-approx | rel. error | rel. error | norm. rel. error | abs. error |
|---|---|---|---|---|---|---|
| max: | f(x) ≥ r⁻¹ OPT | f(x) ≥ ρ OPT | f(x) ≥ (1 − c) OPT | f(x) ≥ (1 − c) OPT | f(x) ≥ (1 − c) OPT + c WORST | f(x) ≥ OPT − c |
| min: | f(x) ≤ r OPT | f(x) ≤ ρ OPT | f(x) ≤ (1/(1 − c)) OPT | f(x) ≤ (1 + c) OPT | f(x) ≤ (1 − c) OPT + c WORST | f(x) ≤ OPT + c |

Here WORST denotes the value of the worst feasible solution to the instance, which the normalized relative error uses to scale the gap to the optimum.
Epsilon terms
In the literature, an approximation ratio for a maximization (minimization) problem of c − ϵ (min: c + ϵ) means that the algorithm has an approximation ratio of c ∓ ϵ for arbitrary ϵ > 0, but that the ratio has not been (or cannot be) shown for ϵ = 0. An example of this is the optimal inapproximability ratio of 7/8 + ϵ for satisfiable MAX-3SAT instances due to Johan Håstad: a uniformly random assignment satisfies each three-literal clause over distinct variables with probability 7/8, while no polynomial-time algorithm can guarantee a fraction 7/8 + ϵ of the clauses for any fixed ϵ > 0 unless P = NP. As mentioned previously, when c = 1, the problem is said to have a polynomial-time approximation scheme.
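To make the 7/8 baseline concrete, here is a small sketch of the trivial randomized algorithm against which Håstad's hardness result is tight (this is not Håstad's construction; the clause encoding is an assumption for illustration):

```python
# Trivial randomized algorithm for MAX-3SAT: assign each variable uniformly
# at random.  A clause with three distinct variables is falsified by exactly
# one of the 8 assignments to its variables, so it is satisfied with
# probability 7/8, and the expected satisfied fraction is 7/8 of all clauses.
import random

def random_assignment_max3sat(num_vars, clauses):
    # Each clause is a tuple of nonzero ints: +i means variable i, -i its negation.
    assignment = [random.choice([False, True]) for _ in range(num_vars + 1)]
    satisfied = sum(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )
    return assignment, satisfied

clauses = [(1, 2, 3), (-1, 2, -3), (1, -2, 3)]
_, satisfied = random_assignment_max3sat(3, clauses)
print(f"satisfied {satisfied} of {len(clauses)} clauses")
```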
An ϵ-term may appear when an approximation algorithm introduces a multiplicative error and a constant error while the minimum optimum of instances of size n goes to infinity as n does. In this case, the approximation ratio is c ∓ k/OPT = c ∓ o(1) for some constants c and k. Given arbitrary ϵ > 0, one can choose a large enough N such that the term k/OPT < ϵ for every n ≥ N. For every fixed ϵ, instances of size n < N can be solved by brute force, thereby showing an approximation ratio (and the existence of approximation algorithms with a guarantee) of c ∓ ϵ for every ϵ > 0.
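Written out for a minimization problem (the maximization case is symmetric), the argument is the following chain of inequalities:

$$\frac{f(x)}{\mathrm{OPT}} \leq \frac{c \cdot \mathrm{OPT} + k}{\mathrm{OPT}} = c + \frac{k}{\mathrm{OPT}} \leq c + \epsilon \quad \text{whenever } \mathrm{OPT} \geq k/\epsilon,$$

and choosing N so that every instance of size n ≥ N has OPT ≥ k/ϵ makes the last condition hold.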
See also
- Domination analysis considers guarantees in terms of the rank of the computed solution.
External links
- Pierluigi Crescenzi, Viggo Kann, Magnús Halldórsson, Marek Karpinski and Gerhard Woeginger, A compendium of NP optimization problems.