Subset sum problem
Encyclopedia
In computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

, the subset sum problem is an important problem in complexity theory
Computational complexity theory
Computational complexity theory is a branch of the theory of computation in theoretical computer science and mathematics that focuses on classifying computational problems according to their inherent difficulty, and relating those classes to each other...

 and cryptography
Cryptography
Cryptography is the practice and study of techniques for secure communication in the presence of third parties...

. The problem is this: given a set of integers, is there a non-empty subset whose sum is zero? For example, given the set { −7, −3, −2, 5, 8}, the answer is yes because the subset { −3, −2, 5} sums to zero. The problem is NP-complete
NP-complete
In computational complexity theory, the complexity class NP-complete is a class of decision problems. A decision problem L is NP-complete if it is in the set of NP problems so that any given solution to the decision problem can be verified in polynomial time, and also in the set of NP-hard...

.

An equivalent problem is this: given a set of integers and an integer s, does any non-empty subset sum to s? Subset sum can also be thought of as a special case of the knapsack problem
Knapsack problem
The knapsack problem or rucksack problem is a problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine the count of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as...

. One interesting special case of subset sum is the partition problem
Partition problem
In computer science, the partition problem is an NP-complete problem. The problem is to decide whether a given multiset of integers can be partitioned into two "halves" that have the same sum...

, in which s is half of the sum of all elements in the set.

Complexity

The complexity (difficulty of solution) of subset sum can be viewed as depending on two parameters, N, the number of decision variables, and P, the precision of the problem (stated as the number of binary place values that it takes to state the problem). (Note: here the letters N and P mean something different than what they mean in the NP
NP (complexity)
In computational complexity theory, NP is one of the most fundamental complexity classes.The abbreviation NP refers to "nondeterministic polynomial time."...

class of problems.)

The complexity of the best known algorithms is exponential in the smaller of the two parameters N and P. Thus, the problem is most difficult if N and P are of the same order. It only becomes easy if either N or P becomes very small.

If N (the number of variables) is small, then an exhaustive search for the solution is practical. If P (the number of place values) is a small fixed number, then there are dynamic programming
Dynamic programming
In mathematics and computer science, dynamic programming is a method for solving complex problems by breaking them down into simpler subproblems. It is applicable to problems exhibiting the properties of overlapping subproblems which are only slightly smaller and optimal substructure...

 algorithms that can solve it exactly.

What is happening is that the problem becomes seemingly non-exponential when it is practical to count the entire solution space. There are two ways to count the solution space in the subset sum problem. One is to count the number of ways the variables can be combined. There are 2N possible ways to combine the variables. However, with N = 10, there are only 1024 possible combinations to check. These can be counted easily with a branching search. The other way is to count all possible numerical values that the combinations can take. There are 2P possible numerical sums. However, with P = 5 there are only 32 possible numerical values that the combinations can take. These can be counted easily with a dynamic programming algorithm. When N = P and both are large, then there is no aspect of the solution space that can be counted easily.

Efficient algorithms for both small N and small P cases are given below.

Exponential time algorithm

There are several ways to solve subset sum in time exponential in N. The most naïve algorithm would be to cycle through all subsets of N numbers and, for every one of them, check if the subset sums to the right number. The running time is of order O(2NN), since there are 2N subsets and, to check each subset, we need to sum at most N elements.

A better exponential time algorithm is known which runs in time O(2N/2). The algorithm splits arbitrarily the N elements into two sets of N/2 each. For each of these two sets, it stores a list of the sums of all 2N/2 possible subsets of its elements. Each of these two lists is then sorted. Using a standard comparison sorting algorithm for this step would take time O(2N/2N). However, given a sorted list of sums for k elements, the list can be expanded to two sorted lists with the introduction of a (k + 1)st element, and these two sorted lists can be merged in time O(2k). Thus, each list can be generated in sorted form in time O(2N/2). Given the two sorted lists, the algorithm can check if an element of the first array and an element of the second array sum up to s in time O(2N/2). To do that, the algorithm passes through the first array in decreasing order (starting at the largest element) and the second array in increasing order (starting at the smallest element). Whenever the sum of the current element in the first array and the current element in the second array is more than s, the algorithm moves to the next element in the first array. If it is less than s, the algorithm moves to the next element in the second array. If two elements with sum s are found, it stops. Horowitz and Sahni
Sartaj Sahni
Prof. Sartaj Kumar Sahni is an Indian computer scientist, now based in the USA, and is one of the pioneers in the field of data structures. He is a distinguished professor in the Department of Computer and Information Science and Engineering at the University of Florida.- Biography :Sahni received...

 first published this algorithm in a technical report in 1972.

Pseudo-polynomial time dynamic programming solution

The problem can be solved as follows using dynamic programming
Dynamic programming
In mathematics and computer science, dynamic programming is a method for solving complex problems by breaking them down into simpler subproblems. It is applicable to problems exhibiting the properties of overlapping subproblems which are only slightly smaller and optimal substructure...

. Suppose the sequence is
x1, ..., xn


and we wish to determine if there is a nonempty subset which sums to s. Let N be the sum of the negative values and P the sum of the positive values. Define the boolean-valued function Q(i,s) to be the value (true or false) of
"there is a nonempty subset of x1, ..., xi which sums to s".


Thus, the solution to the problem is the value of Q(n,0).

Clearly, if or so these values do not need to be stored or computed. Create an array to hold the values Q(i,s) for and

The array can now be filled in using a simple recursion. Initially, for set
Q(1,s) := (x1

s).



Then, for i = 2, …, n, set
Q(i,s) := Q(i − 1,s) or (xi s) or Q(i − 1,sxi)   for NsP.


For each assignment, the values of Q on the right side are already known, either because they were stored in the table for the previous value of i or because if or Therefore, the total number of arithmetic operations is For example, if all the values are O(nk) for some k, then the time required is O(nk+2).

This algorithm is easily modified to return the subset with sum 0 if there is one.

This solution does not count as polynomial time in complexity theory because is not polynomial in the size of the problem, which is the number of bits used to represent it. This algorithm is polynomial in the values of N and P, which are exponential in their numbers of bits.

For the case that each xi is positive and bounded by the same constant, Pisinger found a linear time algorithm.

Polynomial time approximate algorithm

An approximate
Approximation algorithm
In computer science and operations research, approximation algorithms are algorithms used to find approximate solutions to optimization problems. Approximation algorithms are often associated with NP-hard problems; since it is unlikely that there can ever be efficient polynomial time exact...

 version of the subset sum would be: given a set of N numbers x1, x2, ..., xN and a number s, output
  • yes, if there is a subset that sums up to s;
  • no, if there is no subset summing up to a number between and s for some small
  • any answer, if there is a subset summing up to a number between and s but no subset summing up to s.

If all numbers are non-negative, the approximate subset sum is solvable in time polynomial in N and 1/c.

The solution for subset sum also provides the solution for the original subset sum problem in the case where the numbers are small (again, for nonnegative numbers). If any sum of the numbers can be specified with at most P bits, then solving the problem approximately with is equivalent to solving it exactly. Then, the polynomial time algorithm for approximate subset sum becomes an exact algorithm with running time polynomial in N and 2P (i.e., exponential in P).

The algorithm for the approximate subset sum problem is as follows:
initialize a list S to contain one element 0.
for each i from 1 to N do
let T be a list consisting of xi + y, for all y in S
let U be the union of T and S
sort U
make S empty
let y be the smallest element of U
add y to S
for each element z of U in increasing order do
//trim the list by eliminating numbers close to one another
//and throw out elements greater than s
if y + cs/N < zs, set y = z and add z to S
if S contains a number between (1 − c)s and s, output yes, otherwise no
The algorithm is polynomial time because the lists S, T and U always remain of size polynomial in N and 1/c and, as long as they are of polynomial size, all operations on them can be done in polynomial time. The size of lists is kept polynomial by the trimming step, in which we only include a number z into S if it is greater than the previous one by cs/N and not greater than s.

This step ensures that each element in S is smaller than the next one by at least cs/N and do not contain elements greater than s. Any list with that property consists of no more than N/c elements.

The algorithm is correct because each step introduces an additive error of at most cs/N and N steps together introduce the error of at most cs.

See also

  • 3SUM
    3SUM
    In computational complexity theory, 3SUM is the following computational problem conjectured to require roughly quadratic time:There is a simple algorithm to solve 3SUM in O time. This is the fastest algorithm known in models that do not decompose the integers into bits, but matching lower bounds...

  • Merkle–Hellman knapsack cryptosystem
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK