Element uniqueness problem
In computational complexity theory, the element distinctness problem (or element uniqueness problem) is the problem of determining whether all the elements of a list are distinct.

It is a well-studied problem in many different models of computation. The problem may be solved by sorting the list and then checking whether any two consecutive elements are equal; it may also be solved in linear expected time by a randomized algorithm that inserts each item into a hash table and compares only those elements that are placed in the same hash table cell.
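For concreteness, here is a minimal Python sketch of both approaches described above, assuming the items are hashable and mutually comparable; the function names are illustrative and not from the source. The hash-based version relies on a Python set, which performs the same-cell comparisons internally.

```python
def distinct_by_sorting(items):
    """O(n log n): sort, then check consecutive pairs for equality."""
    ordered = sorted(items)
    return all(a != b for a, b in zip(ordered, ordered[1:]))

def distinct_by_hashing(items):
    """Expected O(n): insert each item into a hash table (a set here),
    so only items landing in the same cell are ever compared."""
    seen = set()
    for x in items:
        if x in seen:  # a duplicate collides with its earlier copy
            return False
        seen.add(x)
    return True

assert distinct_by_sorting([3, 1, 4, 1, 5]) is False
assert distinct_by_hashing([3, 1, 4, 5]) is True
```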
It is known that the problem's time complexity is Θ(n log n), i.e., both the upper and lower bounds on its time complexity are of linearithmic order, in the algebraic decision tree model of computation, a model in which the elements may not be used to index the computer's memory (as in the hash table solution) but may only be accessed by computing and comparing simple algebraic functions of their values. In other words, an asymptotically optimal algorithm of linearithmic time complexity is known for this model. The algebraic computation tree model essentially restricts the allowable algorithms to those that can perform polynomial operations of bounded degree on the input data and comparisons of the results of these computations.
The same lower bound was later proved for the randomized algebraic decision tree model.
It is also known that quantum algorithms can solve this problem faster, in Θ(n^(2/3)) queries. The optimal algorithm is by Andris Ambainis, and the lower bound is due to Scott Aaronson and Yaoyun Shi.
Several lower bounds in computational complexity are proved by reducing the element distinctness problem to the problem in question, i.e., by demonstrating that a solution to the element uniqueness problem may be quickly found after solving the problem in question.
Restrictions

Decision tree models are inapplicable for determining lower bounds for algorithmic problems on objects that have some a priori properties that can be exploited in the construction of algorithms. For example, if it is known that the n objects are integers in the range [1..n], then the element uniqueness problem may be solved in O(n) time by a modification of bucket sort.
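A minimal sketch of this bucket-sort-style idea, assuming every item is an integer in 1..n where n is the length of the list (the function name is illustrative):

```python
def distinct_small_integers(items):
    """O(n) distinctness check for integers in the range 1..n:
    one boolean bucket per possible value."""
    n = len(items)
    seen = [False] * (n + 1)
    for x in items:
        if seen[x]:  # this value's bucket is already occupied
            return False
        seen[x] = True
    return True

assert distinct_small_integers([2, 1, 3]) is True
assert distinct_small_integers([2, 1, 2]) is False
```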
Generalization: Finding repeated elements

Elements that occur more than n/k times in a multiset of size n may be found in time O(n log k). The element distinctness problem is the special case k = n. This algorithm is optimal under the decision tree model of computation.

The algorithm is a generalization of the one for the special case k = 2, which had a rather convoluted history of publication.
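The source does not name the k = 2 algorithm, but the standard example is the Boyer-Moore majority vote, sketched here under the assumption that the goal is to find an element occurring more than n/2 times using only equality tests:

```python
def majority_candidate(items):
    """Boyer-Moore majority vote (the classic k = 2 case): one
    counting pass yields the only possible element occurring more
    than n/2 times; a second pass verifies the candidate."""
    candidate, count = None, 0
    for x in items:
        if count == 0:
            candidate, count = x, 1
        elif x == candidate:
            count += 1
        else:
            count -= 1
    # Verification pass: the candidate need not actually be a majority.
    if candidate is not None and \
            sum(1 for x in items if x == candidate) > len(items) // 2:
        return candidate
    return None

assert majority_candidate([1, 2, 1, 1]) == 1
assert majority_candidate([1, 2]) is None
```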
The above algorithms rely only on tests of identity of the elements. If sorting is allowed, previously known order-statistics algorithms may be exploited. For example, for k = 2, a median may be found first in linear time, and then it may be easily tested whether there are more than n/2 median elements. However, the above algorithms require fewer comparisons than the order-statistics algorithms.
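A hedged sketch of that order-statistics approach for k = 2: any element occurring more than n/2 times must equal the median, so it suffices to find the median and count its occurrences. For brevity the sketch sorts to find the median, although a linear-time selection algorithm can be used instead, as described above.

```python
def has_majority_via_median(items):
    """Find the median, then check whether it occurs more than
    n/2 times. (Sorting stands in for linear-time selection.)"""
    if not items:
        return None
    median = sorted(items)[len(items) // 2]
    if items.count(median) > len(items) // 2:
        return median
    return None

assert has_majority_via_median([2, 2, 1, 2]) == 2
assert has_majority_via_median([1, 2, 3]) is None
```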