Counting sort
Encyclopedia
In computer science
, counting sort is an algorithm
for sorting
a collection of objects according to keys that are small integer
s; that is, it is an integer sorting
algorithm. It operates by counting the number of objects that have each distinct key value, and using arithmetic on those counts to determine the positions of each key value in the output sequence. Its running time is linear in the number of items and the maximum key value, so it is only suitable for use directly in situations where the keys are not significantly larger than the number of items. However, it is often used as a subroutine in another sorting algorithm, radix sort
, that can handle larger keys more efficiently.
Because counting sort uses key values as indexes into an array, it is not a comparison sort
, and the Ω(n log n) lower bound for comparison sorting is inapplicable to it. Bucket sort
may be used for many of the same tasks as counting sort, with a similar time analysis; however, compared to counting sort, bucket sort requires linked list
s, dynamic array
s or a large amount of preallocated memory to hold the sets of items within each bucket, whereas counting sort instead stores a single number (the count of items) per bucket.
of items, each of which has a non-negative integer key whose maximum value is at most .
In some descriptions of counting sort, the input to be sorted is assumed to be more simply a sequence of integers itself, but this simplification does not accommodate many applications of counting sort. For instance, when used as a subroutine in radix sort
, the keys for each call to counting sort are individual digits of larger item keys; it would not suffice to return only a sorted list of the key digits, separated from the items.
In applications such as in radix sort, a bound on the maximum key value will be known in advance, and can be assumed to be part of the input to the algorithm. However, if the value of is not already known then it may be computed by an additional loop over the data to determine the maximum key value that actually occurs within the data.
The output is an array of the items, in order by their keys. Because of the application to radix sorting, it is important for counting sort to be a stable sort: if two items have the same key as each other, they should have the same relative position in the output as they did in the input.
of the number of times each key occurs within the input collection. It then performs a prefix sum
computation (a second loop, over the range of possible keys) to determine, for each key, the starting position in the output array of the items having that key. Finally, it loops over the items again, moving each item into its sorted position.
In pseudocode, this may be expressed as follows:
After the first for loop, Count[i] stores the number of items with key equal to i. After the second for loop, it instead stores the number of items with key less than i, which is the same as the first index at which an item with key i should be stored. Throughout the third loop, Count[i] always stores the next position into which an item with key i should be stored, so each item is moved into its correct position in the output array.
Because it uses arrays of length and , the total space usage of the algorithm is also . For problem instances in which the maximum key value is significantly smaller than the number of items, counting sort can be highly space-efficient, as the only storage it uses other than its input and output arrays is the Count array which uses space .
For data in which the maximum key size is significantly smaller than the number of data items, counting sort may be parallelized
by splitting the input into subarrays of approximately equal size, processing each subarray in parallel to generate a separate count array for each subarray, and then merging the count arrays. When used as part of a parallel radix sort algorithm, the key size (base of the radix representation) should be chosen to match the size of the split subarrays. The simplicity of the counting sort algorithm and its use of the easily-parallelizable prefix sum primitive also make it usable in more fine-grained parallel algorithms.
As described, counting sort is not an in-place algorithm
; even disregarding the count array, it needs separate input and output arrays. It is possible to modify the algorithm so that it places the items into sorted order within the same array that was given to it as the input, using only the count array as auxiliary storage; however, the modified in-place version of counting sort is not stable.
counting sort, and its application to radix sorting, were both invented by Harold H. Seward
in 1954.
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
, counting sort is an algorithm
Algorithm
In mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning...
for sorting
Sorting algorithm
In computer science, a sorting algorithm is an algorithm that puts elements of a list in a certain order. The most-used orders are numerical order and lexicographical order...
a collection of objects according to keys that are small integer
Integer
The integers are formed by the natural numbers together with the negatives of the non-zero natural numbers .They are known as Positive and Negative Integers respectively...
s; that is, it is an integer sorting
Integer sorting
In computer science, integer sorting is the algorithmic problem of sorting a collection of data values by numeric keys, each of which is an integer. Algorithms designed for integer sorting may also often be applied to sorting problems in which the keys are floating point numbers or text strings...
algorithm. It operates by counting the number of objects that have each distinct key value, and using arithmetic on those counts to determine the positions of each key value in the output sequence. Its running time is linear in the number of items and the maximum key value, so it is only suitable for use directly in situations where the keys are not significantly larger than the number of items. However, it is often used as a subroutine in another sorting algorithm, radix sort
Radix sort
In computer science, radix sort is a non-comparative integer sorting algorithm that sorts data with integer keys by grouping keys by the individual digits which share the same significant position and value...
, that can handle larger keys more efficiently.
Because counting sort uses key values as indexes into an array, it is not a comparison sort
Comparison sort
A comparison sort is a type of sorting algorithm that only reads the list elements through a single abstract comparison operation that determines which of two elements should occur first in the final sorted list...
, and the Ω(n log n) lower bound for comparison sorting is inapplicable to it. Bucket sort
Bucket sort
Bucket sort, or bin sort, is a sorting algorithm that works by partitioning an array into a number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm, or by recursively applying the bucket sorting algorithm. It is a distribution sort, and is a cousin of...
may be used for many of the same tasks as counting sort, with a similar time analysis; however, compared to counting sort, bucket sort requires linked list
Linked list
In computer science, a linked list is a data structure consisting of a group of nodes which together represent a sequence. Under the simplest form, each node is composed of a datum and a reference to the next node in the sequence; more complex variants add additional links...
s, dynamic array
Dynamic array
In computer science, a dynamic array, growable array, resizable array, dynamic table, or array list is a random access, variable-size list data structure that allows elements to be added or removed...
s or a large amount of preallocated memory to hold the sets of items within each bucket, whereas counting sort instead stores a single number (the count of items) per bucket.
Input and output assumptions
In the most general case, the input to counting sort consists of a collectionCollection (computing)
In computer science, a collection is a grouping of some variable number of data items that have some shared significance to the problem being solved and need to be operated upon together in some controlled fashion. Generally, the data items will be of the same type or, in languages supporting...
of items, each of which has a non-negative integer key whose maximum value is at most .
In some descriptions of counting sort, the input to be sorted is assumed to be more simply a sequence of integers itself, but this simplification does not accommodate many applications of counting sort. For instance, when used as a subroutine in radix sort
Radix sort
In computer science, radix sort is a non-comparative integer sorting algorithm that sorts data with integer keys by grouping keys by the individual digits which share the same significant position and value...
, the keys for each call to counting sort are individual digits of larger item keys; it would not suffice to return only a sorted list of the key digits, separated from the items.
In applications such as in radix sort, a bound on the maximum key value will be known in advance, and can be assumed to be part of the input to the algorithm. However, if the value of is not already known then it may be computed by an additional loop over the data to determine the maximum key value that actually occurs within the data.
The output is an array of the items, in order by their keys. Because of the application to radix sorting, it is important for counting sort to be a stable sort: if two items have the same key as each other, they should have the same relative position in the output as they did in the input.
The algorithm
In summary, the algorithm loops over the items, computing a histogramHistogram
In statistics, a histogram is a graphical representation showing a visual impression of the distribution of data. It is an estimate of the probability distribution of a continuous variable and was first introduced by Karl Pearson...
of the number of times each key occurs within the input collection. It then performs a prefix sum
Prefix sum
In computer science, the prefix sum, or scan, of a sequence of numbers is a second sequence of numbers , the sums of prefixes of the input sequence:-Parallel algorithm:A prefix sum can be calculated in parallel by the following steps....
computation (a second loop, over the range of possible keys) to determine, for each key, the starting position in the output array of the items having that key. Finally, it loops over the items again, moving each item into its sorted position.
In pseudocode, this may be expressed as follows:
After the first for loop, Count[i] stores the number of items with key equal to i. After the second for loop, it instead stores the number of items with key less than i, which is the same as the first index at which an item with key i should be stored. Throughout the third loop, Count[i] always stores the next position into which an item with key i should be stored, so each item is moved into its correct position in the output array.
Analysis
Because the algorithm uses only simple for loops, without recursion or subroutine calls, it is straightforward to analyze. The initialization of the Count array, and the second for loop which performs a prefix sum on the count array, each iterate at most times and therefore take time. The other two for loops, and the initialization of the output array, each take time. Therefore the time for the whole algorithm is the sum of the times for these steps, .Because it uses arrays of length and , the total space usage of the algorithm is also . For problem instances in which the maximum key value is significantly smaller than the number of items, counting sort can be highly space-efficient, as the only storage it uses other than its input and output arrays is the Count array which uses space .
Variant algorithms
If each item to be sorted is itself an integer, and the keys are the same as the items, then the second and third loops of counting sort can be combined; in the second loop, instead of computing the position where items with key i should be placed in the output, simply append Count[i] copies of the number i to the output. This algorithm may also be used to eliminate duplicate keys, by replacing the Count array with a bit vector that stores a one for a key that is present in the input and a zero for a key that is not present.For data in which the maximum key size is significantly smaller than the number of data items, counting sort may be parallelized
Parallel algorithm
In computer science, a parallel algorithm or concurrent algorithm, as opposed to a traditional sequential algorithm, is an algorithm which can be executed a piece at a time on many different processing devices, and then put back together again at the end to get the correct result.Some algorithms...
by splitting the input into subarrays of approximately equal size, processing each subarray in parallel to generate a separate count array for each subarray, and then merging the count arrays. When used as part of a parallel radix sort algorithm, the key size (base of the radix representation) should be chosen to match the size of the split subarrays. The simplicity of the counting sort algorithm and its use of the easily-parallelizable prefix sum primitive also make it usable in more fine-grained parallel algorithms.
As described, counting sort is not an in-place algorithm
In-place algorithm
In computer science, an in-place algorithm is an algorithm which transforms input using a data structure with a small, constant amount of extra storage space. The input is usually overwritten by the output as the algorithm executes...
; even disregarding the count array, it needs separate input and output arrays. It is possible to modify the algorithm so that it places the items into sorted order within the same array that was given to it as the input, using only the count array as auxiliary storage; however, the modified in-place version of counting sort is not stable.
History
Although radix sorting itself dates back far longer,counting sort, and its application to radix sorting, were both invented by Harold H. Seward
Harold H. Seward
Harold H. Seward is a computer scientist and the developer of the radix sort algorithm in 1954 at MIT. He also developed the counting sort.-References:...
in 1954.