
Primary clustering
    
    Encyclopedia
    
        Primary clustering is the tendency for certain open-addressing hash table
s collision resolution schemes to create long sequences of filled slots. It is most commonly referred to in the context of problems with linear probing
.
These long chains degrade the hash-table performance closer to O(n) performance instead of closer to O(1).
Consider the linear probing hash function
h(k, i) : (h`(k) + i)mod N. With k being the key, i the probing-iteration, N being the number of slots in the hash-table and h`(k) being the secondary-hash function.
Thus the probing sequence for key k is: {h`(k), h`(k) + 1, h`(k) + 2, ..., h`(k) + n}. It is easy to see that the probing sequences for two different keys may overlap and create clustering.
Also since the probability of each slot being already full is equal to the load-factor
of the hash-table, as the factor approaches 1.0 so does the chance that the next slot in the probing sequence will be full, this degrades performance of the hash-table to linear time.
Note: clusters of filled slots can grow exponentially in some cases. For example, imagine two large clusters of filled slots separated by one empty slot. Placing an entry into this slot will double the cluster size.
Hash table
In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys , to their associated values . Thus, a hash table implements an associative array...
s collision resolution schemes to create long sequences of filled slots. It is most commonly referred to in the context of problems with linear probing
Linear probing
Linear probing  is a scheme in computer programming for resolving hash collisions of values of hash functions by sequentially searching the hash table for a free location.   This is accomplished using two values - one as a starting value and one as an interval between successive values in modular...
.
These long chains degrade the hash-table performance closer to O(n) performance instead of closer to O(1).
Consider the linear probing hash function
Hash function
A hash function is any algorithm or subroutine that maps large data sets to smaller data sets, called keys. For example, a single integer can serve as an index to an array...
h(k, i) : (h`(k) + i)mod N. With k being the key, i the probing-iteration, N being the number of slots in the hash-table and h`(k) being the secondary-hash function.
Thus the probing sequence for key k is: {h`(k), h`(k) + 1, h`(k) + 2, ..., h`(k) + n}. It is easy to see that the probing sequences for two different keys may overlap and create clustering.
Also since the probability of each slot being already full is equal to the load-factor
Load factor
Load factor may refer to:* Load factor , the ratio of the lift of an aircraft to its weight* Load factor , the ratio of the number of records to the number of addresses within a data structure...
of the hash-table, as the factor approaches 1.0 so does the chance that the next slot in the probing sequence will be full, this degrades performance of the hash-table to linear time.
Note: clusters of filled slots can grow exponentially in some cases. For example, imagine two large clusters of filled slots separated by one empty slot. Placing an entry into this slot will double the cluster size.


