Open addressing
Encyclopedia
Open addressing, or closed hashing, is a method of collision resolution in hash tables. With this method a hash collision is resolved by probing, or searching through alternate locations in the array (the probe sequence) until either the target record is found, or an unused array slot is found, which indicates that there is no such key in the table. Well known probe sequences include:
Linear probing
: in which the interval between probes is fixed — often at 1.
Quadratic probing
: in which the interval between probes increases linearly (hence, the indices are described by a quadratic function).
Double hashing
: in which the interval between probes is fixed for each record but is computed by another hash function.
The main tradeoffs between these methods are that linear probing has the best cache performance
but is most sensitive to clustering, while double hashing has poor cache performance but exhibits virtually no clustering; quadratic probing falls in-between in both areas. Double hashing can also require more computation than other forms of probing. Some open addressing methods, such as
last-come-first-served hashing and cuckoo hashing
move existing keys around in the array to make room for the new key. This gives better maximum search times than the methods based on probing.
A critical influence on performance of an open addressing hash table is the load factor; that is, the proportion of the slots in the array that are used. As the load factor increases towards 100%, the number of probes that may be required to find or insert a given key rises dramatically. Once the table becomes full, probing algorithms may even fail to terminate. Even with good hash functions, load factors are normally limited to 80%. A poor hash function can exhibit poor performance even at very low load factors by generating significant clustering. What causes hash functions to cluster is not well understood , and it is easy to unintentionally write a hash function which causes severe clustering.
is an implementation of an open addressing hash table with linear probing and single-slot stepping, a common approach that is effective if the hash function is good. Each of the lookup, set and remove functions use a common internal function find_slot to locate the array slot that either does or should contain a given key.
record pair { key, value }
var pair array slot[0..num_slots-1]
function find_slot(key)
i := hash(key) modulo num_slots
// search until we either find the key, or find an empty slot.
while ( (slot[i] is occupied) and ( slot[i].key ≠ key ) ) do
i := (i + 1) modulo num_slots
repeat
return i
function lookup(key)
i := find_slot(key)
if slot[i] is occupied // key is in table
return slot[i].value
else // key is not in table
return not found
function set(key, value)
i := find_slot(key)
if slot[i] is occupied
slot[i].value := value
else
if the table is almost full
rebuild the table larger (note 1)
i := find_slot(key)
slot[i].key := key
slot[i].value := value
Another example showing open addressing technique. Presented function is converting each part(4) of an internet protocol address, where NOT, XOR, OR and AND are bitwise operations and << and >> are left and right logical shifts:
// key_1,key_2,key_3,key_4 are following 3-digit numbers - parts of ip address xxx.xxx.xxx.xxx
function ip(key parts)
j := 1
do
key := (key_2 << 2)
key := (key + (key_3 << 7))
key := key + (j OR key_4 >> 2) * (key_4) * (j + key_1) XOR j
key := key AND _prime_ // _prime_ is a prime number
j := (j+1)
while collision
return key
note 1 : Rebuilding the table requires allocating a larger array and recursively using the set operation to insert all the elements of the old array into the new larger array. It is common to increase the array size exponentially
, for example by doubling the old array size.
function remove(key)
i := find_slot(key)
if slot[i] is unoccupied
return // key is not in the table
j := i
loop
mark slot[i] as unoccupied
r2:
j := (j+1) modulo num_slots
if slot[j] is unoccupied
exit loop
k := hash(slot[j].key) modulo num_slots
// k lies cyclically in ]i,j]
// | i.k.j |
// |....j i.k.| or |.k..j i...|
if ( (i<=j) ? ((i
goto r2;
slot[i] := slot[j]
i := j
note 2 : For all records in a cluster, there must be no vacant slots between their natural hash position and their current position (else lookups will terminate before finding the record). At this point in the pseudocode, i is a vacant slot that might be invalidating this property for subsequent records in the cluster. j is such a subsequent record. k is the raw hash where the record at j would naturally land in the hash table if there were no collisions. This test is asking if the record at j is invalidly positioned with respect to the required properties of a cluster now that i is vacant.
Another technique for removal is simply to mark the slot as deleted. However this eventually requires rebuilding the table simply to remove deleted records. The methods above provide O(1) updating and removal of existing records, with occasional rebuilding if the high water mark of the table size grows.
The O(1) remove method above is only possible in linearly probed hash tables with single-slot stepping. In the case where many records are to be deleted in one operation, marking the slots for deletion and later rebuilding may be more efficient.
Linear probing
Linear probing
Linear probing is a scheme in computer programming for resolving hash collisions of values of hash functions by sequentially searching the hash table for a free location. This is accomplished using two values - one as a starting value and one as an interval between successive values in modular...
: in which the interval between probes is fixed — often at 1.
Quadratic probing
Quadratic probing
Quadratic probing is a scheme in computer programming for resolving collisions in hash tables.It is an open addressing method to handle overflows after a collision takes place in some bucket of a hash table....
: in which the interval between probes increases linearly (hence, the indices are described by a quadratic function).
Double hashing
Double hashing
Double hashing is a computer programming technique used in hash tables to resolve hash collisions, cases when two different values to be searched for produce the same hash key...
: in which the interval between probes is fixed for each record but is computed by another hash function.
The main tradeoffs between these methods are that linear probing has the best cache performance
Locality of reference
In computer science, locality of reference, also known as the principle of locality, is the phenomenon of the same value or related storage locations being frequently accessed. There are two basic types of reference locality. Temporal locality refers to the reuse of specific data and/or resources...
but is most sensitive to clustering, while double hashing has poor cache performance but exhibits virtually no clustering; quadratic probing falls in-between in both areas. Double hashing can also require more computation than other forms of probing. Some open addressing methods, such as
last-come-first-served hashing and cuckoo hashing
Cuckoo hashing
Cuckoo hashing is a scheme in computer programming for resolving hash collisions of values of hash functions in a table. Cuckoo hashing was first described by Rasmus Pagh and Flemming Friche Rodler in 2001...
move existing keys around in the array to make room for the new key. This gives better maximum search times than the methods based on probing.
A critical influence on performance of an open addressing hash table is the load factor; that is, the proportion of the slots in the array that are used. As the load factor increases towards 100%, the number of probes that may be required to find or insert a given key rises dramatically. Once the table becomes full, probing algorithms may even fail to terminate. Even with good hash functions, load factors are normally limited to 80%. A poor hash function can exhibit poor performance even at very low load factors by generating significant clustering. What causes hash functions to cluster is not well understood , and it is easy to unintentionally write a hash function which causes severe clustering.
Example pseudo code
The following pseudocodePseudocode
In computer science and numerical computation, pseudocode is a compact and informal high-level description of the operating principle of a computer program or other algorithm. It uses the structural conventions of a programming language, but is intended for human reading rather than machine reading...
is an implementation of an open addressing hash table with linear probing and single-slot stepping, a common approach that is effective if the hash function is good. Each of the lookup, set and remove functions use a common internal function find_slot to locate the array slot that either does or should contain a given key.
record pair { key, value }
var pair array slot[0..num_slots-1]
function find_slot(key)
i := hash(key) modulo num_slots
// search until we either find the key, or find an empty slot.
while ( (slot[i] is occupied) and ( slot[i].key ≠ key ) ) do
i := (i + 1) modulo num_slots
repeat
return i
function lookup(key)
i := find_slot(key)
if slot[i] is occupied // key is in table
return slot[i].value
else // key is not in table
return not found
function set(key, value)
i := find_slot(key)
if slot[i] is occupied
slot[i].value := value
else
if the table is almost full
rebuild the table larger (note 1)
i := find_slot(key)
slot[i].key := key
slot[i].value := value
Another example showing open addressing technique. Presented function is converting each part(4) of an internet protocol address, where NOT, XOR, OR and AND are bitwise operations and << and >> are left and right logical shifts:
// key_1,key_2,key_3,key_4 are following 3-digit numbers - parts of ip address xxx.xxx.xxx.xxx
function ip(key parts)
j := 1
do
key := (key_2 << 2)
key := (key + (key_3 << 7))
key := key + (j OR key_4 >> 2) * (key_4) * (j + key_1) XOR j
key := key AND _prime_ // _prime_ is a prime number
j := (j+1)
while collision
return key
note 1 : Rebuilding the table requires allocating a larger array and recursively using the set operation to insert all the elements of the old array into the new larger array. It is common to increase the array size exponentially
Exponential growth
Exponential growth occurs when the growth rate of a mathematical function is proportional to the function's current value...
, for example by doubling the old array size.
function remove(key)
i := find_slot(key)
if slot[i] is unoccupied
return // key is not in the table
j := i
loop
mark slot[i] as unoccupied
r2:
j := (j+1) modulo num_slots
if slot[j] is unoccupied
exit loop
k := hash(slot[j].key) modulo num_slots
// k lies cyclically in ]i,j]
// | i.k.j |
// |....j i.k.| or |.k..j i...|
if ( (i<=j) ? ((i
slot[i] := slot[j]
i := j
note 2 : For all records in a cluster, there must be no vacant slots between their natural hash position and their current position (else lookups will terminate before finding the record). At this point in the pseudocode, i is a vacant slot that might be invalidating this property for subsequent records in the cluster. j is such a subsequent record. k is the raw hash where the record at j would naturally land in the hash table if there were no collisions. This test is asking if the record at j is invalidly positioned with respect to the required properties of a cluster now that i is vacant.
Another technique for removal is simply to mark the slot as deleted. However this eventually requires rebuilding the table simply to remove deleted records. The methods above provide O(1) updating and removal of existing records, with occasional rebuilding if the high water mark of the table size grows.
The O(1) remove method above is only possible in linearly probed hash tables with single-slot stepping. In the case where many records are to be deleted in one operation, marking the slots for deletion and later rebuilding may be more efficient.