Hopscotch hashing
Encyclopedia
Hopscotch hashing is a scheme in computer programming
Computer programming
Computer programming is the process of designing, writing, testing, debugging, and maintaining the source code of computer programs. This source code is written in one or more programming languages. The purpose of programming is to create a program that performs specific operations or exhibits a...

 for resolving hash collision
Hash collision
Not to be confused with wireless packet collision.In computer science, a collision or clash is a situation that occurs when two distinct pieces of data have the same hash value, checksum, fingerprint, or cryptographic digest....

s of values of hash function
Hash function
A hash function is any algorithm or subroutine that maps large data sets to smaller data sets, called keys. For example, a single integer can serve as an index to an array...

s in a table
Hash table
In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys , to their associated values . Thus, a hash table implements an associative array...

 using open addressing
Open addressing
Open addressing, or closed hashing, is a method of collision resolution in hash tables. With this method a hash collision is resolved by probing, or searching through alternate locations in the array until either the target record is found, or an unused array slot is found, which indicates that...

. It is also well suited for implementing a concurrent hash table. Hopscotch hashing was introduced by Maurice Herlihy
Maurice Herlihy
Maurice Herlihy is a computer scientist active in the field of multiprocessor synchronization. Herlihy has contributed to the design of concurrent algorithms, and in particular to the exposition and quantification of the properties and uses of hardware synchronization operations...

, Nir Shavit
Nir Shavit
Nir Shavit is a Professor in the Computer Science Department at Tel Aviv University.Nir Shavit received B.Sc. and M.Sc. degrees in Computer Science from the Technion - Israel Institute of Technology in 1984 and 1986, and a Ph.D. in Computer Science from the Hebrew University of Jerusalem in 1990...

 and Moran Tzafrir in 2008. The name is derived from the sequence of hops that characterize the table's insertion algorithm.
The algorithm uses a single array of n buckets. For each bucket, its neighborhood is a small collection of nearby consecutive buckets (i.e. one with close indexes to the original hashed bucket). The desired property of the neighborhood is that the cost of finding an item in the buckets of the neighborhood is close to the cost of finding it in the bucket itself (for example, by having buckets in the neighborhood fall within the same cache line). The size of the neighborhood must be sufficient to accommodate a logarithmic number of items in the worst case (i.e. it must accommodate log(n) items), but only a constant number on average. If some bucket's neighborhood is filled, the table is resized.

In hopscotch hashing, as in cuckoo hashing
Cuckoo hashing
Cuckoo hashing is a scheme in computer programming for resolving hash collisions of values of hash functions in a table. Cuckoo hashing was first described by Rasmus Pagh and Flemming Friche Rodler in 2001...

, and unlike in linear probing
Linear probing
Linear probing is a scheme in computer programming for resolving hash collisions of values of hash functions by sequentially searching the hash table for a free location. This is accomplished using two values - one as a starting value and one as an interval between successive values in modular...

, a given item will always be inserted-into and found-in the neighborhood of its hashed bucket. In other words, it will always be found either in its original hashed array entry, or in one of the next H-1 neighboring entries. H could, for example, be 32, the standard machine word size. The neighborhood is thus a "virtual" bucket that has fixed size and overlaps with the next H-1 buckets. To speed the search, each bucket (array entry) includes a "hop-information" word, an H-bit bitmap that indicates which of the next H-1 entries contain items that hashed to the current entry's virtual bucket. In this way, an item can be found quickly by looking at the word to see which entries belong to the bucket, and then scanning through the constant number of entries (most modern processors support special bit manipulation operations that make the lookup in the "hop-information" bitmap very fast).

Here is how to add item x which was hashed to bucket i:
  1. If the entry i is empty, add x to i and return.
  2. Starting at entry i, use a linear probe to find an empty entry at index j.
  3. If the empty entry's index j is within H-1 of entry i, place x there and return. Otherwise, entry j is too far from i. To create an empty entry closer to i, find an item y whose hash value lies between i and j, but within H-1 of j. Displacing y to j creates a new empty slot closer to i. Repeat until the empty entry is within H-1 of entry i, place x there and return. If no such item y exists, or if the bucket i already contains H items, resize and rehash the table.


The idea is that hopscotch hashing "moves the empty slot towards the desired bucket". This distinguishes it from linear probing
Linear probing
Linear probing is a scheme in computer programming for resolving hash collisions of values of hash functions by sequentially searching the hash table for a free location. This is accomplished using two values - one as a starting value and one as an interval between successive values in modular...

 which leaves the empty slot where it was found, possibly far away from the original bucket, or from cuckoo hashing
Cuckoo hashing
Cuckoo hashing is a scheme in computer programming for resolving hash collisions of values of hash functions in a table. Cuckoo hashing was first described by Rasmus Pagh and Flemming Friche Rodler in 2001...

 that, in order to create a free bucket, moves an item out of one of the desired buckets in the target arrays, and only then tries to find the displaced item a new place.

To remove an item from the table, one simply removes it from the table entry. If the neighborhood buckets are cache aligned, then one could apply a reorganization operation in which items are moved into the now vacant location in order to improve alignment.

One advantage of hopscotch hashing is that it provides good performance at very high table load factors, even ones exceeding 0.9. Part of this efficiency is due to using a linear probe only to find an empty slot during insertion, not for every lookup as in the original linear probing
Linear probing
Linear probing is a scheme in computer programming for resolving hash collisions of values of hash functions by sequentially searching the hash table for a free location. This is accomplished using two values - one as a starting value and one as an interval between successive values in modular...

 hash table algorithm. Another advantage is that one can use any hash function, in particular simple ones that are close-to-universal.

See also

  • Perfect hashing
  • Linear probing
    Linear probing
    Linear probing is a scheme in computer programming for resolving hash collisions of values of hash functions by sequentially searching the hash table for a free location. This is accomplished using two values - one as a starting value and one as an interval between successive values in modular...

  • Hash collision
    Hash collision
    Not to be confused with wireless packet collision.In computer science, a collision or clash is a situation that occurs when two distinct pieces of data have the same hash value, checksum, fingerprint, or cryptographic digest....

  • Hash function
    Hash function
    A hash function is any algorithm or subroutine that maps large data sets to smaller data sets, called keys. For example, a single integer can serve as an index to an array...

  • Quadratic probing
    Quadratic probing
    Quadratic probing is a scheme in computer programming for resolving collisions in hash tables.It is an open addressing method to handle overflows after a collision takes place in some bucket of a hash table....

  • Cuckoo hashing
    Cuckoo hashing
    Cuckoo hashing is a scheme in computer programming for resolving hash collisions of values of hash functions in a table. Cuckoo hashing was first described by Rasmus Pagh and Flemming Friche Rodler in 2001...

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK