SUHA
Encyclopedia
In computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

, SUHA (Simple Uniform Hashing Assumption) or the uniform hashing assumption is a basic assumption that facilitates the mathematical analysis of hash tables
Hash table
In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys , to their associated values . Thus, a hash table implements an associative array...

. The assumption states that a hypothetical hashing function will evenly distribute items into the slots of a hash table. Moreover, each item to be hashed has an equal probability
Probability
Probability is ordinarily used to describe an attitude of mind towards some proposition of whose truth we arenot certain. The proposition of interest is usually of the form "Will a specific event occur?" The attitude of mind is of the form "How certain are we that the event will occur?" The...

 of being placed into a slot, regardless of the other elements already placed. This assumption generalizes the details of the hash function and allows for certain assumptions about the stochastic system.

Applications

SUHA is most commonly used as a foundation for mathematical proofs describing the properties and behavior of hash tables in theoretical computer science
Theoretical computer science
Theoretical computer science is a division or subset of general computer science and mathematics which focuses on more abstract or mathematical aspects of computing....

. Minimizing hashing collisions
Hash collision
Not to be confused with wireless packet collision.In computer science, a collision or clash is a situation that occurs when two distinct pieces of data have the same hash value, checksum, fingerprint, or cryptographic digest....

 can be achieved with a uniform hashing function. These functions often rely on the specific input data set and can be quite difficult to implement. Assuming uniform hashing allows hash table analysis to be made without exact knowledge of the input or the hash function used.

Mathematical implications

Certain properties of hash tables can be derived once uniform hashing is assumed.

Uniform distribution

Under the assumption of uniform hashing, given a hash function h, and a hash table of size m. The probability that two non-equal elements will hash to the same slot is

Collision chain length

Under the assumption of uniform hashing, the load factor
Load factor
Load factor may refer to:* Load factor , the ratio of the lift of an aircraft to its weight* Load factor , the ratio of the number of records to the number of addresses within a data structure...

  and the average chain length of a hash table of size m with n elements will be

Successful lookup

Under the assumption of uniform hashing, the average time (in big-O notation
Big O notation
In mathematics, big O notation is used to describe the limiting behavior of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions. It is a member of a larger family of notations that is called Landau notation, Bachmann-Landau notation, or...

) to successfully find an element in a hash table using chaining is

Unsuccessful lookup

Under the assumption of uniform hashing, the average time (in big-O notation) to unsuccessfully find an element in a hash table using chaining is

Example

A simple example of using SUHA can be seen while observing an arbitrary hash table of size 10 and a data set of 30 unique elements. If chaining is used to deal with collisions, the average chain length of this hash table may be a desirable value. Without any assumptions and with no more additional information about the data or hash function, the chain length cannot be estimated. With SUHA however, we can state that because of an assumed uniform hashing, each element has an equal probability of mapping to a slot. Since no particular slot should be favored over another, the 30 elements should hash into the 10 slots uniformly. This will produce a hash table with, on average, 10 chains each of length 3

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK