Cryptographic hash function
Encyclopedia
A cryptographic hash function is a deterministic procedure
Algorithm
In mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning...

 that takes an arbitrary block of data
Data
The term data refers to qualitative or quantitative attributes of a variable or set of variables. Data are typically the results of measurements and can be the basis of graphs, images, or observations of a set of variables. Data are often viewed as the lowest level of abstraction from which...

 and returns a fixed-size bit
Bit
A bit is the basic unit of information in computing and telecommunications; it is the amount of information stored by a digital device or other physical system that exists in one of two possible distinct states...

 string, the (cryptographic) hash value, such that an accidental or intentional change to the data will change the hash value. The data to be encoded is often called the "message," and the hash value is sometimes called the message digest or simply digest.

The ideal cryptographic hash function has four main or significant properties:
  • it is easy (but not necessarily quick) to compute the hash value for any given message
  • it is infeasible to generate a message that has a given hash
  • it is infeasible to modify a message without changing the hash
  • it is infeasible to find two different messages with the same hash


Cryptographic hash functions have many information security
Information security
Information security means protecting information and information systems from unauthorized access, use, disclosure, disruption, modification, perusal, inspection, recording or destruction....

 applications, notably in digital signature
Digital signature
A digital signature or digital signature scheme is a mathematical scheme for demonstrating the authenticity of a digital message or document. A valid digital signature gives a recipient reason to believe that the message was created by a known sender, and that it was not altered in transit...

s, message authentication codes (MACs), and other forms of authentication
Authentication
Authentication is the act of confirming the truth of an attribute of a datum or entity...

. They can also be used as ordinary hash function
Hash function
A hash function is any algorithm or subroutine that maps large data sets to smaller data sets, called keys. For example, a single integer can serve as an index to an array...

s, to index data in hash table
Hash table
In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys , to their associated values . Thus, a hash table implements an associative array...

s, for fingerprinting
Fingerprint (computing)
In computer science, a fingerprinting algorithm is a procedure that maps an arbitrarily large data item to a much shorter bit string, its fingerprint, that uniquely identifies the original data for all practical purposes just as human fingerprints uniquely identify people for practical purposes...

, to detect duplicate data or uniquely identify files, and as checksum
Checksum
A checksum or hash sum is a fixed-size datum computed from an arbitrary block of digital data for the purpose of detecting accidental errors that may have been introduced during its transmission or storage. The integrity of the data can be checked at any later time by recomputing the checksum and...

s to detect accidental data corruption. Indeed, in information security contexts, cryptographic hash values are sometimes called (digital) fingerprints, checksums, or just hash values, even though all these terms stand for functions with rather different properties and purposes.

Properties

Most cryptographic hash functions are designed to take a string
String (computer science)
In formal languages, which are used in mathematical logic and theoretical computer science, a string is a finite sequence of symbols that are chosen from a set or alphabet....

 of any length as input and produce a fixed-length hash value.

A cryptographic hash function must be able to withstand all known types of cryptanalytic attack. As a minimum, it must have the following properties:
  • Preimage resistance
    Given a hash it should be difficult to find any message such that . This concept is related to that of one-way function
    One-way function
    In computer science, a one-way function is a function that is easy to compute on every input, but hard to invert given the image of a random input. Here "easy" and "hard" are to be understood in the sense of computational complexity theory, specifically the theory of polynomial time problems...

    . Functions that lack this property are vulnerable to preimage attack
    Preimage attack
    In cryptography, the preimage attack is a classification of attacks on hash functions for finding a message that has a specific hash value.There are two types of preimage attacks:...

    s.

  • Second-preimage resistance
    Given an input it should be difficult to find another input — where — such that . This property is sometimes referred to as weak collision resistance, and functions that lack this property are vulnerable to second-preimage attacks
    Preimage attack
    In cryptography, the preimage attack is a classification of attacks on hash functions for finding a message that has a specific hash value.There are two types of preimage attacks:...

    .

  • Collision resistance
    Collision resistance
    Collision resistance is a property of cryptographic hash functions: a hash function is collision resistant if it is hard to find two inputs that hash to the same output; that is, two inputs a and b such that H = H, and a ≠ b.Every hash function with more inputs than outputs will necessarily have...

    It should be difficult to find two different messages and such that . Such a pair is called a cryptographic hash collision
    Hash collision
    Not to be confused with wireless packet collision.In computer science, a collision or clash is a situation that occurs when two distinct pieces of data have the same hash value, checksum, fingerprint, or cryptographic digest....

    . This property is sometimes referred to as strong collision resistance. It requires a hash value at least twice as long as that required for preimage-resistance, otherwise collisions may be found by a birthday attack
    Birthday attack
    A birthday attack is a type of cryptographic attack that exploits the mathematics behind the birthday problem in probability theory. This attack can be used to abuse communication between two or more parties...

    .


These properties imply that a malicious adversary
Adversary (cryptography)
In cryptography, an adversary is a malicious entity whose aim is to prevent the users of the cryptosystem from achieving their goal...

 cannot replace or modify the input data without changing its digest. Thus, if two strings have the same digest, one can be very confident that they are identical.

A function meeting these criteria may still have undesirable properties. Currently popular cryptographic hash functions are vulnerable to length-extension attacks: given and but not , by choosing a suitable an attacker can calculate where denotes concatenation
Concatenation
In computer programming, string concatenation is the operation of joining two character strings end-to-end. For example, the strings "snow" and "ball" may be concatenated to give "snowball"...

. This property can be used to break naive authentication schemes based on hash functions. The HMAC
HMAC
In cryptography, HMAC is a specific construction for calculating a message authentication code involving a cryptographic hash function in combination with a secret key. As with any MAC, it may be used to simultaneously verify both the data integrity and the authenticity of a message...

 construction works around these problems.

Ideally, one may wish for even stronger conditions. It should be impossible for an adversary to find two messages with substantially similar digests; or to infer any useful information about the data, given only its digest. Therefore, a cryptographic hash function should behave as much as possible like a random function
Random function
A random function is a function chosen at random from a finite family of functions. Typically, the family consists of the set of all maps from the domain to the codomain. Thus, a random function can be considered to map each input independently at random to any one of the possible outputs. Viewed...

 while still being deterministic and efficiently computable.

Checksum algorithms, such as CRC32 and other cyclic redundancy check
Cyclic redundancy check
A cyclic redundancy check is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to raw data...

s, are designed to meet much weaker requirements, and are generally unsuitable as cryptographic hash functions. For example, a CRC was used for message integrity in the WEP
Wired Equivalent Privacy
Wired Equivalent Privacy is a weak security algorithm for IEEE 802.11 wireless networks. Introduced as part of the original 802.11 standard ratified in September 1999, its intention was to provide data confidentiality comparable to that of a traditional wired network...

 encryption standard, but an attack was readily discovered which exploited the linearity of the checksum.

Degree of difficulty

In cryptographic practice, “difficult” generally means “almost certainly beyond the reach of any adversary who must be prevented from breaking the system for as long as the security of the system is deemed important.” The meaning of the term is therefore somewhat dependent on the application, since the effort that a malicious agent may put into the task is usually proportional to his expected gain. However, since the needed effort usually grows very quickly with the digest length, even a thousand-fold advantage in processing power can be neutralized by adding a few dozen bits to the latter.

In some theoretical analyses
Computational complexity theory
Computational complexity theory is a branch of the theory of computation in theoretical computer science and mathematics that focuses on classifying computational problems according to their inherent difficulty, and relating those classes to each other...

 “difficult” has a specific mathematical meaning, such as not solvable in asymptotic
Asymptotic computational complexity
In computational complexity theory, asymptotic computational complexity is the usage of the asymptotic analysis for the estimation of computational complexity of algorithms and computational problems, commonly associated with the usage of the big O notation....

 polynomial time.
Such interpretations of difficulty are important in the study of provably secure cryptographic hash function
Provably secure cryptographic hash function
In cryptography, cryptographic hash functions can be divided into two main categories. In the first category are those functions whose designs are based on a mathematical problem and thus their security follows from rigorous mathematical proofs, complexity theory and formal reduction. These...

s but do not usually have a strong connection to practical security. For example, an exponential time algorithm can sometimes still be fast enough to make a feasible attack. Conversely, a polynomial time algorithm (e.g. one that requires n20 steps for n-digit keys) may be too slow for any practical use.

Illustration

An illustration of the potential use of a cryptographic hash is as follows: Alice
Alice and Bob
The names Alice and Bob are commonly used placeholder names for archetypal characters in fields such as cryptography and physics. The names are used for convenience; for example, "Alice sends a message to Bob encrypted with his public key" is easier to follow than "Party A sends a message to Party...

 poses a tough math problem to Bob
Alice and Bob
The names Alice and Bob are commonly used placeholder names for archetypal characters in fields such as cryptography and physics. The names are used for convenience; for example, "Alice sends a message to Bob encrypted with his public key" is easier to follow than "Party A sends a message to Party...

, and claims she has solved it. Bob would like to try it himself, but would yet like to be sure that Alice is not bluffing. Therefore, Alice writes down her solution, appends a random nonce
Cryptographic nonce
In security engineering, nonce is an arbitrary number used only once to sign a cryptographic communication. It is similar in spirit to a nonce word, hence the name. It is often a random or pseudo-random number issued in an authentication protocol to ensure that old communications cannot be reused...

, computes its hash and tells Bob the hash value (whilst keeping the solution and nonce secret). This way, when Bob comes up with the solution himself a few days later, Alice can prove that she had the solution earlier by revealing the nonce to Bob. (This is an example of a simple commitment scheme
Commitment scheme
In cryptography, a commitment scheme allows one to commit to a value while keeping it hidden, with the ability to reveal the committed value later. Commitments are used to bind a party to a value so that they cannot adapt to other messages in order to gain some kind of inappropriate advantage...

; in actual practice, Alice and Bob will often be computer programs, and the secret would be something less easily spoofed than a claimed puzzle solution).

Verifying the integrity of files or messages

An important application of secure hashes is verification of message integrity. Determining whether any changes have been made to a message (or a file
Computer file
A computer file is a block of arbitrary information, or resource for storing information, which is available to a computer program and is usually based on some kind of durable storage. A file is durable in the sense that it remains available for programs to use after the current program has finished...

), for example, can be accomplished by comparing message digests calculated before, and after, transmission (or any other event).

For this reason, most digital signature
Digital signature
A digital signature or digital signature scheme is a mathematical scheme for demonstrating the authenticity of a digital message or document. A valid digital signature gives a recipient reason to believe that the message was created by a known sender, and that it was not altered in transit...

 algorithms only confirm the authenticity of a hashed digest of the message to be "signed." Verifying the authenticity of a hashed digest of the message is considered proof that the message itself is authentic.

A related application is password
Password
A password is a secret word or string of characters that is used for authentication, to prove identity or gain access to a resource . The password should be kept secret from those not allowed access....

 verification. Passwords are usually not stored in cleartext, for obvious reasons, but instead in digest form. To authenticate a user, the password presented by the user is hashed and compared with the stored hash.

File or data identifier

A message digest can also serve as a means of reliably identifying a file; several source code management systems, including Git
Git (software)
Git is a distributed revision control system with an emphasis on speed. Git was initially designed and developed by Linus Torvalds for Linux kernel development. Every Git working directory is a full-fledged repository with complete history and full revision tracking capabilities, not dependent on...

, Mercurial
Mercurial (software)
Mercurial is a cross-platform, distributed revision control tool for software developers. It is mainly implemented using the Python programming language, but includes a binary diff implementation written in C. It is supported on Windows and Unix-like systems, such as FreeBSD, Mac OS X and Linux...

 and Monotone
Monotone (software)
Monotone is an open source software tool for distributed revision control. Monotone tracks revisions to files, groups sets of revisions into changesets, and tracks history across renames.The focus of the project is on integrity over performance...

, use the sha1sum
Sha1sum
sha1sum is a computer program that calculates and verifies SHA-1 hashes. It is commonly used to verify the integrity of files. It is installed by default in most Unix-like operating systems...

 of various types of content (file content, directory trees, ancestry information, etc.) to uniquely identify them. Hashes are used to identify files on peer-to-peer
Peer-to-peer
Peer-to-peer computing or networking is a distributed application architecture that partitions tasks or workloads among peers. Peers are equally privileged, equipotent participants in the application...

 filesharing networks. For example, in an ed2k link, an MD4
MD4
The MD4 Message-Digest Algorithm is a cryptographic hash function developed by Ronald Rivest in 1990. The digest length is 128 bits. The algorithm has influenced later designs, such as the MD5, SHA-1 and RIPEMD algorithms....

-variant hash is combined with the file size, providing sufficient information for locating file sources, downloading the file and verifying its contents. Magnet links are another example. Such file hashes are often the top hash of a hash list
Hash list
In computer science, a hash list is typically a list of hashes of the data blocks in a file or set of files. Lists of hashes are used for many different purposes, such as fast table lookup and distributed databases...

 or a hash tree
Hash tree
In cryptography and computer science Hash trees or Merkle trees are a type of data structure which contains a tree of summary information about a larger piece of data – for instance a file – used to verify its contents. Hash trees are a combination of hash lists and hash chaining, which in turn are...

 which allows for additional benefits.

One of the main applications of a hash function
Hash function
A hash function is any algorithm or subroutine that maps large data sets to smaller data sets, called keys. For example, a single integer can serve as an index to an array...

 is to allow the fast look-up of a data in a hash table
Hash table
In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys , to their associated values . Thus, a hash table implements an associative array...

. Being hash functions of a particular kind, cryptographic hash functions lend themselves well to this application too.

However, compared with standard hash functions, cryptographic hash functions tend to be much more expensive computationally. For this reason, they tend to be used in contexts where it is necessary for users to protect themselves against the possibility of forgery (the creation of data with the same digest as the expected data) by potentially malicious participants.

Pseudorandom generation and key derivation

Hash functions can also be used in the generation of pseudorandom bits, or to derive new keys or passwords
Key derivation function
In cryptography, a key derivation function derives one or more secret keys from a secret value such as a master key or other known information such as a password or passphrase using a pseudo-random function...

 from a single, secure key or password.

Hash functions based on block ciphers

There are several methods to use a block cipher
Block cipher
In cryptography, a block cipher is a symmetric key cipher operating on fixed-length groups of bits, called blocks, with an unvarying transformation. A block cipher encryption algorithm might take a 128-bit block of plaintext as input, and output a corresponding 128-bit block of ciphertext...

 to build a cryptographic hash function, specifically a one-way compression function.

The methods resemble the block cipher modes of operation
Block cipher modes of operation
In cryptography, modes of operation is the procedure of enabling the repeated and secure use of a block cipher under a single key.A block cipher by itself allows encryption only of a single data block of the cipher's block length. When targeting a variable-length message, the data must first be...

 usually used for encryption. All well-known hash functions, including MD4, MD5, SHA-1 and SHA-2 are built from block-cipher-like components designed for the purpose, with feedback to ensure that the resulting function is not bijective. SHA-3
NIST hash function competition
The NIST hash function competition is an open competition held by the US National Institute of Standards and Technology for a new SHA-3 function to replace the older SHA-1 and SHA-2, which was formally announced in the Federal Register on November 2, 2007...

 finalists include functions with block-cipher-like components (e.g., Skein, BLAKE) and functions based on other designs (e.g., JH
JH (hash function)
JH is a cryptographic hash function submitted to the NIST hash function competition by Hongjun Wu. JH was chosen as one of the five finalists of the competition. JH has a 1024-bit state, and works on 512-bit input blocks...

, Keccak
Keccak
Keccak is a cryptographic hash function designed by Guido Bertoni, Joan Daemen, Michaël Peeters and Gilles Van Assche. Keccak is one of five finalists in the NIST hash function competition to select a SHA-3 algorithm. The authors claim 12.5 cycles per byte on an Intel Core 2 CPU...

).

A standard block cipher such as AES
Advanced Encryption Standard
Advanced Encryption Standard is a specification for the encryption of electronic data. It has been adopted by the U.S. government and is now used worldwide. It supersedes DES...

 can be used in place of these custom block ciphers; that might be useful when an embedded system
Embedded system
An embedded system is a computer system designed for specific control functions within a larger system. often with real-time computing constraints. It is embedded as part of a complete device often including hardware and mechanical parts. By contrast, a general-purpose computer, such as a personal...

 needs to implement both encryption and hashing with minimal code size or hardware area. However, that approach can have costs in efficiency and security. The ciphers in hash functions are built for hashing: they use large keys and blocks, can efficiently change keys every block, and have been designed and vetted for resistance to related-key attack
Related-key attack
In cryptography, a related-key attack is any form of cryptanalysis where the attacker can observe the operation of a cipher under several different keys whose values are initially unknown, but where some mathematical relationship connecting the keys is known to the attacker...

s. General-purpose ciphers tend to have different design goals. In particular, AES has key and block sizes that make it nontrivial to use to generate long hash values; AES encryption becomes less efficient when the key changes each block; and related-key attacks make it potentially less secure for use in a hash function than for encryption.

Merkle–Damgård construction

A hash function must be able to process an arbitrary-length message into a fixed-length output. This can be achieved by breaking the input up into a series of equal-sized blocks, and operating on them in sequence using a one-way compression function. The compression function can either be specially designed for hashing or be built from a block cipher. A hash function built with the Merkle–Damgård construction is as resistant to collisions as is its compression function; any collision for the full hash function can be traced back to a collision in the compression function.

The last block processed should also be unambiguously length padded
Padding (cryptography)
-Classical cryptography:Official messages often start and end in predictable ways: My dear ambassador, Weather report, Sincerely yours, etc. The primary use of padding with classical ciphers is to prevent the cryptanalyst from using that predictability to find cribs that aid in breaking the...

; this is crucial to the security of this construction. This construction is called the Merkle–Damgård construction. Most widely used hash functions, including SHA-1 and MD5
MD5
The MD5 Message-Digest Algorithm is a widely used cryptographic hash function that produces a 128-bit hash value. Specified in RFC 1321, MD5 has been employed in a wide variety of security applications, and is also commonly used to check data integrity...

, take this form.

The construction has certain inherent flaws, including length-extension and generate-and-paste attacks, and cannot be parallelized. As a result, many entrants in the current NIST hash function competition
NIST hash function competition
The NIST hash function competition is an open competition held by the US National Institute of Standards and Technology for a new SHA-3 function to replace the older SHA-1 and SHA-2, which was formally announced in the Federal Register on November 2, 2007...

 are built on different, sometimes novel, constructions.

Use in building other cryptographic primitives

Hash functions can be used to build other cryptographic primitives. For these other primitives to be cryptographically secure, care must be taken to build them correctly.

Message authentication code
Message authentication code
In cryptography, a message authentication code is a short piece of information used to authenticate a message.A MAC algorithm, sometimes called a keyed hash function, accepts as input a secret key and an arbitrary-length message to be authenticated, and outputs a MAC...

s (MACs) (also called keyed hash functions) are often built from hash functions. HMAC
HMAC
In cryptography, HMAC is a specific construction for calculating a message authentication code involving a cryptographic hash function in combination with a secret key. As with any MAC, it may be used to simultaneously verify both the data integrity and the authenticity of a message...

 is such a MAC.

Just as block cipher
Block cipher
In cryptography, a block cipher is a symmetric key cipher operating on fixed-length groups of bits, called blocks, with an unvarying transformation. A block cipher encryption algorithm might take a 128-bit block of plaintext as input, and output a corresponding 128-bit block of ciphertext...

s can be used to build hash functions, hash functions can be used to build block ciphers. Luby-Rackoff constructions using hash functions can be provably secure if the underlying hash function is secure. Also, many hash functions (including SHA-1 and SHA-2
SHA-2
In cryptography, SHA-2 is a set of cryptographic hash functions designed by the National Security Agency and published in 2001 by the NIST as a U.S. Federal Information Processing Standard. SHA stands for Secure Hash Algorithm. SHA-2 includes a significant number of changes from its predecessor,...

) are built by using a special-purpose block cipher in a Davies-Meyer or other construction. That cipher can also be used in a conventional mode of operation, without the same security guarantees. See SHACAL
SHACAL
SHACAL-1 is a 160-bit block cipher based on SHA-1, and supports keys from 128-bit to 512-bit. SHACAL-2 is a 256-bit block cipher based upon the larger hash function SHA-256....

, BEAR and LION.

Pseudorandom number generator
Pseudorandom number generator
A pseudorandom number generator , also known as a deterministic random bit generator , is an algorithm for generating a sequence of numbers that approximates the properties of random numbers...

s (PRNGs) can be built using hash functions. This is done by combining a (secret) random seed with a counter and hashing it.

Some hash functions, such as Skein
Skein (hash function)
Skein is a cryptographic hash function and one out of five finalists in the NIST hash function competition to design what will become the SHA-3 standard, the intended successor of SHA-1 and SHA-2...

, Keccak
Keccak
Keccak is a cryptographic hash function designed by Guido Bertoni, Joan Daemen, Michaël Peeters and Gilles Van Assche. Keccak is one of five finalists in the NIST hash function competition to select a SHA-3 algorithm. The authors claim 12.5 cycles per byte on an Intel Core 2 CPU...

, and RadioGatún
RadioGatún
RadioGatún is a cryptographic hash primitive created by Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche. It was first publicly presented at the NIST Second Cryptographic Hash Workshop, held in Santa Barbara, California, on August 24–25, 2006, as part of the NIST hash function...

 output an arbitrarily long stream and can be used as a stream cipher
Stream cipher
In cryptography, a stream cipher is a symmetric key cipher where plaintext digits are combined with a pseudorandom cipher digit stream . In a stream cipher the plaintext digits are encrypted one at a time, and the transformation of successive digits varies during the encryption...

, and stream ciphers can also be built from fixed-length digest hash functions. Often this is done by first building a cryptographically secure pseudorandom number generator
Cryptographically secure pseudorandom number generator
A cryptographically secure pseudo-random number generator is a pseudo-random number generator with properties that make it suitable for use in cryptography.Many aspects of cryptography require random numbers, for example:...

 and then using its stream of random bytes as keystream
Keystream
In cryptography, a keystream is a stream of random or pseudorandom characters that are combined with a plaintext message to produce an encrypted message ....

. SEAL
SEAL (cipher)
In cryptography, SEAL is a very fast stream cipher optimised for machines with a 32-bit word size and plenty of RAM. SEAL is actually a pseudorandom function family in that it can easily generate arbitrary portions of the keystream without having to start from the beginning...

 is a stream cipher that uses SHA-1 to generate internal tables, which are then used in a keystream generator more or less unrelated to the hash algorithm. SEAL is not guaranteed to be as strong (or weak) as SHA-1.

Concatenation of cryptographic hash functions

Concatenating outputs from multiple hash functions provides collision resistance as good as the strongest of the algorithms included in the concatenated result. For example, older versions of TLS/SSL
Transport Layer Security
Transport Layer Security and its predecessor, Secure Sockets Layer , are cryptographic protocols that provide communication security over the Internet...

 use concatenated MD5
MD5
The MD5 Message-Digest Algorithm is a widely used cryptographic hash function that produces a 128-bit hash value. Specified in RFC 1321, MD5 has been employed in a wide variety of security applications, and is also commonly used to check data integrity...

 and SHA-1 sums; that ensures that a method to find collisions in one of the functions doesn't allow forging traffic protected with both functions.

For Merkle-Damgård hash functions, the concatenated function is as collision-resistant as its strongest component, but not more collision-resistant. Joux noted that 2-collisions lead to n-collisions: if it is feasible to find two messages with the same MD5 hash, it is effectively no more difficult to find as many messages as the attacker desires with identical MD5 hashes. Among the n messages with the same MD5 hash, there is likely to be a collision in SHA-1. The additional work needed to find the SHA-1 collision (beyond the exponential birthday search) is polynomial. This argument is summarized by Finney. A more current paper and full proof of the security of such a combined construction gives a clearer and more complete explanation of the above.

Cryptographic hash algorithms

There is a long list of cryptographic hash functions, although many have been found to be vulnerable and should not be used. Even if a hash function has never been broken, a successful attack against a weakened variant thereof may undermine the experts' confidence and lead to its abandonment. For instance, in August 2004 weaknesses were found in a number of hash functions that were popular at the time, including SHA-0, RIPEMD, and MD5. This has called into question the long-term security of later algorithms which are derived from these hash functions — in particular, SHA-1 (a strengthened version of SHA-0), RIPEMD-128, and RIPEMD-160 (both strengthened versions of RIPEMD). Neither SHA-0 nor RIPEMD are widely used since they were replaced by their strengthened versions.

As of 2009, the two most commonly used cryptographic hash functions are MD5
MD5
The MD5 Message-Digest Algorithm is a widely used cryptographic hash function that produces a 128-bit hash value. Specified in RFC 1321, MD5 has been employed in a wide variety of security applications, and is also commonly used to check data integrity...

 and SHA-1. However, MD5 has been broken; an attack against it was used to break SSL
Transport Layer Security
Transport Layer Security and its predecessor, Secure Sockets Layer , are cryptographic protocols that provide communication security over the Internet...

 in 2008.

The SHA-0 and SHA-1 hash functions were developed by the NSA. In February 2005, a successful attack on SHA-1 was reported, finding collisions in about 269 hashing operations, rather than the 280 expected for a 160-bit hash function. In August 2005, another successful attack on SHA-1 was reported, finding collisions in 263 operations. Theoretical weaknesses of SHA-1 exist as well, suggesting that it may be practical to break within years. New applications can avoid these problems by using more advanced members of the SHA family, such as SHA-2
SHA-2
In cryptography, SHA-2 is a set of cryptographic hash functions designed by the National Security Agency and published in 2001 by the NIST as a U.S. Federal Information Processing Standard. SHA stands for Secure Hash Algorithm. SHA-2 includes a significant number of changes from its predecessor,...

, or using techniques such as randomized hashing that do not require collision resistance.

However, to ensure the long-term robustness of applications that use hash functions, there is a competition
NIST hash function competition
The NIST hash function competition is an open competition held by the US National Institute of Standards and Technology for a new SHA-3 function to replace the older SHA-1 and SHA-2, which was formally announced in the Federal Register on November 2, 2007...

 to design a replacement for SHA-2, which will be given the name SHA-3 and become a FIPS
Federal Information Processing Standard
A Federal Information Processing Standard is a publicly announced standardization developed by the United States federal government for use in computer systems by all non-military government agencies and by government contractors, when properly invoked and tailored on a contract...

 standard around 2012.

Some of the following algorithms are used often in cryptography; consult the article for each specific algorithm for more information on the status of each algorithm. Note that this list does not include candidates in the current NIST hash function competition. For additional hash functions see the box at the bottom of the page.
Algorithm Output size (bits) Internal state size Block size Length size Word size Collision attack
Collision attack
In cryptography, a collision attack on a cryptographic hash tries to find two arbitrary inputs that will produce the same hash value, i.e. a hash collision...

s (complexity)
Preimage attack
Preimage attack
In cryptography, the preimage attack is a classification of attacks on hash functions for finding a message that has a specific hash value.There are two types of preimage attacks:...

s (complexity)
GOST 256 256 256 256 32 (2105) (2192)
HAVAL
HAVAL
HAVAL is a cryptographic hash function. Unlike MD5, but like most modern cryptographic hash functions, HAVAL can produce hashes of different lengths. HAVAL can produce hashes in lengths of 128 bits, 160 bits, 192 bits, 224 bits, and 256 bits...

256/224/192/160/128 256 1024 64 32
MD2 128 384 128 No 32 (263.3) (273)
MD4
MD4
The MD4 Message-Digest Algorithm is a cryptographic hash function developed by Ronald Rivest in 1990. The digest length is 128 bits. The algorithm has influenced later designs, such as the MD5, SHA-1 and RIPEMD algorithms....

128 128 512 64 32 (3) (270.4)
MD5
MD5
The MD5 Message-Digest Algorithm is a widely used cryptographic hash function that produces a 128-bit hash value. Specified in RFC 1321, MD5 has been employed in a wide variety of security applications, and is also commonly used to check data integrity...

128 128 512 64 32 (220.96) (2123.4)
PANAMA
PANAMA
Panama is a cryptography primitive which can be used both as a hash function and a stream cipher. Based on StepRightUp, it was designed by Joan Daemen and Craig Clapp and presented in the paper Fast Hashing and Stream Encryption with PANAMA on the Fast Software Encryption conference 1998...

256 8736 256 No 32
RadioGatún
RadioGatún
RadioGatún is a cryptographic hash primitive created by Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche. It was first publicly presented at the NIST Second Cryptographic Hash Workshop, held in Santa Barbara, California, on August 24–25, 2006, as part of the NIST hash function...

up to 608/1216 (19 words) 58 words 3 words No 1-64 (2352 or 2704)
RIPEMD
RIPEMD
RIPEMD-160 is a 160-bit message digest algorithm developed in Leuven, Belgium, by Hans Dobbertin, Antoon Bosselaers and Bart Preneel at the COSIC research group at the Katholieke Universiteit Leuven, and first published in 1996...

128 128 512 64 32 (218)
RIPEMD-128/256
RIPEMD
RIPEMD-160 is a 160-bit message digest algorithm developed in Leuven, Belgium, by Hans Dobbertin, Antoon Bosselaers and Bart Preneel at the COSIC research group at the Katholieke Universiteit Leuven, and first published in 1996...

128/256 128/256 512 64 32 No
RIPEMD-160/320
RIPEMD
RIPEMD-160 is a 160-bit message digest algorithm developed in Leuven, Belgium, by Hans Dobbertin, Antoon Bosselaers and Bart Preneel at the COSIC research group at the Katholieke Universiteit Leuven, and first published in 1996...

160/320 160/320 512 64 32 No
SHA-0 160 160 512 64 32 (233.6)
SHA-1 160 160 512 64 32 (251) No
SHA-256/224
SHA-2
In cryptography, SHA-2 is a set of cryptographic hash functions designed by the National Security Agency and published in 2001 by the NIST as a U.S. Federal Information Processing Standard. SHA stands for Secure Hash Algorithm. SHA-2 includes a significant number of changes from its predecessor,...

256/224 256 512 64 32 No No
SHA-512/384
SHA-2
In cryptography, SHA-2 is a set of cryptographic hash functions designed by the National Security Agency and published in 2001 by the NIST as a U.S. Federal Information Processing Standard. SHA stands for Secure Hash Algorithm. SHA-2 includes a significant number of changes from its predecessor,...

512/384 512 1024 128 64 No No
Tiger(2)-192/160/128 192/160/128 192 512 64 64 (262:19) (2184.3)
WHIRLPOOL
WHIRLPOOL
In computer science and cryptography, Whirlpool is a cryptographic hash function designed by Vincent Rijmen and Paulo S. L. M. Barreto first described in 2000. The hash has been recommended by the NESSIE project...

512 512 512 256 8 (http://www.springerlink.com/content/9711452444u24861/)


Note: The internal state here means the "internal hash sum" after each compression of a data block. Most hash algorithms also internally use some additional variables such as length of the data compressed so far since that is needed for the length padding in the end. See the Merkle-Damgård construction for details.

See also

  • Avalanche effect
    Avalanche effect
    In cryptography, the avalanche effect refers to a desirable property of cryptographic algorithms, typically block ciphers and cryptographic hash functions. The avalanche effect is evident if, when an input is changed slightly the output changes significantly...

  • Comparison of cryptographic hash functions
    Comparison of cryptographic hash functions
    The following tables compare general and technical information for a number of cryptographic hash functions.- General information :Basic general information about the cryptographic hash functions: year, designer, references, etc.- Compression function :...

  • CRYPTREC
    CRYPTREC
    CRYPTREC is the Cryptography Research and Evaluation Committees set up by the Japanese Government to evaluate and recommend cryptographic techniques for government and industrial use...

     and NESSIE
    NESSIE
    NESSIE was a European research project funded from 2000–2003 to identify secure cryptographic primitives. The project was comparable to the NIST AES process and the Japanese Government-sponsored CRYPTREC project, but with notable differences from both...

     - Projects which recommend hash functions
  • Keyed-hash message authentication code
  • MD5CRK
    MD5CRK
    In cryptography, MD5CRK was a distributed effort launched by Jean-Luc Cooke and his company, , to demonstrate that the MD5 message digest algorithm is insecure by finding a collision — two messages that produce the same MD5 hash. The project went live on March 1, 2004...

  • Message authentication code
    Message authentication code
    In cryptography, a message authentication code is a short piece of information used to authenticate a message.A MAC algorithm, sometimes called a keyed hash function, accepts as input a secret key and an arbitrary-length message to be authenticated, and outputs a MAC...


  • PGP word list
    PGP word list
    The PGP Word List is a list of words for conveying data bytes in a clear unambiguous way via a voice channel...

  • Provably secure cryptographic hash function
    Provably secure cryptographic hash function
    In cryptography, cryptographic hash functions can be divided into two main categories. In the first category are those functions whose designs are based on a mathematical problem and thus their security follows from rigorous mathematical proofs, complexity theory and formal reduction. These...

  • SHA-3
    NIST hash function competition
    The NIST hash function competition is an open competition held by the US National Institute of Standards and Technology for a new SHA-3 function to replace the older SHA-1 and SHA-2, which was formally announced in the Federal Register on November 2, 2007...

  • UOWHF
    UOWHF
    In cryptography a universal one-way hash function , is a type of universal hash function of particular importance to cryptography. UOWHF's are proposed as an alternative to collision-resistant hash functions...

     - Universal One Way Hash Functions
  • Hash chain
    Hash chain
    A hash chain is the successive application of a cryptographic hash function to a piece of data. In computer security, a hash chain is a method to produce many one-time keys from a single key or password...


Further reading

  • Bruce Schneier
    Bruce Schneier
    Bruce Schneier is an American cryptographer, computer security specialist, and writer. He is the author of several books on general security topics, computer security and cryptography, and is the founder and chief technology officer of BT Managed Security Solutions, formerly Counterpane Internet...

    . Applied Cryptography. John Wiley & Sons, 1996. ISBN 0-471-12845-7.

  • Christof Paar, Jan Pelzl, "Hash Functions", Chapter 11 of "Understanding Cryptography, A Textbook for Students and Practitioners". (companion web site contains online cryptography course that covers hash functions), Springer, 2009.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK