MD5
Encyclopedia
The MD5 Message-Digest Algorithm is a widely used cryptographic hash function
Cryptographic hash function
A cryptographic hash function is a deterministic procedure that takes an arbitrary block of data and returns a fixed-size bit string, the hash value, such that an accidental or intentional change to the data will change the hash value...

 that produces a 128-bit
Bit
A bit is the basic unit of information in computing and telecommunications; it is the amount of information stored by a digital device or other physical system that exists in one of two possible distinct states...

 (16-byte) hash value. Specified in RFC 1321, MD5 has been employed in a wide variety of security applications, and is also commonly used to check data integrity
Data integrity
Data Integrity in its broadest meaning refers to the trustworthiness of system resources over their entire life cycle. In more analytic terms, it is "the representational faithfulness of information to the true state of the object that the information represents, where representational faithfulness...

. However, it has been shown that MD5 is not collision resistant; as such, MD5 is not suitable for applications like SSL
Transport Layer Security
Transport Layer Security and its predecessor, Secure Sockets Layer , are cryptographic protocols that provide communication security over the Internet...

 certificates
Public key certificate
In cryptography, a public key certificate is an electronic document which uses a digital signature to bind a public key with an identity — information such as the name of a person or an organization, their address, and so forth...

 or digital signature
Digital signature
A digital signature or digital signature scheme is a mathematical scheme for demonstrating the authenticity of a digital message or document. A valid digital signature gives a recipient reason to believe that the message was created by a known sender, and that it was not altered in transit...

s that rely on this property. An MD5 hash is typically expressed as a 32-digit hexadecimal
Hexadecimal
In mathematics and computer science, hexadecimal is a positional numeral system with a radix, or base, of 16. It uses sixteen distinct symbols, most often the symbols 0–9 to represent values zero to nine, and A, B, C, D, E, F to represent values ten to fifteen...

 number.

MD5 was designed by Ron Rivest
Ron Rivest
Ronald Linn Rivest is a cryptographer. He is the Andrew and Erna Viterbi Professor of Computer Science at MIT's Department of Electrical Engineering and Computer Science and a member of MIT's Computer Science and Artificial Intelligence Laboratory...

 in 1991 to replace an earlier hash function, MD4
MD4
The MD4 Message-Digest Algorithm is a cryptographic hash function developed by Ronald Rivest in 1990. The digest length is 128 bits. The algorithm has influenced later designs, such as the MD5, SHA-1 and RIPEMD algorithms....

. In 1996, a flaw was found with the design of MD5. While it was not a clearly fatal weakness, cryptographers began recommending the use of other algorithms, such as SHA-1 (which has since been found also to be vulnerable). In 2004, more serious flaws were discovered, making further use of the algorithm for security purposes questionable; specifically, a group of researchers described how to create a pair of files that share the same MD5 checksum
Checksum
A checksum or hash sum is a fixed-size datum computed from an arbitrary block of digital data for the purpose of detecting accidental errors that may have been introduced during its transmission or storage. The integrity of the data can be checked at any later time by recomputing the checksum and...

. Further advances were made in breaking MD5 in 2005, 2006, and 2007. In an attack on MD5 published in December 2008, a group of researchers used this technique to fake SSL certificate validity.

US-CERT says MD5 "should be considered cryptographically broken and unsuitable for further use," and most U.S. government applications now require the SHA-2
SHA-2
In cryptography, SHA-2 is a set of cryptographic hash functions designed by the National Security Agency and published in 2001 by the NIST as a U.S. Federal Information Processing Standard. SHA stands for Secure Hash Algorithm. SHA-2 includes a significant number of changes from its predecessor,...

 family of hash functions.

History and cryptanalysis

MD5 is one in a series of message digest algorithms designed by Professor Ronald Rivest of MIT
Massachusetts Institute of Technology
The Massachusetts Institute of Technology is a private research university located in Cambridge, Massachusetts. MIT has five schools and one college, containing a total of 32 academic departments, with a strong emphasis on scientific and technological education and research.Founded in 1861 in...

 (Rivest, 1994). When analytic work indicated that MD5's predecessor MD4
MD4
The MD4 Message-Digest Algorithm is a cryptographic hash function developed by Ronald Rivest in 1990. The digest length is 128 bits. The algorithm has influenced later designs, such as the MD5, SHA-1 and RIPEMD algorithms....

 was likely to be insecure, MD5 was designed in 1991 to be a secure replacement. (Weaknesses were indeed later found in MD4 by Hans Dobbertin
Hans Dobbertin
Hans Dobbertin, was a German cryptographer who is best known for his work on cryptanalysis of the MD4, MD5, and original RIPEMD hash functions, and for his part in the design of the new version of the RIPEMD hash function...

.)

In 1993, Den Boer and Bosselaers gave an early, although limited, result of finding a "pseudo-collision
Hash collision
Not to be confused with wireless packet collision.In computer science, a collision or clash is a situation that occurs when two distinct pieces of data have the same hash value, checksum, fingerprint, or cryptographic digest....

" of the MD5 compression function; that is, two different initialization vector
Initialization vector
In cryptography, an initialization vector is a fixed-size input to a cryptographic primitive that is typically required to be random or pseudorandom...

s which produce an identical digest.

In 1996, Dobbertin announced a collision of the compression function of MD5 (Dobbertin, 1996). While this was not an attack on the full MD5 hash function, it was close enough for cryptographers to recommend switching to a replacement, such as SHA-1 or RIPEMD-160.

The size of the hash—128 bits—is small enough to contemplate a birthday attack
Birthday attack
A birthday attack is a type of cryptographic attack that exploits the mathematics behind the birthday problem in probability theory. This attack can be used to abuse communication between two or more parties...

. MD5CRK
MD5CRK
In cryptography, MD5CRK was a distributed effort launched by Jean-Luc Cooke and his company, , to demonstrate that the MD5 message digest algorithm is insecure by finding a collision — two messages that produce the same MD5 hash. The project went live on March 1, 2004...

 was a distributed project
Distributed computing
Distributed computing is a field of computer science that studies distributed systems. A distributed system consists of multiple autonomous computers that communicate through a computer network. The computers interact with each other in order to achieve a common goal...

 started in March 2004 with the aim of demonstrating that MD5 is practically insecure by finding a collision using a birthday attack
Birthday attack
A birthday attack is a type of cryptographic attack that exploits the mathematics behind the birthday problem in probability theory. This attack can be used to abuse communication between two or more parties...

.

MD5CRK ended shortly after 17 August 2004, when collisions
Hash collision
Not to be confused with wireless packet collision.In computer science, a collision or clash is a situation that occurs when two distinct pieces of data have the same hash value, checksum, fingerprint, or cryptographic digest....

 for the full MD5 were announced by Xiaoyun Wang
Xiaoyun Wang
Wang Xiaoyun is a researcher and professor in the Department of Mathematics and System Science, Shandong University, Shandong, China....

, Dengguo Feng, Xuejia Lai
Xuejia Lai
Xuejia Lai is a cryptographer, currently a professor at Shanghai Jiao Tong University. His notable work includes the design of the block cipher IDEA, the theory of Markov ciphers, and the cryptanalysis of a number of cryptographic hash functions. His book On the Design and Security of Block...

, and Hongbo Yu. Their analytical attack was reported to take only one hour on an IBM p690
IBM p690
The IBM p690 was, at the time of its release in late 2001, the flagship of IBM's high end Unix servers during the POWER4 era of processors. It was built to run IBM AIX Unix, although it is possible to run a version Linux minus some POWER4 specific features. It was discontinued in late 2005.It can...

 cluster.

On 1 March 2005, Arjen Lenstra
Arjen Lenstra
Arjen Klaas Lenstra is a Dutch mathematician. He studied mathematics at the University of Amsterdam.He is currently a professor at the EPFL , in the Laboratory for Cryptologic Algorithms, and...

, Xiaoyun Wang
Xiaoyun Wang
Wang Xiaoyun is a researcher and professor in the Department of Mathematics and System Science, Shandong University, Shandong, China....

, and Benne de Weger demonstrated construction of two X.509
X.509
In cryptography, X.509 is an ITU-T standard for a public key infrastructure and Privilege Management Infrastructure . X.509 specifies, amongst other things, standard formats for public key certificates, certificate revocation lists, attribute certificates, and a certification path validation...

 certificates with different public keys and the same MD5 hash, a demonstrably practical collision. The construction included private keys for both public keys. A few days later, Vlastimil Klima
Vlastimil Klíma
RNDr. Vlastimil Klíma is a Czech cryptologist and computer security expert. He is an author of several works in the field of cryptographic hash functions and digital signatures...

 described an improved algorithm, able to construct MD5 collisions in a few hours on a single notebook computer. On 18 March 2006, Klima published an algorithm that can find a collision within one minute on a single notebook computer, using a method he calls tunneling.

In 2009, the United States Cyber Command
United States Cyber Command
United States Cyber Command is an armed forces sub-unified command subordinate to United States Strategic Command. The command is located in Fort Meade, Maryland and led by General Keith B. Alexander. USCYBERCOM centralizes command of cyberspace operations, organizes existing cyber resources and...

 used an MD5 hash of their mission statement as a part of their official emblem.

On December 24, 2010, Tao Xie and Dengguo Feng announced the first published single-block MD5 collision (two 64-byte messages with the same MD5 hash given in Little endian notation). Previous collision discoveries relied on multi-block attacks. For "security reasons", Xie and Feng did not disclose the new attack method. They have issued a challenge to the cryptographic community, offering a US$ 10,000 reward to the first finder of a different 64-byte collision before January 1, 2013.

In 2011 an informational RFC
Request for Comments
In computer network engineering, a Request for Comments is a memorandum published by the Internet Engineering Task Force describing methods, behaviors, research, or innovations applicable to the working of the Internet and Internet-connected systems.Through the Internet Society, engineers and...

 was approved to update the security considerations in RFC 1321 (MD5) and RFC 2104 (HMAC-MD5).

Security

The security of the MD5 hash function is severely compromised. A collision attack
Collision attack
In cryptography, a collision attack on a cryptographic hash tries to find two arbitrary inputs that will produce the same hash value, i.e. a hash collision...

 exists that can find collisions within seconds on a computer with a 2.6Ghz Pentium4 processor (complexity of 224.1). Further, there is also a chosen-prefix collision attack that can produce a collision for two chosen arbitrarily different inputs within hours, using off-the-shelf computing hardware (complexity 239).
The ability to find collisions has been greatly aided by the use of off-the-shelf GPUs
Graphics processing unit
A graphics processing unit or GPU is a specialized circuit designed to rapidly manipulate and alter memory in such a way so as to accelerate the building of images in a frame buffer intended for output to a display...

. On an NVIDIA GeForce 8400GS graphics processor, 16-18 million hashes per second can be computed. An NVIDIA GeForce 8800 Ultra can calculate more than 200 million hashes per second.

These hash and collision attacks have been demonstrated in the public in various situations, including colliding document files and digital certificates.

Collision vulnerabilities

In 1996, collisions were found in the compression function of MD5, and Hans Dobbertin
Hans Dobbertin
Hans Dobbertin, was a German cryptographer who is best known for his work on cryptanalysis of the MD4, MD5, and original RIPEMD hash functions, and for his part in the design of the new version of the RIPEMD hash function...

 wrote in the RSA Laboratories technical newsletter, "The presented attack does not yet threaten practical applications of MD5, but it comes rather close ... in the future MD5 should no longer be implemented...where a collision-resistant hash function is required."

In 2005, researchers were able to create pairs of PostScript
PostScript
PostScript is a dynamically typed concatenative programming language created by John Warnock and Charles Geschke in 1982. It is best known for its use as a page description language in the electronic and desktop publishing areas. Adobe PostScript 3 is also the worldwide printing and imaging...

 documents and X.509
X.509
In cryptography, X.509 is an ITU-T standard for a public key infrastructure and Privilege Management Infrastructure . X.509 specifies, amongst other things, standard formats for public key certificates, certificate revocation lists, attribute certificates, and a certification path validation...

 certificates with the same hash. Later that year, MD5's designer Ron Rivest wrote, "md5 and sha1 are both clearly broken (in terms of collision-resistance)."

On 30 December 2008, a group of researchers announced at the 25th Chaos Communication Congress
Chaos Communication Congress
The Chaos Communication Congress is an annual meeting of the international hacker scene, organized by the Chaos Computer Club. The congress features a variety of lectures and workshops on technical and political issues....

 how they had used MD5 collisions to create an intermediate certificate authority certificate which appeared to be legitimate when checked via its MD5 hash. The researchers used a cluster of Sony Playstation 3s at the EPFL in Lausanne, Switzerland to change a normal SSL certificate issued by RapidSSL into a working CA certificate for that issuer, which could then be used to create other certificates that would appear to be legitimate and issued by RapidSSL. VeriSign
VeriSign
Verisign, Inc. is an American company based in Dulles, Virginia that operates a diverse array of network infrastructure, including two of the Internet's thirteen root nameservers, the authoritative registry for the .com, .net, and .name generic top-level domains and the .cc and .tv country-code...

, the issuers of RapidSSL certificates, said they stopped issuing new certificates using MD5 as their checksum algorithm for RapidSSL once the vulnerability was announced. Although Verisign declined to revoke existing certificates signed using MD5, their response was considered adequate by the authors of the exploit (Alexander Sotirov
Alexander Sotirov
Alexander Sotirov is a computer security researcher. He has been a researcher at Determina and VMware.He is well known for his discovery of the ANI browser vulnerability as well as the so-called Heap Feng Shui technique for exploiting heap buffer overflows in browsers. In 2008, he presented...

, Marc Stevens, Jacob Appelbaum
Jacob Appelbaum
Jacob Appelbaum is an independent computer security researcher and hacker. He is currently employed by the University of Washington, and is a core member of the Tor project. Appelbaum is known for representing Wikileaks at the 2010 Hope conference...

, Arjen Lenstra
Arjen Lenstra
Arjen Klaas Lenstra is a Dutch mathematician. He studied mathematics at the University of Amsterdam.He is currently a professor at the EPFL , in the Laboratory for Cryptologic Algorithms, and...

, David Molnar, Dag Arne Osvik, and Benne de Weger). Bruce Schneier wrote of the attack that "[w]e already knew that MD5 is a broken hash function" and that "no one should be using MD5 anymore." The SSL researchers wrote, "Our desired impact is that Certification Authorities will stop using MD5 in issuing new certificates. We also hope that use of MD5 in other applications will be reconsidered as well."

MD5 uses the Merkle–Damgård construction, so if two prefixes with the same hash can be constructed, a common suffix can be added to both to make the collision more likely to be accepted as valid data by the application using it. Furthermore, current collision-finding techniques allow to specify an arbitrary prefix: an attacker can create two colliding files that both begin with the same content. All the attacker needs to generate two colliding files is a template file with a 128-byte block of data aligned on a 64-byte boundary that can be changed freely by the collision-finding algorithm.

Here's the hexcode of a sample for an MD5 collision:
d131dd02c5e6eec4693d9a0698aff95c 2fcab58712467eab4004583eb8fb7f89
55ad340609f4b30283e488832571415a 085125e8f7cdc99fd91dbdf280373c5b
d8823e3156348f5bae6dacd436c919c6 dd53e2b487da03fd02396306d248cda0
e99f33420f577ee8ce54b67080a80d1e c69821bcb6a8839396f9652b6ff72a70

d131dd02c5e6eec4693d9a0698aff95c 2fcab50712467eab4004583eb8fb7f89
55ad340609f4b30283e4888325f1415a 085125e8f7cdc99fd91dbd7280373c5b
d8823e3156348f5bae6dacd436c919c6 dd53e23487da03fd02396306d248cda0
e99f33420f577ee8ce54b67080280d1e c69821bcb6a8839396f965ab6ff72a70

Both produce the hash value 79054025255fb1a26e4bc422aef54eb4, though they differ in 6 of 1024 bits.

Preimage vulnerability

In April 2009, a preimage attack
Preimage attack
In cryptography, the preimage attack is a classification of attacks on hash functions for finding a message that has a specific hash value.There are two types of preimage attacks:...

 against MD5 was published that breaks MD5's preimage resistance. This attack is only theoretical, with a computational complexity of 2123.4 for full preimage.

Other vulnerabilities

A number of projects have published MD5 rainbow table
Rainbow table
A rainbow table is a precomputed table for reversing cryptographic hash functions, usually for cracking password hashes. Tables are usually used in recovering the plaintext password, up to a certain length consisting of a limited set of characters. It is a form of time-memory tradeoff, using less...

s online, that can be used to reverse many MD5 hashes into strings that collide with the original input, usually for the purposes of password
Password
A password is a secret word or string of characters that is used for authentication, to prove identity or gain access to a resource . The password should be kept secret from those not allowed access....

 cracking.

The use of MD5 in some websites' URLs
Uniform Resource Locator
In computing, a uniform resource locator or universal resource locator is a specific character string that constitutes a reference to an Internet resource....

 means that search engine
Search engine
A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information...

s such as Google
Google
Google Inc. is an American multinational public corporation invested in Internet search, cloud computing, and advertising technologies. Google hosts and develops a number of Internet-based services and products, and generates profit primarily from advertising through its AdWords program...

 can also sometimes function as a limited tool for reverse lookup of MD5 hashes.

Both these techniques are rendered ineffective by the use of a sufficiently long salt
Salt (cryptography)
In cryptography, a salt consists of random bits, creating one of the inputs to a one-way function. The other input is usually a password or passphrase. The output of the one-way function can be stored rather than the password, and still be used for authenticating users. The one-way function...

 as explained in the rainbow table article.

Applications

MD5 digests have been widely used in the software world to provide some assurance that a transferred file has arrived intact. For example, file servers often provide a pre-computed MD5 (known as Md5sum
Md5sum
md5sum is a computer program that calculates and verifies 128-bit MD5 hashes, as described in RFC 1321. The MD5 hash functions as a compact digital fingerprint of a file. As with all such hashing algorithms, there is theoretically an unlimited number of files that will have any given MD5 hash...

) checksum
Checksum
A checksum or hash sum is a fixed-size datum computed from an arbitrary block of digital data for the purpose of detecting accidental errors that may have been introduced during its transmission or storage. The integrity of the data can be checked at any later time by recomputing the checksum and...

 for the files, so that a user can compare the checksum of the downloaded file to it. Unix-based operating systems include MD5 sum utilities in their distribution packages, whereas Windows users use third-party applications.

However, now that it is easy to generate MD5 collisions, it is possible for the person who created the file to create a second file with the same checksum, so this technique cannot protect against some forms of malicious tampering. Also, in some cases the checksum cannot be trusted (for example, if it was obtained over the same channel as the downloaded file), in which case MD5 can only provide error-checking functionality: it will recognize a corrupt or incomplete download, which becomes more likely when downloading larger files.

MD5 was widely used to store passwords. Sound uses of MD5 include UUID (also known as GUID
Globally Unique Identifier
A globally unique identifier is a unique reference number used as an identifier in computer software. The term GUID also is used for Microsoft's implementation of the Universally unique identifier standard....

) version 3 specified in RFC 4122, CRAM
CRAM-MD5
In cryptography, CRAM-MD5 is achallenge-response authentication mechanism defined in RFC 2195 based on theHMAC-MD5 MACalgorithm...

 specified in RFC 2195, and the IETF
Internet Engineering Task Force
The Internet Engineering Task Force develops and promotes Internet standards, cooperating closely with the W3C and ISO/IEC standards bodies and dealing in particular with standards of the TCP/IP and Internet protocol suite...

 NomCom lottery specified in RFC 3797.

MD5 and other hash functions are also used in the field of electronic discovery
Electronic Discovery
Electronic discovery refers to discovery in civil litigation which deals with the exchange of information in electronic format . Usually a digital forensics analysis is performed to recover evidence...

, in order to provide a unique identifier for each document that is exchanged during the legal discovery process. This method can be used to replace the Bates stamp
Bates numbering
Bates numbering is used in the legal, medical, and business fields to place identifying numbers and/or date/time-marks on images and documents as they are scanned or processed...

 numbering system that has been used for decades during the exchange of paper documents.

Algorithm

MD5 processes a variable-length message into a fixed-length output of 128 bits. The input message is broken up into chunks of 512-bit blocks (sixteen 32-bit little endian
Endianness
In computing, the term endian or endianness refers to the ordering of individually addressable sub-components within the representation of a larger data item as stored in external memory . Each sub-component in the representation has a unique degree of significance, like the place value of digits...

 integers); the message is padded
Padding (cryptography)
-Classical cryptography:Official messages often start and end in predictable ways: My dear ambassador, Weather report, Sincerely yours, etc. The primary use of padding with classical ciphers is to prevent the cryptanalyst from using that predictability to find cribs that aid in breaking the...

 so that its length is divisible by 512. The padding works as follows: first a single bit, 1, is appended to the end of the message. This is followed by as many zeros as are required to bring the length of the message up to 64 bits fewer than a multiple of 512. The remaining bits are filled up with a 64-bit little endian
Endianness
In computing, the term endian or endianness refers to the ordering of individually addressable sub-components within the representation of a larger data item as stored in external memory . Each sub-component in the representation has a unique degree of significance, like the place value of digits...

 integer representing the length of the original message, in bits.

The main MD5 algorithm operates on a 128-bit state, divided into four 32-bit words, denoted A, B, C and D. These are initialized to certain fixed constants. The main algorithm then operates on each 512-bit message block in turn, each block modifying the state. The processing of a message block consists of four similar stages, termed rounds; each round is composed of 16 similar operations based on a non-linear function F, modular addition
Modular Addition
Modular additions are usually side and 2nd story additions to homes that are pre-fabricated at the facilities. General characteristics of a modular home apply. For a 2nd story modular addition the existing house should have a sound structure as modular rooms are 30%+ heavier than the same stick-built...

, and left rotation. Figure 1 illustrates one operation within a round. There are four possible functions F; a different one is used in each round:

denote the XOR, AND
Logical conjunction
In logic and mathematics, a two-place logical operator and, also known as logical conjunction, results in true if both of its operands are true, otherwise the value of false....

, OR
Logical disjunction
In logic and mathematics, a two-place logical connective or, is a logical disjunction, also known as inclusive disjunction or alternation, that results in true whenever one or more of its operands are true. E.g. in this context, "A or B" is true if A is true, or if B is true, or if both A and B are...

 and NOT
Negation
In logic and mathematics, negation, also called logical complement, is an operation on propositions, truth values, or semantic values more generally. Intuitively, the negation of a proposition is true when that proposition is false, and vice versa. In classical logic negation is normally identified...

 operations respectively.

Pseudocode

The MD5 hash is calculated according to this algorithm:

//Note: All variables are unsigned 32 bits and wrap modulo 2^32 when calculating
var int[64] r, k

//r specifies the per-round shift amounts
r[ 0..15] := {7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22}
r[16..31] := {5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20}
r[32..47] := {4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23}
r[48..63] := {6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21}

//Use binary integer part of the sines of integers (Radians) as constants:
for i from 0 to 63
k[i] := floor(abs(sin(i + 1)) × (2 pow 32))
end for
//(Or just use the following table):
k[ 0.. 3] := { 0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee }
k[ 4.. 7] := { 0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501 }
k[ 8..11] := { 0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be }
k[12..15] := { 0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821 }
k[16..19] := { 0xf61e2562, 0xc040b340, 0x265e5a51, 0xe9b6c7aa }
k[20..23] := { 0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8 }
k[24..27] := { 0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed }
k[28..31] := { 0xa9e3e905, 0xfcefa3f8, 0x676f02d9, 0x8d2a4c8a }
k[32..35] := { 0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c }
k[36..39] := { 0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70 }
k[40..43] := { 0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x04881d05 }
k[44..47] := { 0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665 }
k[48..51] := { 0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039 }
k[52..55] := { 0x655b59c3, 0x8f0ccc92, 0xffeff47d, 0x85845dd1 }
k[56..59] := { 0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1 }
k[60..63] := { 0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391 }

//Initialize variables:
var int h0 := 0x67452301
var int h1 := 0xEFCDAB89
var int h2 := 0x98BADCFE
var int h3 := 0x10325476

//Pre-processing:
append "1" bit to message
append "0" bits until message length in bits ≡ 448 (mod 512)
append length to message
/* bit (not byte) length of unpadded message as 64-bit little-endian integer */

//Process the message in successive 512-bit chunks:
for each 512-bit chunk of message
break chunk into sixteen 32-bit little-endian words w[j], 0 ≤ j ≤ 15
//Initialize hash value for this chunk:
var int a := h0
var int b := h1
var int c := h2
var int d := h3
//Main loop:
for i from 0 to 63
if 0 ≤ i ≤ 15 then
f := (b and c) or ((not b) and d)
g := i
else if 16 ≤ i ≤ 31
f := (d and b) or ((not d) and c)
g := (5×i + 1) mod 16
else if 32 ≤ i ≤ 47
f := b xor c xor d
g := (3×i + 5) mod 16
else if 48 ≤ i ≤ 63
f := c xor (b or (not d))
g := (7×i) mod 16
temp := d
d := c
c := b
b := b + leftrotate((a + f + k[i] + w[g]) , r[i])
a := temp
end for
//Add this chunk's hash to result so far:
h0 := h0 + a
h1 := h1 + b
h2 := h2 + c
h3 := h3 + d
end for

var char digest[16] := h0 append h1 append h2 append h3 //(expressed as little-endian)

//leftrotate function definition
leftrotate (x, c)
return (x << c) or (x >> (32-c));

Note: Instead of the formulation from the original RFC 1321 shown, the following may be used for improved efficiency (useful if assembly language is being used - otherwise, the compiler will generally optimize the above code. Since each computation is dependent on another in these formulations, this is often slower than the above method where the nand/and can be parallelised):
(0 ≤ i ≤ 15): f := d xor (b and (c xor d))
(16 ≤ i ≤ 31): f := c xor (d and (b xor c))

MD5 hashes

The 128-bit (16-byte) MD5 hashes (also termed message digests) are typically represented as a sequence of 32 hexadecimal
Hexadecimal
In mathematics and computer science, hexadecimal is a positional numeral system with a radix, or base, of 16. It uses sixteen distinct symbols, most often the symbols 0–9 to represent values zero to nine, and A, B, C, D, E, F to represent values ten to fifteen...

 digits. The following demonstrates a 43-byte ASCII
ASCII
The American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...

 input and the corresponding MD5 hash:

MD5("The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
"The quick brown fox jumps over the lazy dog" is an English-language pangram, that is, a phrase that contains all of the letters of the alphabet. It has been used to test typewriters and computer keyboards, and in other applications involving all of the letters in the English alphabet...

")
= 9e107d9d372bb6826bd81d3542a419d6

Even a small change in the message will (with overwhelming probability) result in a mostly different hash, due to the avalanche effect
Avalanche effect
In cryptography, the avalanche effect refers to a desirable property of cryptographic algorithms, typically block ciphers and cryptographic hash functions. The avalanche effect is evident if, when an input is changed slightly the output changes significantly...

. For example, adding a period to the end of the sentence:

MD5("The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
"The quick brown fox jumps over the lazy dog" is an English-language pangram, that is, a phrase that contains all of the letters of the alphabet. It has been used to test typewriters and computer keyboards, and in other applications involving all of the letters in the English alphabet...

.")
= e4d909c290d0fb1ca068ffaddf22cbd0

The hash of the zero-length string is:

MD5("")
= d41d8cd98f00b204e9800998ecf8427e

The MD5 algorithm is specified for messages consisting of any number of bits; it is not limited to multiples of eight bits (octets
Octet (computing)
An octet is a unit of digital information in computing and telecommunications that consists of eight bits. The term is often used when the term byte might be ambiguous, as there is no standard for the size of the byte.-Overview:...

, byte
Byte
The byte is a unit of digital information in computing and telecommunications that most commonly consists of eight bits. Historically, a byte was the number of bits used to encode a single character of text in a computer and for this reason it is the basic addressable element in many computer...

s) as shown in the examples above. Some MD5 implementations such as md5sum
Md5sum
md5sum is a computer program that calculates and verifies 128-bit MD5 hashes, as described in RFC 1321. The MD5 hash functions as a compact digital fingerprint of a file. As with all such hashing algorithms, there is theoretically an unlimited number of files that will have any given MD5 hash...

 might be limited to octets, or they might not support streaming for messages of an initially undetermined length.

See also

  • Comparison of cryptographic hash functions
    Comparison of cryptographic hash functions
    The following tables compare general and technical information for a number of cryptographic hash functions.- General information :Basic general information about the cryptographic hash functions: year, designer, references, etc.- Compression function :...

  • md5deep
    Md5deep
    md5deep is a software package used in the computer security, system administration and computer forensics communities for purposes of running large numbers of files through any of several different cryptographic digests...

  • md5sum
    Md5sum
    md5sum is a computer program that calculates and verifies 128-bit MD5 hashes, as described in RFC 1321. The MD5 hash functions as a compact digital fingerprint of a file. As with all such hashing algorithms, there is theoretically an unlimited number of files that will have any given MD5 hash...

  • HashClash
    HashClash
    HashClash was a distributed computing project to find collisions in the MD5 hash algorithm. It was based at Department of Mathematics and Computer Science at the Eindhoven University of Technology, and Marc Stevens initiated the project as part of his Master's Degree thesis.The project ended after...

  • MD6
    MD6
    The MD6 Message-Digest Algorithm is a cryptographic hash function. It uses a Merkle tree-like structure to allow for immense parallel computation of hashes for very long inputs...


External links

  • RFC 1321 The MD5 Message-Digest Algorithm
  • RFC 6151 Updated Security Considerations for the MD5 Message-Digest and the HMAC-MD5 Algorithms
  • W3C recommendation on MD5
  • REXX MD5 test suite covers MD5 examples in various RFCs (incl. errata)
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK