Adler-32
Encyclopedia
Adler-32 is a checksum
Checksum
A checksum or hash sum is a fixed-size datum computed from an arbitrary block of digital data for the purpose of detecting accidental errors that may have been introduced during its transmission or storage. The integrity of the data can be checked at any later time by recomputing the checksum and...

 algorithm
Algorithm
In mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning...

 which was invented by Mark Adler
Mark Adler
Dr. Mark Adler may be best known for his work in the field of data compression. Adler is the author of the Adler-32 hash function, a co-author of the zlib compression library and gzip, has contributed to Info-ZIP, and has participated in developing the Portable Network Graphics image format...

 in 1995, and is a modification of the Fletcher checksum
Fletcher's checksum
The Fletcher checksum is an algorithm for computing a position-dependent checksum devised by John G. Fletcher at Lawrence Livermore Labs in the late 1970s. A description of the algorithm and an analysis of the performance characteristics of a particular implementation were published in the IEEE...

. Compared to a cyclic redundancy check
Cyclic redundancy check
A cyclic redundancy check is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to raw data...

 of the same length, it trades reliability for speed. Adler-32 is more reliable than Fletcher-16, and slightly less reliable than Fletcher-32.

History

The Adler-32 checksum is part of the widely-used zlib
Zlib
zlib is a software library used for data compression. zlib was written by Jean-Loup Gailly and Mark Adler and is an abstraction of the DEFLATE compression algorithm used in their gzip file compression program. Zlib is also a crucial component of many software platforms including Linux, Mac OS X,...

 compression library, as both were developed by Mark Adler
Mark Adler
Dr. Mark Adler may be best known for his work in the field of data compression. Adler is the author of the Adler-32 hash function, a co-author of the zlib compression library and gzip, has contributed to Info-ZIP, and has participated in developing the Portable Network Graphics image format...

.
A "rolling checksum" version of Adler-32 is used in the rsync
Rsync
rsync is a software application and network protocol for Unix-like and Windows systems which synchronizes files and directories from one location to another while minimizing data transfer using delta encoding when appropriate. An important feature of rsync not found in most similar...

 utility.

The algorithm

An Adler-32 checksum is obtained by calculating two 16-bit
16-bit
-16-bit architecture:The HP BPC, introduced in 1975, was the world's first 16-bit microprocessor. Prominent 16-bit processors include the PDP-11, Intel 8086, Intel 80286 and the WDC 65C816. The Intel 8088 was program-compatible with the Intel 8086, and was 16-bit in that its registers were 16...

 checksums A and B and concatenating their bits into a 32-bit integer. A is the sum of all byte
Byte
The byte is a unit of digital information in computing and telecommunications that most commonly consists of eight bits. Historically, a byte was the number of bits used to encode a single character of text in a computer and for this reason it is the basic addressable element in many computer...

s in the stream plus one, and B is the sum of the individual values of A from each step.

At the beginning of an Adler-32 run, A is initialized to 1, B to 0. The sums are done modulo
Modular arithmetic
In mathematics, modular arithmetic is a system of arithmetic for integers, where numbers "wrap around" after they reach a certain value—the modulus....

 65521 (the largest prime number
Prime number
A prime number is a natural number greater than 1 that has no positive divisors other than 1 and itself. A natural number greater than 1 that is not a prime number is called a composite number. For example 5 is prime, as only 1 and 5 divide it, whereas 6 is composite, since it has the divisors 2...

 smaller than 216). The bytes are stored in network order (big endian
Endianness
In computing, the term endian or endianness refers to the ordering of individually addressable sub-components within the representation of a larger data item as stored in external memory . Each sub-component in the representation has a unique degree of significance, like the place value of digits...

), B occupying the two most significant bytes.

The function may be expressed as

A = 1 + D1 + D2 + ... + Dn (mod 65521)
B = (1 + D1) + (1 + D1 + D2) + ... + (1 + D1 + D2 + ... + Dn) (mod 65521)
= n×D1 + (n−1)×D2 + (n−2)×D3 + ... + Dn + n (mod 65521)

Adler-32(D) = B × 65536 + A

where D is the string of bytes for which the checksum is to be calculated, and n is the length of D.

Example

The Adler-32 sum of the ASCII
ASCII
The American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...

 string "Wikipedia" would be calculated as follows:
Character ASCII code A B
(shown as base 10)
W 87 1 + 87 = 88 0 + 88 = 88
i 105 88 + 105 = 193 88 + 193 = 281
k 107 193 + 107 = 300 281 + 300 = 581
i 105 300 + 105 = 405 581 + 405 = 986
p 112 405 + 112 = 517 986 + 517 = 1503
e 101 517 + 101 = 618 1503 + 618 = 2121
d 100 618 + 100 = 718 2121 + 718 = 2839
i 105 718 + 105 = 823 2839 + 823 = 3662
a 97 823 + 97 = 920 3662 + 920 = 4582


A = 920 = 398 hex
Hexadecimal
In mathematics and computer science, hexadecimal is a positional numeral system with a radix, or base, of 16. It uses sixteen distinct symbols, most often the symbols 0–9 to represent values zero to nine, and A, B, C, D, E, F to represent values ten to fifteen...

 (base 16)
B = 4582 = 11E6 hex

Output = 4,582 × 65,536 + 920 = 300286872 = 11E60398 hex

The modulo operation had no effect in this example, since none of the values reached 65521.

Comparison with the Fletcher checksum

The first difference between the two algorithms is that Adler-32 sums are calculated modulo a prime number, whereas Fletcher sums are calculated modulo 24−1, 28−1, or 216−1 (depending on the number of bits used), which are all composite number
Composite number
A composite number is a positive integer which has a positive divisor other than one or itself. In other words a composite number is any positive integer greater than one that is not a prime number....

s. Using a prime number makes it possible for Adler-32 to catch differences in certain combinations of bytes that Fletcher is unable to detect.

The second difference, which has the largest effect on the speed of the algorithm, is that the Adler sums are computed over 8-bit byte
Byte
The byte is a unit of digital information in computing and telecommunications that most commonly consists of eight bits. Historically, a byte was the number of bits used to encode a single character of text in a computer and for this reason it is the basic addressable element in many computer...

s rather than 16-bit words, resulting in twice the number of loop iterations. This results in the Adler-32 checksum taking between one-and-a-half to two times as long as Fletcher's checksum for 16-bit word aligned data. For byte-aligned data, Adler-32 is faster than a properly implemented Fletcher's checksum (e.g., one found in the Hierarchical Data Format
Hierarchical Data Format
Hierarchical Data Format is the name of a set of file formats and libraries designed to store and organize large amounts of numerical data...

).

Example implementation

In C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

, an inefficient but straightforward implementation is :
  1. define MOD_ADLER 65521


unsigned long adler32(unsigned char *data, int len) /* where data is the location of the data in physical memory and
len is the length of the data in bytes */
{
unsigned long a = 1, b = 0;
int index;

/* Process each byte of the data in order */
for (index = 0; index < len; ++index)
{
a = (a + data[index]) % MOD_ADLER;
b = (b + a) % MOD_ADLER;
}

return (b << 16) | a;
}

See the zlib
Zlib
zlib is a software library used for data compression. zlib was written by Jean-Loup Gailly and Mark Adler and is an abstraction of the DEFLATE compression algorithm used in their gzip file compression program. Zlib is also a crucial component of many software platforms including Linux, Mac OS X,...

 source code for a more efficient implementation that requires a fetch and two additions per byte, with the modulo operations deferred with two remainders computed every several thousand bytes.

Advantages and disadvantages

  • Like the standard CRC-32, the Adler-32 checksum can be forged easily and is therefore unsafe for protecting against intentional modification.
  • It's faster than CRC-32 on many platforms.
  • Adler-32 has a weakness for short messages with few hundred bytes, because the checksums for these messages have a poor coverage of the 32 available bits.

Weakness

Jonathan Stone discovered in 2001 that Adler-32 has a weakness for very short messages. He wrote "Briefly, the problem is that, for very short packets, Adler32 is guaranteed to give poor coverage of the available bits. Don't take my word for it, ask Mark Adler. :-)" The problem is that sum A does not wrap for short messages. The maximum value of A for a 128-byte message is 32640, which is below the value 65521 used by the modulo operation. An extended explanation can be found in RFC 3309, which mandates the use of
CRC32 instead of Adler-32 for SCTP
Stream Control Transmission Protocol
In computer networking, the Stream Control Transmission Protocol is a Transport Layer protocol, serving in a similar role to the popular protocols Transmission Control Protocol and User Datagram Protocol...

, the Stream Control Transmission Protocol.

External links

  • RFC 1950 – specification, contains example C
    C (programming language)
    C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

    code
  • ZLib – implements the Adler-32 checksum
  • RFC 3309 – information about the short message weakness and related change to SCTP
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK