Hamming weight
The Hamming weight of a string is the number of symbols that are different from the zero-symbol of the alphabet used. It is thus equivalent to the Hamming distance from the all-zero string of the same length. For the most typical case, a string of bits, this is the number of 1's in the string. In this binary case, it is also called the population count, popcount or sideways sum. It is the digit sum of the binary representation of a given number.
Examples
string | Hamming weight |
11101 | 4 |
11101000 | 4 |
00000000 | 0 |
hello world | 11 |
History and usage
The Hamming weight is named after Richard Hamming. It is used in several disciplines including information theory, coding theory, and cryptography.
Examples of applications of the Hamming weight include:
- In modular exponentiation by squaring, the number of modular multiplications required for an exponent e is log2 e + weight(e). This is the reason that the public key value e used in RSA is typically chosen to be a number of low Hamming weight.
- The Hamming weight determines path lengths between nodes in Chord distributed hash tables.
- IrisCode lookups in biometric databases are typically implemented by calculating the Hamming distance to each stored record.
- In computer chess programs using a bitboard representation, the Hamming weight of a bitboard gives the number of pieces of a given type remaining in the game, or the number of squares of the board controlled by one player's pieces, and is therefore an important contributing term to the value of a position.
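The operation count quoted for exponentiation by squaring can be checked with a small sketch. The C code below is illustrative only: the names modpow_counted and op_count are hypothetical, the modulus is assumed to be nonzero and below 2^32 so that every product fits in 64 bits, and it uses the right-to-left variant that skips the final unneeded squaring. Other variants differ from this count by one or two operations.

```c
#include <stdint.h>

/* Hypothetical record of how many modular products were performed. */
typedef struct {
    unsigned squarings;        /* base = base * base mod m     */
    unsigned multiplications;  /* result = result * base mod m */
} op_count;

/* Right-to-left square-and-multiply: returns base^e mod m.
 * Assumes 0 < m < 2^32 so that no product overflows 64 bits. */
uint64_t modpow_counted(uint64_t base, uint64_t e, uint64_t m, op_count *ops)
{
    uint64_t result = 1 % m;
    base %= m;
    ops->squarings = 0;
    ops->multiplications = 0;

    while (e != 0) {
        if (e & 1u) {                      /* one multiplication per 1 bit */
            result = (result * base) % m;
            ops->multiplications++;
        }
        e >>= 1;
        if (e != 0) {                      /* one squaring per remaining bit */
            base = (base * base) % m;
            ops->squarings++;
        }
    }
    return result;
}
```

With this variant the exponent 65537 (Hamming weight 2) costs 16 squarings and 2 multiplications, while 65535 (Hamming weight 16) costs 15 squarings and 16 multiplications.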
Efficient implementation
The population count of a bitstring is often needed in cryptography and other applications. The Hamming distance of two words A and B can be calculated as the Hamming weight of A xor B.
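As a minimal sketch of that identity in C, assuming 64-bit words: the helper names bit_weight and hamming_distance are hypothetical, and the weight is counted with a deliberately simple bit-by-bit loop rather than an optimized routine.

```c
#include <stdint.h>

/* Hypothetical helper: Hamming weight by testing one bit at a time. */
unsigned bit_weight(uint64_t x)
{
    unsigned n = 0;
    for (; x != 0; x >>= 1)
        n += (unsigned)(x & 1u);
    return n;
}

/* Hamming distance of two words = Hamming weight of their XOR,
 * because XOR has a 1 exactly where the two words differ. */
unsigned hamming_distance(uint64_t a, uint64_t b)
{
    return bit_weight(a ^ b);
}
```

For example, 1011101 and 1001001 differ in two positions, so hamming_distance(0x5D, 0x49) is 2.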
The problem of how to implement it efficiently has been widely studied. Some processors have a single instruction to calculate it (see Processor support below), and some have parallel operations on bit vectors. For processors lacking those features, the best solutions known are based on adding counts in a tree pattern. For example, to count the number of 1 bits in the 16-bit binary number A=0110110010111010, these operations can be done:
Expression | Binary | Decimal | Comment |
A | 01 10 11 00 10 11 10 10 | | The original number |
B = A & 01 01 01 01 01 01 01 01 | 01 00 01 00 00 01 00 00 | 1,0,1,0,0,1,0,0 | every other bit from A |
C = (A >> 1) & 01 01 01 01 01 01 01 01 | 00 01 01 00 01 01 01 01 | 0,1,1,0,1,1,1,1 | the remaining bits from A |
D = B + C | 01 01 10 00 01 10 01 01 | 1,1,2,0,1,2,1,1 | list giving # of 1s in each 2-bit piece of A |
E = D & 0011 0011 0011 0011 | 0001 0000 0010 0001 | 1,0,2,1 | every other count from D |
F = (D >> 2) & 0011 0011 0011 0011 | 0001 0010 0001 0001 | 1,2,1,1 | the remaining counts from D |
G = E + F | 0010 0010 0011 0010 | 2,2,3,2 | list giving # of 1s in each 4-bit piece of A |
H = G & 00001111 00001111 | 00000010 00000010 | 2,2 | every other count from G |
I = (G >> 4) & 00001111 00001111 | 00000010 00000011 | 2,3 | the remaining counts from G |
J = H + I | 00000100 00000101 | 4,5 | list giving # of 1s in each 8-bit piece of A |
K = J & 0000000011111111 | 0000000000000101 | 5 | every other count from J |
L = (J >> 8) & 0000000011111111 | 0000000000000100 | 4 | the remaining counts from J |
M = K + L | 0000000000001001 | 9 | the final answer |
Here, the operations are as in C, so X >> Y means to shift X right by Y bits, X & Y means the bitwise AND of X and Y, and + is ordinary addition. The best algorithms known for this problem are based on the concept illustrated above.
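A minimal sketch of the tree pattern in C might look like the following. The function names popcount16_tree and popcount64_tree are hypothetical. The 16-bit version follows the table step by step, with the hexadecimal masks 0x5555, 0x3333, 0x0F0F and 0x00FF standing for the binary masks shown there, and the 64-bit version simply continues the same pattern for two more levels. This is the straightforward form, not a tuned one.

```c
#include <stdint.h>

/* 16-bit tree-pattern count, mirroring rows B..M of the table. */
unsigned popcount16_tree(uint16_t a)
{
    unsigned x = a;
    x = (x & 0x5555u) + ((x >> 1) & 0x5555u);  /* D: counts per 2-bit piece */
    x = (x & 0x3333u) + ((x >> 2) & 0x3333u);  /* G: counts per 4-bit piece */
    x = (x & 0x0F0Fu) + ((x >> 4) & 0x0F0Fu);  /* J: counts per 8-bit piece */
    x = (x & 0x00FFu) + ((x >> 8) & 0x00FFu);  /* M: the final answer       */
    return x;
}

/* The same idea extended to 64 bits: six levels instead of four. */
unsigned popcount64_tree(uint64_t x)
{
    x = (x & UINT64_C(0x5555555555555555)) + ((x >> 1)  & UINT64_C(0x5555555555555555));
    x = (x & UINT64_C(0x3333333333333333)) + ((x >> 2)  & UINT64_C(0x3333333333333333));
    x = (x & UINT64_C(0x0F0F0F0F0F0F0F0F)) + ((x >> 4)  & UINT64_C(0x0F0F0F0F0F0F0F0F));
    x = (x & UINT64_C(0x00FF00FF00FF00FF)) + ((x >> 8)  & UINT64_C(0x00FF00FF00FF00FF));
    x = (x & UINT64_C(0x0000FFFF0000FFFF)) + ((x >> 16) & UINT64_C(0x0000FFFF0000FFFF));
    x = (x & UINT64_C(0x00000000FFFFFFFF)) + ((x >> 32) & UINT64_C(0x00000000FFFFFFFF));
    return (unsigned)x;
}
```

Calling popcount16_tree(0x6CBA), where 0x6CBA is the example value A=0110110010111010, returns 9, matching the last row of the table.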
Tree-based implementations of this kind have the best worst-case behavior of any known algorithm. However, when a value is expected to have few nonzero bits, it may instead be more efficient to use algorithms that count these bits one at a time. The bitwise AND of x with x − 1 differs from x only in zeroing out the least significant nonzero bit: subtracting 1 changes the rightmost string of 0s to 1s, and changes the rightmost 1 to a 0. If x originally had n bits that were 1, then after only n iterations of this operation, x will be reduced to zero.
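The principle of clearing the lowest set bit with x & (x − 1) can be sketched in C as follows. The function name popcount_sparse is hypothetical, and a 64-bit word is assumed.

```c
#include <stdint.h>

/* Counts set bits by repeatedly clearing the least significant 1 bit.
 * The loop body runs once per set bit, so sparse inputs finish quickly. */
unsigned popcount_sparse(uint64_t x)
{
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;   /* zero out the lowest nonzero bit */
        count++;
    }
    return count;
}
```

The loop runs once for an input with a single set bit, but 64 times for an all-ones word, which is why this approach only pays off when few bits are expected to be set.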
If we are allowed greater memory usage, we can calculate the Hamming weight faster than with the above methods. With unlimited memory, we could simply create a large lookup table of the Hamming weight of every 64-bit integer. If we can store a lookup table of the Hamming weight of every 16-bit integer, we can compute the Hamming weight of any 32-bit integer by looking up its two 16-bit halves and adding the results.
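A minimal sketch of the 16-bit table approach in C, under the assumption that a 64 KiB table is acceptable: the names bits_in_16, popcount_table_init and popcount32_table are hypothetical, and the table must be filled once before the first lookup.

```c
#include <stdint.h>

/* Hamming weight of every 16-bit value, one byte per entry (64 KiB). */
static uint8_t bits_in_16[1u << 16];

/* Fill the table once. weight(i) = weight(i >> 1) + lowest bit of i,
 * and i >> 1 is always smaller than i, so its entry is already set. */
void popcount_table_init(void)
{
    bits_in_16[0] = 0;
    for (uint32_t i = 1; i < (1u << 16); i++)
        bits_in_16[i] = (uint8_t)(bits_in_16[i >> 1] + (i & 1u));
}

/* Hamming weight of a 32-bit value: two lookups and one addition. */
unsigned popcount32_table(uint32_t x)
{
    return (unsigned)bits_in_16[x & 0xFFFFu] + (unsigned)bits_in_16[x >> 16];
}
```

Whether this is actually faster than a computed count depends on how well the table stays in cache, so it is worth measuring on the target machine.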
Language support
Some C compilers offer intrinsics that provide bit counting facilities. For example, GCC (since version 3.4 in April 2004) includes a builtin function __builtin_popcount that will use a processor instruction if available or an efficient library implementation otherwise. LLVM-GCC has included this function since version 1.5 in June 2005.
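A hedged sketch of how the builtin might be wrapped in portable C: the wrapper name weight_u32 is hypothetical, the preprocessor test assumes a GCC-compatible compiler defines __GNUC__, and the fallback branch is a plain loop for compilers without the builtin.

```c
#include <stdint.h>

/* Hamming weight of a 32-bit value, using the compiler builtin where
 * available and a simple loop otherwise. */
unsigned weight_u32(uint32_t x)
{
#if defined(__GNUC__)
    /* __builtin_popcount takes an unsigned int argument. */
    return (unsigned)__builtin_popcount((unsigned int)x);
#else
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
#endif
}
```

For 64-bit values GCC also provides __builtin_popcountll, which takes an unsigned long long.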
In C++ STL, the bit-array data structure bitset has a count method that counts the number of bits that are set.
In Java, the growable bit-array data structure BitSet has a BitSet.cardinality() method that counts the number of bits that are set. In addition, there are Integer.bitCount(int) and Long.bitCount(long) functions to count bits in primitive 32-bit and 64-bit integers, respectively. The arbitrary-precision integer class BigInteger also has a BigInteger.bitCount() method that counts bits.
In Common Lisp, the function logcount, given a non-negative integer, returns the number of 1 bits. (For negative integers it returns the number of 0 bits in 2's complement notation.) In either case the integer can be a BIGNUM.
Processor support
- Cray supercomputers early on featured a population count machine instruction, rumoured to have been specifically requested by the U.S. government National Security Agency for cryptanalysis applications.
- AMD's Barcelona architecture introduced the ABM (advanced bit manipulation) ISA, introducing the POPCNT instruction as part of the SSE4a extensions.
- Intel Core processors introduced a POPCNT instruction with the SSE4.2 instruction set extension, first available in a Nehalem-based Core i7 processor, released in November 2008.
- Compaq's Alpha 21264A, released in 1999, was the first Alpha series CPU design that had the count extension (CIX).
- Donald Knuth's model computer MMIX that is going to replace MIX in his book The Art of Computer Programming has an SADD instruction. SADD a,b,c counts all bits that are 1 in b and 0 in c and writes the result to a.
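The effect described for SADD can be modelled in C as the Hamming weight of b AND NOT c. This is only a sketch of the description given here, not MMIX code: the function name sadd_model is hypothetical and 64-bit operands are assumed.

```c
#include <stdint.h>

/* Models "count all bits that are 1 in b and 0 in c":
 * b & ~c keeps exactly those bits, which are then counted. */
uint64_t sadd_model(uint64_t b, uint64_t c)
{
    uint64_t x = b & ~c;
    uint64_t count = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}
```

With c set to zero the model reduces to an ordinary population count of b.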
External links
- Aggregate Magic Algorithms. Optimized population count and other algorithms explained with sample code.
- HAKMEM item 169. Population count assembly code for the PDP-6/10.
- Bit Twiddling Hacks. Several algorithms with code for counting bits set.
- Necessary and Sufficient, by Damien Wintour. Has code in C# for various Hamming weight implementations.
- Best algorithm to count the number of set bits in a 32-bit integer? Stack Overflow.