Binary GCD algorithm
Encyclopedia
The binary GCD algorithm, also known as Stein's algorithm, is an algorithm that computes the greatest common divisor
Greatest common divisor
In mathematics, the greatest common divisor , also known as the greatest common factor , or highest common factor , of two or more non-zero integers, is the largest positive integer that divides the numbers without a remainder.For example, the GCD of 8 and 12 is 4.This notion can be extended to...

 of two nonnegative integers. It gains a measure of efficiency over the ancient Euclidean algorithm
Euclidean algorithm
In mathematics, the Euclidean algorithm is an efficient method for computing the greatest common divisor of two integers, also known as the greatest common factor or highest common factor...

 by replacing divisions and multiplications with shifts, which are cheaper when operating on the binary representation used by modern computers. This is particularly critical on embedded platforms that have no direct processor support for division. Although the algorithm was first published by the Israeli physicist and programmer Josef Stein in 1967, it may have been known in 1st-century China.

Algorithm

The algorithm reduces the problem of finding the GCD by repeatedly applying these identities:
  1. gcd(0, v) = v, because everything divides zero, and v is the largest number that divides v. Similarly, gcd(u, 0) = u. gcd(0, 0) is not typically defined, but it is convenient to set gcd(0, 0) = 0.
  2. If u and v are both even, then gcd(uv) = 2·gcd(u/2, v/2), because 2 is a common divisor.
  3. If u is even and v is odd, then gcd(uv) = gcd(u/2, v), because 2 is not a common divisor. Similarly, if u is odd and v is even, then gcd(uv) = gcd(uv/2).
  4. If u and v are both odd, and u ≥ v, then gcd(uv) = gcd((u − v)/2, v). If both are odd and u < v, then gcd(uv) = gcd((v − u)/2, u). These are combinations of one step of the simple Euclidean algorithm, which uses subtraction at each step, and an application of step 3 above. The division by 2 results in an integer because the difference of two odd numbers is even.
  5. Repeat steps 3–4 (an odd argument is always retained) until u = v, or (one more step) until u = 0. In either case, the GCD is 2kv, where k is the number of common factors of 2 found in step 2.


Since this definition is tail-recursive, a loop can be used to replace the recursion.

The algorithm requires O((log2 uv)2) worst-case time, or in other words time proportional to the square of the number of bits in u and v together. Although each step reduces at least one of the operands by at least a factor of 2, the subtract and shift operations do not take constant time for very large integers (although they're still quite fast in practice, requiring about one operation per word of the representation).

An extended version of binary GCD, analogous to the extended Euclidean algorithm
Extended Euclidean algorithm
The extended Euclidean algorithm is an extension to the Euclidean algorithm. Besides finding the greatest common divisor of integers a and b, as the Euclidean algorithm does, it also finds integers x and y that satisfy Bézout's identityThe extended Euclidean algorithm is particularly useful when a...

, is given in The Art of Computer Programming
The Art of Computer Programming
The Art of Computer Programming is a comprehensive monograph written by Donald Knuth that covers many kinds of programming algorithms and their analysis....

, along with pointers to other versions.

Implementation in C

Following is an implementation of the algorithm in C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

, taking two (non-negative) integer arguments u and v. It first removes all common factors of 2 using identity 2, then computes the GCD of the remaining numbers using identities 3 and 4, and combines these to form the final answer.

typedef unsigned long long uint64;
uint64 gcd(uint64 u, uint64 v)
{
int shift;

/* GCD(0,x) := x */
if (u

0 || v

0)
return u | v;

/* Let shift := lg K, where K is the greatest power of 2
dividing both u and v. */
for (shift = 0; ((u | v) & 1)

0; ++shift) {
u >>= 1;
v >>= 1;
}

while ((u & 1)

0)
u >>= 1;

/* From here on, u is always odd. */
do {
while ((v & 1) 0) /* Loop X */
v >>= 1;

/* Now u and v are both odd, so diff(u, v) is even.
Let u = min(u, v), v = diff(u, v)/2. */
if (u < v) {
v -= u;
} else {
uint64 diff = u - v;
u = v;
v = diff;
}
v >>= 1;
} while (v != 0);

return u << shift;
}


An optimized variation using GCC intrinsics may look like:

uint32_t gcd(uint32_t c, uint32_t d)
{
uint32_t u, v, s;

if (c 0 || d 0) { /* avoid infinite loops */
return c | d;
}

u = __builtin_ctz(c);
v = __builtin_ctz(d);

s = (u < v) ? u : v;

u = c >> u;
v = d >> v;

while (u != v) {
if (u > v) {
u -= v;
u >>= __builtin_ctz(u);
} else {
v -= u;
v >>= __builtin_ctz(v);
}
}

return u << s;
}

Implementation in C++

Following is an implementation of the algorithm in C++, much like the implementation in the previous example it takes two arguments u and v. This version also uses recursion
Recursion (computer science)
Recursion in computer science is a method where the solution to a problem depends on solutions to smaller instances of the same problem. The approach can be applied to many types of problems, and is one of the central ideas of computer science....

 to achieve a final answer.


int gcd(unsigned int u, unsigned int v){
if(u v || u

0 || v

0)
return u|v;
if(u%2

0){ // if u is even
if(v%2

0) // if u and v are even
return (2*gcd(u/2, v/2));
else // u is even and v is odd
return gcd(u/2, v);
}
else if(v%2 0) // if u is odd and v is even
return gcd(u, v/2);
else{ // both are odd
if(u > v)
return gcd((u-v)/2, v);
else
return gcd((v-u)/2, u);
}
}

Implementation in assembly

This version of binary GCD in ARM
ARM architecture
ARM is a 32-bit reduced instruction set computer instruction set architecture developed by ARM Holdings. It was named the Advanced RISC Machine, and before that, the Acorn RISC Machine. The ARM architecture is the most widely used 32-bit ISA in numbers produced...

 assembly
Assembly language
An assembly language is a low-level programming language for computers, microprocessors, microcontrollers, and other programmable devices. It implements a symbolic representation of the machine codes and other constants needed to program a given CPU architecture...

 (using GNU Assembler
GNU Assembler
The GNU Assembler, commonly known as GAS , is the assembler used by the GNU Project. It is the default back-end of GCC. It is used to assemble the GNU operating system and the Linux kernel, and various other software. It is a part of the GNU Binutils package.GAS' executable is named after as, a...

 syntax) highlights the benefit of branch predication
Branch predication
Branch predication is a strategy in computer architecture design for mitigating the costs usually associated with conditional branches, particularly branches to short sections of code...

, showing that the advantage of binary GCD over the Euclidean algorithm lies in its optimizability for real-world machines. The loop to implement step 2 consists of three instructions, all predicated. Step 3 consists of two loops, each 2 instructions long (one of the instructions being predicated); however, after the first iteration r0 is kept odd and need not be tested, and only one of the loops is executed. (On cores that implement the clz instruction, steps 2 and 3 can be completed without looping.) Finally, step 4 takes four instructions of which 2 are predicated.

Since u and v are guaranteed even or odd at certain points, their least significant bits (LSBs) can be considered part of the program state and need not be stored in the registers. The evenness tests then become a side effect of the bit shifts, since the discarded LSB can be stored into the carry flag
Carry flag
In computer processors the carry flag is a single bit in a system status register used to indicate when an arithmetic carry or borrow has been generated out of the most significant ALU bit position...

 for testing. Thus the code works with u/2 and v/2, which are known at the start of each loop to be even or odd.

@ Arguments arrive in registers r0 and r1
gcd:
subs r3, r0, r0 @ Power-of-two counter = 0, carry flag = 1
orrs r2, r0, r1 @ Logical-OR r0 and r1, set flags on result
@ Carry flag remains set. If r0 and r1 are
@ both zero, this loop does nothing and the
@ code exits with r0 = 0.
remove_twos_loop:
movnes r2, r2, lsr #1 @ Shift r2 right if >0, carry flag = LSB
addcc r3, r3, #1 @ If the LSB was 0 then add 1 to the counter
bcc remove_twos_loop @ and loop to try the next bit (terminates)
movs r0, r0, lsr r3 @ else divide r0 by 2^r3 and test result
movnes r1, r1, lsr r3 @ if r0 > 0 divide r1 by 2^r3 and test result
beq finish @ if either is zero sum inputs and exit
@ lsl r3 at finish undoes movs above

@ Now the LSB of either r0 or r1 is 1,
@ and u and v are considered to be even.
@ But starting when we reach the subs below,
@ u > 0; v > 0; r0 = u / 2; r1 = v / 2.
@ The LSBs of u and v are tested in the carry
@ flag, then memorized by the program state.
check_two_r0:
movs r0, r0, lsr #1 @ divide u by 2 by shifting r0 right
bcc check_two_r0 @ repeat until u is odd (loop terminates)
check_two_r1: @ Loop X:
movs r1, r1, lsr #1 @ divide v by 2 by shifting r2 right
bcc check_two_r1 @ repeat until v is odd (loop terminates)

@ u0 := 2 * r0 + 1, v0 := 2 * r1 + 1
subs r1, r1, r0 @ v := v0 - u0, even and possibly zero
addcc r0, r0, r1 @ if v0 < u0, u := u + v, i.e. u := u0 + (v0 - u0)
@ i.e. u := min(u0, v0), odd since both u0 and v0 were
rsbcc r1, r1, #0 @ if v0 < u0, v := 0 - v, i.e. v := |v0 - u0|
bne check_two_r1 @ if v > 0, we need to make it odd
@ shift bits out of r1 up to the first 1 bit.

@ if v = 0, the carry flag is set (from subs)
adc r0, r0, r0 @ Restore u; u = 2(r0) + 1
finish:
orr r0, r1, r0, lsl r3 @ multiply u by 2^r3 by shifting left
bx lr @ return to caller with result in r0.

Efficiency


Brigitte Vallée
Brigitte Vallée
Brigitte Vallée is a French mathematician and computer scientist. She entered the École Normale Supérieure de Jeunes Filles in 1970, and received her PhD in 1986 at the University of Caen...

 proved that binary GCD can be about 60% more efficient (in terms of the number of bit operations) on average than the Euclidean algorithm.http://users.info.unicaen.fr/~brigitte/Publications/icalp8-2000.pshttp://web.comlab.ox.ac.uk/oucl/work/richard.brent/ftp/rpb183pr.ps.gz. However, although this algorithm outperforms the traditional Euclidean algorithm, its asymptotic performance is the same, and it is considerably more complex thanks to the availability of division instruction in all modern microprocessors.

In addition, real computers operate on more than one bit at a time, and even assembly language binary GCD implementations have to compete against carefully designed hardware circuits for integer division. Overall, Knuth (1998) reports a 15% gain over Euclidean GCD, and according to one comparison, the greatest gain was about 60%, while on some popular architectures even good implementations of binary GCD were marginally slower than the Euclidean algorithm.

In general, with implementations of binary GCD similar to the above C code, the gain in speed over the Euclidean algorithm is always less in practice than in theory. The reason is that the code features a plethora of data-dependent branches. Most may be removed either using conditional instructions along the model of the ARM code, or by computing min(a,b) and |a-b| using mixtures of Boolean algebra and arithmetic
Hacker's Delight
Hacker's Delight by Henry S. Warren, Jr., is a book published by Addison-Wesley Professional July 17, 2002. It discusses a variety of programming algorithms for common tasks involving integer types, often with the aim of performing the minimum number of operations or replacing slow operations by...

.

The only one that these techniques do not remove is the loop condition marked Loop X, which can be unrolled with the aid of a lookup table
Lookup table
In computer science, a lookup table is a data structure, usually an array or associative array, often used to replace a runtime computation with a simpler array indexing operation. The savings in terms of processing time can be significant, since retrieving a value from memory is often faster than...

. With a 256-byte lookup table (8 bits), the implementation turned to be between 82.5% and 163% faster than the Euclidean algorithm, depending on CPU and compiler. Even with a small 16-byte lookup table (4 bits), the gains are in the range of 54% to 116%. The lookup-table approach finds its logical conclusion in the use of the special "CTZ" instruction, that count leading or trailing binary zeros in a number, allowing all trailing zero bits to be removed in a single step instead of one at a time. Of course, this optimization is possible only on platforms where such instructions are available.

In ancient Chinese mathematics book

An algorithm for computing the GCD of two numbers was described in the ancient Chinese mathematics book The Nine Chapters on the Mathematical Art
The Nine Chapters on the Mathematical Art
The Nine Chapters on the Mathematical Art is a Chinese mathematics book, composed by several generations of scholars from the 10th–2nd century BCE, its latest stage being from the 1st century CE...

. The original algorithm was used to reduce a fraction. The description reads:

"If possible halve it; otherwise, take the denominator and the numerator, subtract the lesser from the greater, and do that alternately to make them the same. Reduce by the same number."

This just looks like a normal Euclidian algorithm, but the ambiguity lies in the phrase "if possible halve it". The traditional interpretation is that this only works when 'both' numbers to begin with are even, under which the algorithm is just a slightly inferior Euclidean algorithm (for using subtraction instead of division). But the phrase may as well mean dividing by 2 should 'either' of the numbers become even, in which case it is the binary GCD algorithm.
External links
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK