LEB128 - AbsoluteAstronomy.com

LEB128 or Little Endian Base 128 is a form of variable-length code

Variable-length code

In coding theory a variable-length code is a code which maps source symbols to a variable number of bits.Variable-length codes can allow sources to be compressed and decompressed with zero error and still be read back symbol by symbol...

compression used to store an arbitrarily large integer in a small number of bytes. LEB128 is used in the DWARF

DWARF

DWARF is a widely used, standardized debugging data format. DWARF was originally designed along with Executable and Linkable Format , although it is independent of object file formats...

debug file format. name=dwarfspec2> name=dwarfspec3>

Encoding format

LEB128 format is very similar to variable-length quantity

Variable-length quantity

A variable-length quantity is a universal code that uses an arbitrary number of binary octets to represent an infinitely large integer. It was defined for use in the standard MIDI file format to save additional space for a resource constrained system, and is also used in the later Extensible...

format. Both allow small numbers to be stored in a single byte, while also allowing encoding of arbitrarily long numbers. There are 2 versions of LEB128: unsigned LEB128 and signed LEB128. The decoder must know whether the encoded value is unsigned LEB128 or signed LEB128.

Unsigned LEB128

To encode an unsigned number using unsigned LEB128 first represent the number in binary. Then zero extend

Sign extension

Sign extension is the operation, in computer arithmetic, of increasing the number of bits of a binary number while preserving the number's sign and value...

the number up to a multiple of 7 bits (such that the most significant 7 bits are not all 0). Break the number up into groups of 7 bits. Output one encoded byte for each 7 bit group, from least significant to most significant group. Each byte will have the group in its 7 least significant bits. Set the most significant bit on each byte except the last byte. The number zero is encoded as a single byte 0x00.

As an example, here is how the unsigned number 624485 gets encoded:
10011000011101100101 In raw binary
010011000011101100101 Padded to a multiple of 7 bits
0100110 0001110 1100101 Split into 7-bit groups
00100110 10001110 11100101 Add high 1 bits on all but last group to form bytes
0x26 0x8E 0xE5 In hexadecimal
0xE5 0x8E 0x26 Output stream

Unsigned LEB128, VLQ (variable-length quantity

Variable-length quantity

), and the M=128 Rice code all compress any given integer into not only the same number of bits, but exactly the same bits -- the three formats differ only in exactly how those bits are arranged.

Signed LEB128

A signed number is represented similarly, except that the two's complement

Two's complement

The two's complement of a binary number is defined as the value obtained by subtracting the number from a large power of two...

number is sign extended

Sign extension

Sign extension is the operation, in computer arithmetic, of increasing the number of bits of a binary number while preserving the number's sign and value...

up to a multiple of 7 bits (ensuring that the most significant bit is zero for a positive number and one for a negative number). Then the number is broken into groups as for the unsigned encoding.

For example the signed number -624485 (0xFFF6789b) is encoded as 0x9b 0xf1 0x59.

Encode unsigned integer



do {

  byte = low order 7 bits of value;

  value >>= 7;

  if (value != 0) /* more bytes to come */

    set high order bit of byte;

  emit byte;

} while (value != 0);

Encode signed integer



more = 1;

negative = (value < 0);

size = no. of bits in signed integer;

while(more) {

  byte = low order 7 bits of value;

  value >>= 7;

  /* the following is unnecessary if the * implementation of >>= uses an

     arithmetic rather * than logical shift for a signed left operand */

  if (negative)

    value |= - (1 <<(size - 7)); /* sign extend */



  /* sign bit of byte is second high order bit (0x40) */

  if ((value  0 && sign bit of byte is clear) || (value 
 -1 && sign bit of byte is set))

    more = 0;

  else

    set high order bit of byte;

  emit byte;

}

Decode unsigned integer



result = 0;

shift = 0;

while(true) {

  byte = next byte in input;

  result |= (low order 7 bits of byte << shift);

  if (high order bit of byte  0)

    break;

  shift += 7;

}

Decode signed integer



result = 0;

shift = 0;

size = number of bits in signed integer;

while(true) {

  byte = next byte in input;

  result |= (low order 7 bits of byte << shift);

  shift += 7;

  /* sign bit of byte is second high order bit (0x40) */

  if (high order bit of byte  0)

    break;

}



if ((shift 
  /* sign extend */

  result |= - (1 << shift);

Uses

The DWARF

DWARF

DWARF is a widely used, standardized debugging data format. DWARF was originally designed along with Executable and Linkable Format , although it is independent of object file formats...

file format uses both unsigned and signed LEB128 encoding for various fields.

The mpatrol debugging tool uses LEB128 in its tracing file format. name=tracefile>

The Android project uses LEB128 in its Dalvik Executable Format (.dex) file format. name=dex>

Compressing tables in Hewlett-Packard IA-64 exception handling. name=Christophe>

It is used in the Linux kernel for its DWARF implementation.

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.