Variable-length quantity
Encyclopedia
A variable-length quantity (VLQ) is a universal code
Universal code (data compression)
In data compression, a universal code for integers is a prefix code that maps the positive integers onto binary codewords, with the additional property that whatever the true probability distribution on integers, as long as the distribution is monotonic , the expected lengths of the codewords are...

 that uses an arbitrary number of binary
Binary code
A binary code is a way of representing text or computer processor instructions by the use of the binary number system's two-binary digits 0 and 1. This is accomplished by assigning a bit string to each particular symbol or instruction...

 octet
Octet (computing)
An octet is a unit of digital information in computing and telecommunications that consists of eight bits. The term is often used when the term byte might be ambiguous, as there is no standard for the size of the byte.-Overview:...

s (eight-bit
Bit
A bit is the basic unit of information in computing and telecommunications; it is the amount of information stored by a digital device or other physical system that exists in one of two possible distinct states...

 byte
Byte
The byte is a unit of digital information in computing and telecommunications that most commonly consists of eight bits. Historically, a byte was the number of bits used to encode a single character of text in a computer and for this reason it is the basic addressable element in many computer...

s) to represent an infinitely large integer
Integer
The integers are formed by the natural numbers together with the negatives of the non-zero natural numbers .They are known as Positive and Negative Integers respectively...

. It was defined for use in the standard MIDI file format to save additional space for a resource constrained system, and is also used in the later Extensible Music Format (XMF)
Extensible Music Format (XMF)
XMF is a tree-based digital container format used to bundle music-oriented content, such as a MIDI file and optionally the sounds it uses, liner notes or other content grouped by language-codes....

. A VLQ is essentially a base-128 representation of an unsigned integer with the addition of the eighth bit to mark continuation of bytes. See the example below.

Base-128 is also used in ASN.1 BER encoding to encode tag numbers, and it is used in the WAP
Wireless Application Protocol
Wireless Application Protocol is a technical standard for accessing information over a mobile wireless network.A WAP browser is a web browser for mobile devices such as mobile phones that uses the protocol.Before the introduction of WAP, mobile service providers had limited opportunities to offer...

 environment, where it is called variable length unsigned integer or uintvar. The DWARF
DWARF
DWARF is a widely used, standardized debugging data format. DWARF was originally designed along with Executable and Linkable Format , although it is independent of object file formats...

 debugging format defines a variant called LEB128
LEB128
LEB128 or Little Endian Base 128 is a form of variable-length code compression used to store an arbitrarily large integer in a small number of bytes. LEB128 is used in the DWARF debug file format.-Encoding format:...

 (or ULEB128 for unsigned numbers), where the least significant group of 7 bits are encoded in the first byte and the most significant bits are in the last byte. Google
Google
Google Inc. is an American multinational public corporation invested in Internet search, cloud computing, and advertising technologies. Google hosts and develops a number of Internet-based services and products, and generates profit primarily from advertising through its AdWords program...

's protocol buffers
Protocol Buffers
Protocol Buffers are a serialization format with an interface description language developed by Google. The original Google implementation for C++, Java and Python is available under a free software, open source license....

 use a similar format to have compact representation of integer values.

General structure

The encoding assumes an octet (an eight-bit byte) where the most significant bit (MSB), also commonly known as the sign bit
Sign bit
In computer science, the sign bit is a bit in a computer numbering format that indicates the sign of a number. In IEEE format, the sign bit is the leftmost bit...

, is reserved to indicate whether another VLQ octet follows.
VLQ Octet
7 6 5 4 3 2 1 0
27 26 25 24 23 22 21 20
A Bn


If A is 0, then this is the last VLQ octet of the integer. If A is 1, then another VLQ octet follows.

B is a 7-bit number [0x00, 0x7F] and n is the position of the VLQ octet where B0 is the least significant
Endianness
In computing, the term endian or endianness refers to the ordering of individually addressable sub-components within the representation of a larger data item as stored in external memory . Each sub-component in the representation has a unique degree of significance, like the place value of digits...

. The VLQ octets are arranged most significant first
Endianness
In computing, the term endian or endianness refers to the ordering of individually addressable sub-components within the representation of a larger data item as stored in external memory . Each sub-component in the representation has a unique degree of significance, like the place value of digits...

 in a stream.

Other variants

In the data format for Unreal Packages used by the Unreal Engine
Unreal Engine
The Unreal Engine is a game engine developed by Epic Games, first illustrated in the 1998 first-person shooter game Unreal. Although primarily developed for first-person shooters, it has been successfully used in a variety of other genres, including stealth, MMORPGs and RPGs...

, a variable length quantity scheme called Compact Indices is used. The only difference in this encoding is that the first VLQ has the sixth binary digit reserved to indicate whether the encoded integer is positive or negative.
First VLQ Octet
7 6 5 4 3 2 1 0
27 26 25 24 23 22 21 20
A B Cn


If A is 0, then this is the last VLQ octet of the integer. If A is 1, then another VLQ octet follows.

If B is 0, then the VLQ represents a positive integer. If B is 1, then the VLQ represents a negative number.

C is a 6-bit number [0x00, 0x3F] and n is the position of the VLQ octet where C0 is the least significant
Endianness
In computing, the term endian or endianness refers to the ordering of individually addressable sub-components within the representation of a larger data item as stored in external memory . Each sub-component in the representation has a unique degree of significance, like the place value of digits...

. The VLQ octets are arranged most significant first
Endianness
In computing, the term endian or endianness refers to the ordering of individually addressable sub-components within the representation of a larger data item as stored in external memory . Each sub-component in the representation has a unique degree of significance, like the place value of digits...

 in a stream.

Any consecutive VLQ octet follows the general structure.

Examples

Here we work an example with the number 137
137 (number)
137 is the natural number following 136 and preceding 138.-In mathematics :One hundred [and] thirty-seven is the 33rd prime number; the next is 139, with which it comprises a twin prime, and thus 137 is a Chen prime. 137 is an Eisenstein prime with no imaginary part and a real part of the form 3n -...

:
  • Represent the value in binary notation (e.g. 137 as 10001001)
  • Break it up in groups of 7 bits starting from the lowest significant bit (e.g. 137 as 0000001 0001001). This is equivalent to representing the number in base
    Radix
    In mathematical numeral systems, the base or radix for the simplest case is the number of unique digits, including zero, that a positional numeral system uses to represent numbers. For example, for the decimal system the radix is ten, because it uses the ten digits from 0 through 9.In any numeral...

     128.
  • Take the lowest 7 bits and that gives you the least significant byte (0000 1001). This byte comes last.
  • For all the other groups of 7 bits (in the example, this is 000 0001), set the MSB
    Most significant bit
    In computing, the most significant bit is the bit position in a binary number having the greatest value...

     to 1 (which gives 1000 0001 in our example). Thus 137 becomes 1000 0001 0000 1001 where the bits in boldface are something we added. These added bits denote if there is another byte to follow or not. Thus, by definition, the very last byte of a variable length integer will have 0 as its MSB
    Most significant bit
    In computing, the most significant bit is the bit position in a binary number having the greatest value...

    .


The Standard MIDI File format specification gives more examples:
Integer Variable-length quantity
0x00000000 0x00
0x0000007F 0x7F
0x00000080 0x81 0x00
0x00002000 0xC0 0x00
0x00003FFF 0xFF 0x7F
0x00004000 0x81 0x80 0x00
0x001FFFFF 0xFF 0xFF 0x7F
0x00200000 0x81 0x80 0x80 0x00
0x08000000 0xC0 0x80 0x80 0x00
0x0FFFFFFF 0xFF 0xFF 0xFF 0x7F

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK