# Range encoding

**Range encoding** is a data compression method defined by G. Nigel N. Martin in a 1979 paper. Range encoding is a form of arithmetic coding that was historically of interest for avoiding some patents on particular later-developed arithmetic coding techniques (although the patents on various well-known arithmetic coding methods have since expired). Encoders that are referred to as *range encoders* typically use a particular kind of implementation based closely on Martin's paper, because such implementations could be shown to be "prior art" with respect to some later patents on other arithmetic coding techniques (on the basis of the age of Martin's publication). After the expiration of the first (1978) arithmetic coding patent, range encoding appeared to clearly be free of patent encumbrances. This particularly drove interest in the technique in the open-source community. Since that time, patents on various other arithmetic coding techniques have also expired.

## How range encoding works

Range encoding conceptually encodes all the symbols of the message into one number, unlike Huffman coding, which assigns each symbol a bit-pattern and concatenates all the bit-patterns together. Thus range encoding can achieve greater compression ratios than the one-bit-per-symbol lower bound on Huffman encoding, and it does not suffer the inefficiencies that Huffman does when dealing with probabilities that are not exact powers of two.

The central concept behind range encoding is this: given a large-enough range of integers, and a probability estimation for the symbols, the initial range can easily be divided into sub-ranges whose sizes are proportional to the probability of the symbol they represent. Each symbol of the message can then be encoded in turn, by reducing the current range down to just the sub-range which corresponds to the next symbol to be encoded. The decoder must have the same probability estimation the encoder used, which can either be sent in advance, derived from already-transferred data, or be built into the compressor and decompressor.
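This narrowing step can be sketched in a few lines of Python. It is a conceptual illustration only, using unlimited-precision integers rather than the fixed-precision renormalization of a real range coder; the three-symbol model (A: 60%, B: 20%, <EOM>: 20%) and the initial range of 100000 are hypothetical values chosen for illustration:

```python
# Conceptual range narrowing with Python's unlimited-precision integers.
# CUM maps each symbol to its cumulative-frequency interval out of TOTAL,
# so the interval width is proportional to the symbol's probability.
CUM = {"A": (0, 60), "B": (60, 80), "EOM": (80, 100)}  # illustrative model
TOTAL = 100

def encode(symbols, low=0, size=10**5):
    """Narrow [low, low + size) once per symbol; return the final sub-range."""
    for s in symbols:
        c_low, c_high = CUM[s]
        # Keep only the slice of the current range assigned to this symbol.
        low, size = low + size * c_low // TOTAL, size * (c_high - c_low) // TOTAL
    return low, low + size

print(encode(["A", "A", "B", "A", "EOM"]))  # → (25056, 25920)
```

Any integer inside the returned sub-range identifies the whole message to a decoder that shares the same model.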

When all symbols have been encoded, merely identifying the sub-range is enough to communicate the entire message (presuming of course that the decoder is somehow notified when it has extracted the entire message). A single integer is actually sufficient to identify the sub-range, and it may not even be necessary to transmit the entire integer; if there is a sequence of digits such that every integer beginning with that prefix falls within the sub-range, then the prefix alone is all that's needed to identify the sub-range and thus transmit the message.

### Example

Suppose we want to encode the message "AABA<EOM>", where <EOM> is the end-of-message symbol. For this example, assume the decoder knows that we are working in the decimal numeral system (base 10), that the probability estimates are A: 60%, B: 20% and <EOM>: 20%, and that the initial range is [0, 100000). The encoder divides this range into three sub-ranges, each proportional in size to the probability of its symbol: A: [0, 60000), B: [60000, 80000) and <EOM>: [80000, 100000).

Since our first symbol is an A, it reduces our initial range down to [0, 60000). That range subdivides in turn into A: [0, 36000), B: [36000, 48000) and <EOM>: [48000, 60000), and the second A narrows us to [0, 36000).

With two symbols encoded, our range is now [0, 36000), and its sub-ranges are A: [0, 21600), B: [21600, 28800) and <EOM>: [28800, 36000). This time it is the second of our three choices that represents the next symbol of the message we want to encode, and our range becomes [21600, 28800). Subdividing again gives A: [21600, 25920), B: [25920, 27360) and <EOM>: [27360, 28800), so the fourth symbol, an A, takes us to [21600, 25920).

Finally, with our range narrowed down to [21600, 25920) and only <EOM> left to encode, the last subdivision into A: [21600, 24192), B: [24192, 25056) and <EOM>: [25056, 25920) leaves us with the final range [25056, 25920).

And since every five-digit integer beginning with "251" falls within our final range, that three-digit prefix is enough to identify the sub-range unambiguously and thus transmit the entire message. (Note that we worked in the decimal numeral system, base 10, for readability, rather than base 2, the binary numeral system, which an actual implementation would use.)

The central problem may appear to be selecting an initial range large enough that, no matter how many symbols we have to encode, we will always have a current range large enough to divide into non-zero sub-ranges. In practice, however, this is not a problem, because instead of starting with a very large range and gradually narrowing it down, the encoder works with a smaller range of numbers at any given time. After some number of digits have been encoded, the leftmost digits will not change. In the example, after encoding just three symbols, we already knew that our final result would start with "2". More digits are shifted in on the right as digits on the left are sent off. This is illustrated in the following code:
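What follows is a sketch of such a digit-shifting encoder, not the listing from the original paper: it works in base 10 to match the example above, the symbol model is an illustrative assumption, and the underflow case (where the narrowed range straddles a digit boundary so that no leading digit settles) is deliberately left unhandled for brevity:

```python
# Base-10 range encoder with digit-at-a-time renormalization (sketch).
# The model is illustrative; a real coder works in base 2 (or whole
# bytes) and must also handle underflow, which this sketch omits.
CUM = {"A": (0, 60), "B": (60, 80), "EOM": (80, 100)}
TOTAL = 100
TOP = 10 ** 5        # width of the working range: five decimal digits
DIGIT = TOP // 10    # place value of the leftmost digit

def encode(symbols):
    low, size, out = 0, TOP, []
    for s in symbols:
        c_low, c_high = CUM[s]
        # Narrow [low, low + size) to this symbol's sub-range.
        low, size = low + size * c_low // TOTAL, size * (c_high - c_low) // TOTAL
        # Renormalize: once the leftmost digit can no longer change, emit
        # it and shift a fresh digit in on the right.
        while low // DIGIT == (low + size - 1) // DIGIT:
            out.append(str(low // DIGIT))
            low = (low % DIGIT) * 10
            size *= 10
    # Flush: one more digit suffices to select a number inside the final
    # range (ceiling division picks the smallest settled leading digit).
    out.append(str(-(-low // DIGIT)))
    return "".join(out)

print(encode(["A", "A", "B", "A", "EOM"]))  # → "251"
```

Every five-digit integer beginning with the emitted digits lies inside the final sub-range, so the short prefix alone is enough for a decoder sharing the model to recover the message.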