Move-to-front transform
The move-to-front transform (or MTF) is an encoding of data (typically a stream of bytes) designed to improve the performance of entropy encoding techniques of compression. When efficiently implemented, it is fast enough that its benefits usually justify including it as an extra step in data compression algorithms.
The transform
The main idea is that each symbol in the data is replaced by its index in the stack of "recently used symbols". For example, long sequences of identical symbols are replaced by as many zeroes, whereas when a symbol that has not been used in a long time appears, it is replaced with a large number. Thus, at the end, the data is transformed into a sequence of integers; if the data exhibits many local correlations, these integers tend to be small.

This algorithm was published in the following paper: Ryabko, B. Ya. Data compression by means of a "book stack", Problems of Information Transmission, 1980, v. 16 (4), pp. 265–269. The original name of this code is "book stack". The history of the discovery of the book stack (or move-to-front) code can be found here: Ryabko, B. Ya.; Horspool, R. Nigel; Cormack, Gordon V. Comments to: "A locally adaptive data compression scheme" by J. L. Bentley, D. D. Sleator, R. E. Tarjan and V. K. Wei. Comm. ACM 30 (1987), no. 9, 792–794.
Let us give a precise description. Assume for simplicity that the symbols in the data are bytes.
Each byte value is encoded by its index in a list of bytes, which changes over the course of the algorithm. The list is initially in order by byte value (0, 1, 2, 3, ..., 255). Therefore, the first byte is always encoded by its own value. However, after encoding a byte, that value is moved to the front of the list before continuing to the next byte.
An example will shed some light on how the transform works. Imagine instead of bytes, we are encoding values in a-z. We wish to transform the following sequence:
bananaaa
By convention, the list is initially (abcdefghijklmnopqrstuvwxyz). The first letter in the sequence is b, which appears at index 1 (the list is indexed from 0 to 25). We write a 1 to the output stream:
1
The b moves to the front of the list, producing (bacdefghijklmnopqrstuvwxyz). The next letter is a, which now appears at index 1. So we add a 1 to the output stream. We have:
1,1
and we move the letter a back to the front of the list. Continuing this way, we find that the sequence is encoded by:
1,1,13,1,1,1,0,0
| Iteration | Sequence | List |
|---|---|---|
| bananaaa | 1 | (abcdefghijklmnopqrstuvwxyz) |
| bananaaa | 1,1 | (bacdefghijklmnopqrstuvwxyz) |
| bananaaa | 1,1,13 | (abcdefghijklmnopqrstuvwxyz) |
| bananaaa | 1,1,13,1 | (nabcdefghijklmopqrstuvwxyz) |
| bananaaa | 1,1,13,1,1 | (anbcdefghijklmopqrstuvwxyz) |
| bananaaa | 1,1,13,1,1,1 | (nabcdefghijklmopqrstuvwxyz) |
| bananaaa | 1,1,13,1,1,1,0 | (anbcdefghijklmopqrstuvwxyz) |
| bananaaa | 1,1,13,1,1,1,0,0 | (anbcdefghijklmopqrstuvwxyz) |
| Final | 1,1,13,1,1,1,0,0 | (anbcdefghijklmopqrstuvwxyz) |
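The encoding walk above can be sketched in a few lines of Python (a minimal sketch; the helper name `mtf_encode` is ours, not part of any standard library):

```python
def mtf_encode(text, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Encode a string as a list of move-to-front indices over the given alphabet."""
    table = list(alphabet)               # the mutable "recently used symbols" list
    output = []
    for ch in text:
        idx = table.index(ch)            # current position of the symbol
        output.append(idx)
        table.insert(0, table.pop(idx))  # move the symbol to the front
    return output

print(mtf_encode("bananaaa"))  # [1, 1, 13, 1, 1, 1, 0, 0]
```

Note that the trailing run of a's produces zeroes, exactly the behavior the transform is designed to exploit.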
It is easy to see that the transform is reversible. Simply maintain the same list and decode by replacing each index in the encoded stream with the letter at that index in the list. Note the difference between this and the encoding method: The index in the list is used directly instead of looking up each value for its index.
That is, you start again with (abcdefghijklmnopqrstuvwxyz). You take the "1" of the encoded stream and look it up in the list, which yields "b". Then you move the "b" to the front, which results in (bacdef...). Then you take the next "1", look it up in the list, obtaining "a", move the "a" to the front, and so on.
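Decoding mirrors encoding, with the lookup direction reversed: the index selects a symbol directly. A minimal sketch (the helper name `mtf_decode` is ours):

```python
def mtf_decode(indices, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Recover the original string from move-to-front indices over the given alphabet."""
    table = list(alphabet)
    output = []
    for idx in indices:
        ch = table[idx]                  # the index is used directly as a position
        output.append(ch)
        table.insert(0, table.pop(idx))  # same list update as the encoder
    return "".join(output)

print(mtf_decode([1, 1, 13, 1, 1, 1, 0, 0]))  # bananaaa
```

Because both sides apply the identical move-to-front update, their lists stay in lockstep, which is what makes the transform reversible.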
Implementation
Details of implementation are important for performance, particularly for decoding. For encoding, no clear advantage is gained by using a linked list, so using an array to store the list is acceptable, with worst-case performance O(nk), where n is the length of the data to be encoded and k is the number of values (generally a constant for a given implementation).
However, for decoding, we can use specialized data structures to greatly improve performance.
Use in practical data compression algorithms
The MTF transform takes advantage of local correlation of frequencies to reduce the entropy of a message. Indeed, recently used letters stay towards the front of the list; if the use of letters exhibits local correlations, this results in a large number of small numbers such as "0"s and "1"s in the output. However, not all data exhibits this type of local correlation, and for some messages, the MTF transform may actually increase the entropy.
An important use of the MTF transform is in Burrows–Wheeler transform based compression. The Burrows–Wheeler transform is very good at producing a sequence that exhibits local frequency correlation from text and certain other special classes of data. Compression benefits greatly from following up the Burrows–Wheeler transform with an MTF transform before the final entropy-encoding step.
As an example, imagine we wish to compress Hamlet's soliloquy (To be, or not to be...). We can calculate the entropy of this message to be 7033 bits. Naively, we might try to apply the MTF transform directly. The result is a message with 7807 bits of entropy (higher than the original). The reason is that English text does not in general exhibit a high level of local frequency correlation. However, if we first apply the Burrows–Wheeler transform, and then the MTF transform, we get a message with 6187 bits of entropy. Note that the Burrows–Wheeler transform does not decrease the entropy of the message; it only reorders the bytes in a way that makes the MTF transform more effective.
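Entropy figures like those above come from a zeroth-order symbol-frequency model: each symbol costs -log2 of its relative frequency, summed over the message. A sketch of such a calculation (the helper name `entropy_bits` is ours; the exact figures for the soliloquy depend on the source text used):

```python
from collections import Counter
from math import log2

def entropy_bits(data):
    """Total zeroth-order entropy of a sequence, in bits:
    sum over symbols of count * -log2(count / length)."""
    counts = Counter(data)
    n = len(data)
    return sum(c * -log2(c / n) for c in counts.values())

# A uniform two-symbol message costs 1 bit per symbol.
print(entropy_bits("abab"))  # 4.0
```

Measuring this quantity before and after each transform is how one verifies that MTF alone can hurt plain English text, while BWT followed by MTF helps.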