Incompressible string
An incompressible string is one that cannot be compressed because it lacks sufficient repeating sequences. Whether a string is compressible will often depend on the algorithm being used. Some strings are incompressible by any algorithm; see Kolmogorov complexity.
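The claim that some strings resist every algorithm can be motivated by a simple counting argument, sketched here for binary strings. The helper name `count_binary_strings` is hypothetical and exists only for this illustration.

```python
def count_binary_strings(n):
    """Return (strings of length exactly n, strings strictly shorter than n)."""
    exactly_n = 2 ** n
    # Lengths 0, 1, ..., n-1 together; this sum equals 2**n - 1.
    shorter = sum(2 ** k for k in range(n))
    return exactly_n, shorter


exactly_n, shorter = count_binary_strings(8)
print(exactly_n, shorter)  # 256 255
```

A lossless compressor must map different inputs to different outputs. Since there are always fewer strictly shorter strings than strings of length n, at least one string of every length cannot be made shorter, whatever the algorithm.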
Example
Suppose we have the string 12349999123499991234, and we are using a compression method that works by putting a special character into the string (say '@') followed by a value that points to an entry in a lookup table (or dictionary) of repeating values. Let's imagine we have an algorithm that examines the string in 4-character chunks. Looking at our string, our algorithm might pick out the values 1234 and 9999 to place into its dictionary. Let's say 1234 is entry 0 and 9999 is entry 1. Now the string can become:
@0@1@0@1@0
The encoded string is 10 characters long instead of the original 20, although storing the dictionary itself will cost some space. The more repeats there are in the string, the better the compression will be.
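The 4-character chunking scheme described above can be sketched roughly as follows. This is a minimal illustration, not a real compression format: the function names `compress_chunks` and `decompress` are hypothetical, and the sketch assumes the marker '@' never appears in the input and that the dictionary has at most 10 entries, so each index fits in one digit.

```python
def compress_chunks(text, chunk_size=4, marker="@"):
    """Replace repeated fixed-size chunks with marker+index placeholders."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    dictionary = []
    out = []
    for chunk in chunks:
        # Only full-size chunks that occur more than once get a dictionary entry.
        if len(chunk) == chunk_size and chunks.count(chunk) > 1:
            if chunk not in dictionary:
                dictionary.append(chunk)
            out.append(f"{marker}{dictionary.index(chunk)}")
        else:
            out.append(chunk)  # unique chunks are copied through unchanged
    return "".join(out), dictionary


def decompress(encoded, dictionary, marker="@"):
    """Expand marker+index placeholders back into dictionary entries."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == marker:
            out.append(dictionary[int(encoded[i + 1])])
            i += 2  # skip the marker and its one-digit index
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)


encoded, table = compress_chunks("12349999123499991234")
print(encoded)  # @0@1@0@1@0
print(table)    # ['1234', '9999']
```

Running `decompress(encoded, table)` returns the original 20-character string, which confirms that the substitution loses no information.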
Our algorithm can do better, though, if it can view the string in chunks larger than 4 characters. Then it can put 12349999 and 1234 into the dictionary as entries 0 and 1, giving us:
@0@0@1
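One way to sketch encoding with dictionary entries of different lengths is a greedy longest-match pass, shown below. The greedy strategy and the function name `encode_with_dictionary` are assumptions made for illustration; the text does not say how the algorithm chooses between overlapping entries. As before, the sketch assumes one-digit indices and a marker that does not occur in the input.

```python
def encode_with_dictionary(text, dictionary, marker="@"):
    """Greedily replace the longest matching dictionary entry at each position."""
    out = []
    i = 0
    while i < len(text):
        # Indices of all entries that match the text starting at position i.
        matches = [k for k, entry in enumerate(dictionary)
                   if text.startswith(entry, i)]
        if matches:
            best = max(matches, key=lambda k: len(dictionary[k]))
            out.append(f"{marker}{best}")
            i += len(dictionary[best])
        else:
            out.append(text[i])  # no entry matches here; copy one character
            i += 1
    return "".join(out)


print(encode_with_dictionary("12349999123499991234", ["12349999", "1234"]))
# @0@0@1
```

With the dictionary `["1234", "9999"]` the same function yields `@0@1@0@1@0`, so the gain here comes entirely from allowing the longer entry.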
At 6 characters, @0@0@1 is even shorter. Now let's consider another string:
1234999988884321
This string is incompressible by our algorithm. The only repeats that occur are 88 and 99. If we were to store 88 as entry 0 and 99 as entry 1 in our dictionary, we would produce:
1234@1@1@0@04321
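Unfortunately, this is just as long as the original string (16 characters), because our placeholders for items in the dictionary are 2 bytes long, and the items they replace are the same length. Once the cost of storing the dictionary is added, the result is actually larger than the original. Hence, this string is incompressible by our algorithm.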
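The break-even arithmetic can be checked with a short calculation. The helper `body_savings` below is hypothetical, and it assumes the dictionary entries do not overlap one another in the text (true for both dictionaries used here) and that every placeholder is 2 characters long. It counts only the characters saved in the string body; the space needed to store the dictionary is extra.

```python
def body_savings(text, dictionary, placeholder_len=2):
    """Characters saved in the string body by substituting dictionary entries.

    Assumes entries do not overlap each other in `text`.
    """
    saved = 0
    for entry in dictionary:
        occurrences = text.count(entry)  # non-overlapping occurrences
        saved += occurrences * (len(entry) - placeholder_len)
    return saved


print(body_savings("12349999123499991234", ["1234", "9999"]))  # 10
print(body_savings("1234999988884321", ["88", "99"]))          # 0
```

A saving of 0 means each placeholder costs exactly as much as the 2-character repeat it replaces, so the second string comes out no shorter.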